Having a solid backup strategy and why you should not rely on fancy tools for backup

Published on 2021-07-27. Modified on 2023-10-24.

Have you ever lost important data? I have. I learned to do backups the hard way, after I lost an entire book I had just finished writing!

The year was 2000-something when I had just finished writing a book I had been working on for a couple of years. Somehow I managed to delete all the work I had done.

I don't think I have ever felt as I did in that very moment when I realized what had just happened. It was like the world stopped moving. "Nooooooooo!", I shouted, while I desperately jumped around in utter disbelief. "This cannot be happening! This cannot be happening! What did I just do!?"

Well, it did happen. I did manage to delete everything, including the poor excuse of a backup I had, and I had to rewrite the entire book from scratch!

Over the years I have tried different approaches and used different tools, but for about the last ten years nothing has changed much and the method I implement has become pretty solid.

Why having a solid backup strategy really matters

Doing real backup involves a lot of serious consideration, but the most important thing is to have a solid strategy.

This basically means that you have created a workflow in which regular backup of important data is an integral part of that workflow. It matters because once you have the strategy in place it becomes second nature and you seldom have to think about it.

Even though storage space is relatively cheap nowadays I don't believe in backing up everything. It really is only the important data that requires backup. Data which - if you lose it - would affect you negatively in some way. A friend of mine keeps all his data around, even the non-important data, but I personally prefer to "clean house" once in a while.

The 3-2-1 rule

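The 3-2-1 rule of backup is a good minimum part of a solid strategy: keep at least three copies of your data, on at least two different types of storage media, with at least one copy stored offsite.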

When you keep your backup copies of data both locally and offsite you increase the protection in case of an unforeseen event or disaster.
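The rule is commonly stated as three copies, two different media types, one copy offsite. A minimal sketch of checking a list of copies against it could look like this. The `Copy` class, the media labels and the example entries are illustrative assumptions, not a description of any particular setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data. All field values used below are illustrative."""
    name: str
    media: str      # a label for the kind of storage, e.g. "internal-disk"
    offsite: bool   # True if the copy is stored at another location


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """True if there are >= 3 copies, on >= 2 media types, with >= 1 offsite."""
    media_types = {c.media for c in copies}
    has_offsite = any(c.offsite for c in copies)
    return len(copies) >= 3 and len(media_types) >= 2 and has_offsite


copies = [
    Copy("storage server", media="internal-disk", offsite=False),
    Copy("backup server", media="internal-disk", offsite=False),
    Copy("external drives", media="usb-disk", offsite=True),
]
print(satisfies_3_2_1(copies))      # True
print(satisfies_3_2_1(copies[:2]))  # False: only two copies, none offsite
```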

Use a timetable

How often you back up depends on your assessment of how much data you are willing to lose.

You can do backups manually or you can integrate some kind of automation. I do both.
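If you automate part of the timetable, the "how much am I willing to lose" assessment can be written down as a single number. The sketch below assumes you record the time of the last successful backup somewhere. The function name and the example dates are made up for illustration.

```python
from datetime import datetime, timedelta


def backup_is_due(last_backup: datetime, now: datetime, max_loss: timedelta) -> bool:
    """True once the time since the last backup reaches the loss you tolerate."""
    return now - last_backup >= max_loss


last = datetime(2023, 10, 23, 8, 0)
now = datetime(2023, 10, 24, 9, 0)
print(backup_is_due(last, now, timedelta(days=1)))  # True: 25 hours have passed
print(backup_is_due(last, now, timedelta(days=7)))  # False
```

A check like this could drive a reminder for the manual backups, or decide whether an automated job should run at all.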

Test your strategy

Once you have a backup strategy planned out, you need to test it. If you don't test it you will never know if it is really going to work out.

Simulate everything from a simple hard drive failure, to a bug in the encryption algorithm, to something like a natural disaster.

Don't skip this part!

I have seen people implement all kinds of backup solutions only to still manage to lose all their data. They had not tested the solution and they were not prepared. Once they had to deal with a recovery problem they messed that up too.

Always test your strategy and prepare for recovery.
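One simple recovery drill is to restore a backup into a scratch directory and compare it with the original, byte for byte. The sketch below uses Python's standard `filecmp` module. The function name and the directory layout in the demo are hypothetical, and a real drill would of course restore from the actual backup media.

```python
import filecmp
import tempfile
from pathlib import Path


def relative_files(root: Path) -> set[str]:
    """All regular files under root, as paths relative to root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def restore_problems(original: Path, restored: Path) -> list[str]:
    """List files that are missing, unexpected, or different after a trial restore."""
    a, b = relative_files(original), relative_files(restored)
    problems = [f"missing: {p}" for p in sorted(a - b)]
    problems += [f"unexpected: {p}" for p in sorted(b - a)]
    # shallow=False forces a comparison of file contents, not just size and mtime.
    _, mismatch, errors = filecmp.cmpfiles(original, restored, sorted(a & b), shallow=False)
    problems += [f"differs: {p}" for p in sorted(mismatch + errors)]
    return problems


# Demo on throwaway directories standing in for "original" and "restored".
with tempfile.TemporaryDirectory() as tmp:
    original, restored = Path(tmp) / "original", Path(tmp) / "restored"
    (original / "sub").mkdir(parents=True)
    (restored / "sub").mkdir(parents=True)
    (original / "a.txt").write_text("same")
    (restored / "a.txt").write_text("same")
    (original / "sub" / "b.txt").write_text("before")
    (restored / "sub" / "b.txt").write_text("after!")
    (original / "c.txt").write_text("only in the original")
    problems = restore_problems(original, restored)

print(problems)  # ['missing: c.txt', 'differs: sub/b.txt']
```

An empty list means the trial restore matched. Anything else is something to investigate before you actually need the backup.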

Step 1 - I keep all important data on a ZFS network share

I have a bunch of different computers I use for different tasks and they run different operating systems. I have tried various options to keep data synchronized between machines, such as utilizing version control, using rsync, Unison, duplicity and other tools, but I eventually settled on something really simple. I run a ZFS network attached storage server. The storage server is accessible through Samba, NFS and SSH.

A ZFS mirror is running on this storage server. One of the purposes of this server is to serve as a storage media that is shared between multiple computers.

Step 2 - I have a ZFS backup server

I do regular backups to a ZFS backup server running in a 3-way mirror.

The backup server is not running 24/7, but is only turned on when needed.

I use rsync and ZFS send and receive to back up files.
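To give an idea of what such commands can look like, here is a small sketch that only builds the command lines and does not run anything. The paths, dataset names, snapshot names and the host name are placeholders, and sending the stream over SSH is an assumption made for the example. The flags used (`rsync -a --delete --dry-run`, `zfs send -i`, `zfs receive`) are standard ones, but check the man pages for your platform before relying on them.

```python
import shlex


def rsync_command(source: str, destination: str, dry_run: bool = True) -> list[str]:
    """Build an rsync invocation that mirrors source into destination.

    -a preserves permissions, timestamps and symlinks. --delete removes files
    from the destination that no longer exist in the source, which is why this
    sketch defaults to a dry run.
    """
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        cmd.append("--dry-run")
    return cmd + [source, destination]


def zfs_incremental_pipeline(dataset: str, previous: str, current: str,
                             host: str, target: str) -> str:
    """Build a shell pipeline sending the changes between two snapshots to a host."""
    send = ["zfs", "send", "-i", f"{dataset}@{previous}", f"{dataset}@{current}"]
    receive = ["ssh", host, "zfs", "receive", target]
    return f"{shlex.join(send)} | {shlex.join(receive)}"


# Placeholder names throughout: "backup" is a hypothetical host.
print(shlex.join(rsync_command("/srv/share/", "backup:/srv/share/")))
print(zfs_incremental_pipeline("tank/data", "monday", "tuesday", "backup", "backup/data"))
```

Printing the commands first and running them by hand is a cheap way to catch a wrong path before `--delete` gets a chance to act on it.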

Step 3 - I do regular backups to external media

At regular intervals I do a full backup of all the data located on the backup server onto external media for storage offsite.

I have a couple of external drives which also run in a ZFS mirror. These drives get attached, mounted and then an incremental backup is done using rsync and/or ZFS send and receive.

When the backup is done the drives are transported for physical storage at an external location.

The next time I need to do a full backup I use a second set of drives and perform the same routine. Only once the new set has arrived at the external location does the old set get swapped out and brought back. This is to avoid having a single moment in time where there isn't an external offsite backup in existence.
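The order of the steps is what matters in this rotation. As a toy illustration, the sketch below models two drive sets, here called "A" and "B" (hypothetical labels), and checks after every step that at least one set is still offsite.

```python
def rotation_steps(offsite: str, at_home: str) -> list[tuple[str, set[str]]]:
    """One backup round as (action, sets located offsite after that action)."""
    return [
        (f"write full backup to set {at_home}", {offsite}),
        (f"carry set {at_home} offsite", {offsite, at_home}),
        (f"bring set {offsite} home", {at_home}),
    ]


offsite, at_home = "A", "B"
for _ in range(3):
    for action, located_offsite in rotation_steps(offsite, at_home):
        # The invariant: there is never a moment without an offsite copy.
        assert located_offsite, action
    offsite, at_home = at_home, offsite  # the two sets trade places

print(offsite)  # B
```

If the old set were brought home before the new one was carried out, the second step would leave the offsite location empty, which is exactly the moment the rotation is meant to avoid.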

You have to make it easy to make it work

In order to make everything work you have to make it as easy as possible, otherwise you won't do it regularly.

If your storage server is located in the attic and you have to physically walk up a ton of stairs to use it, you will never use it :)

Try to set up your equipment in a way that makes it easy to use. Keep your basic storage media for backup close by. The only exception to this rule is the storage media which you keep at an external location. It's important that it isn't too close to you, like at your neighbor's house, but it also shouldn't require a 4-hour drive each time you want to make a backup :)

Don't trust cloud providers blindly

While I know of some really good cloud providers, such as rsync.net and Tarsnap, I recommend that you never trust cloud providers blindly.

Everything can look really nice "on paper" but you don't know what goes on behind the scenes. I have worked with a lot of different people and I have seen too much crazy shit to fully trust anyone with my important data. A cloud provider may have the best of intentions, but sometimes all it takes is a single grumpy employee or even a minor mistake to do a lot of damage.

That doesn't mean you shouldn't take advantage of some of the different solutions. The more you copy your data around the better it is. Just don't trust a cloud provider without taking additional steps to keep your data safe.

Don't rely on "fancy" tools for backup

There exist some really cool open source backup solutions such as Borg and Restic, but you should never rely solely on these "complex" solutions. These tools work really well, until they don't! In the past I have lost data to duplicity and other tools.

Once a complex tool breaks down and you suddenly have to work with some obscure binary file format or something else that nobody understands, you begin to cherish the more simple solutions.

One could argue that ZFS is complex as well, but that is on the filesystem level, a level on which you cannot avoid complexity no matter what you do.

I am not saying you shouldn't use something like Borg or Restic, but when it comes to backup and file management nothing beats simple and solid tools such as rsync, tar and GnuPG for encryption.
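Part of the appeal of a format like tar is that almost anything can read it. As a small illustration, the sketch below writes a directory into a plain, uncompressed tar archive using Python's standard `tarfile` module and lists it back. The directory and file names are invented for the demo, and the resulting archive can be inspected with any ordinary tar implementation.

```python
import tarfile
import tempfile
from pathlib import Path


def make_tar(source_dir: Path, archive: Path) -> list[str]:
    """Write source_dir into an uncompressed tar archive and return the member names."""
    with tarfile.open(archive, "w") as tar:
        # arcname keeps the archive free of the absolute path on this machine.
        tar.add(source_dir, arcname=source_dir.name)
    with tarfile.open(archive, "r") as tar:
        return sorted(tar.getnames())


with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "documents"
    docs.mkdir()
    (docs / "book.txt").write_text("chapter one\n")
    members = make_tar(docs, Path(tmp) / "documents.tar")

print(members)  # ['documents', 'documents/book.txt']
```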

When you're dealing with recovery of important data the last thing you want to worry about is layers of added complexity. The more complex a tool gets, the more difficult it becomes to handle data corruption or application bugs.

Only use encryption when it is really needed

While you might consider doing a full encryption of your personal laptop and/or desktop, in case one of these gets stolen, you should avoid encryption on backup and storage when it really isn't needed because encryption adds yet another layer of complexity.

Not only does encryption during data recovery make everything much more difficult, but should you pass away, your family members might not have the skills required to access the data.

When you do utilize encryption, consider the difference between encrypting each file individually and putting a lot of files into a single archive which you then encrypt. When everything is stored in a single encrypted archive, you risk losing more data should something go wrong with the encryption. Always validate your data using something like a checksum or other similar utilities.
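Validation with checksums can be as simple as keeping a manifest next to the data and re-checking it later. The following is a minimal sketch using SHA-256 from Python's standard library. The function names, the manifest file name and the demo files are assumptions for illustration. The line layout (digest, two spaces, path) mirrors what the `sha256sum` utility prints.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path, manifest: Path) -> None:
    """Record one 'digest  relative/path' line per file under root."""
    files = sorted(p for p in root.rglob("*") if p.is_file())
    lines = [f"{sha256_of(p)}  {p.relative_to(root).as_posix()}" for p in files]
    manifest.write_text("\n".join(lines) + "\n")


def verify_manifest(root: Path, manifest: Path) -> list[str]:
    """Return the relative paths that are missing or no longer match their digest."""
    bad = []
    for line in manifest.read_text().splitlines():
        digest, name = line.split("  ", 1)
        target = root / name
        if not target.is_file() or sha256_of(target) != digest:
            bad.append(name)
    return bad


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data"
    data.mkdir()
    (data / "photo.raw").write_bytes(b"\x00\x01\x02")
    (data / "notes.txt").write_text("important\n")
    manifest = Path(tmp) / "SHA256SUMS"  # kept outside the data directory
    write_manifest(data, manifest)
    clean = verify_manifest(data, manifest)
    (data / "notes.txt").write_text("importanT\n")  # simulate a damaged file
    damaged = verify_manifest(data, manifest)

print(clean, damaged)  # [] ['notes.txt']
```

Because every file gets its own line, a single damaged file shows up by name instead of invalidating everything at once.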

Final notes

If you have never implemented a backup strategy and you doubt whether you actually need one, consider how you would feel if you lost all your data. If you wouldn't mind losing your data, you probably don't need backup. However, most people nowadays have at least pictures and videos of friends and family they care about. Perhaps you have important documents lying around somewhere too. In either case, it's much better to have a backup and not need it, than to need it and not have it.

When you store files on ZFS, ZFS not only ensures data integrity by protecting you against silent data corruption caused by data degradation, but it also protects against power surges (voltage spikes), bugs in disk firmware, phantom writes (the previous write did not make it to disk), misdirected reads/writes (the disk accesses the wrong block), DMA parity errors between the array and server memory or from the driver (since the checksum validates data inside the array), driver errors (data winds up in the wrong buffer inside the kernel), accidental overwrites (such as swapping to a live file system), and much more.
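The general idea behind this kind of protection, a checksum for every block plus a redundant copy to repair from, can be shown with a toy model. To be clear, the sketch below is not how ZFS is implemented. It only illustrates the principle with two in-memory dictionaries standing in for mirrored disks, and the class and its methods are invented for the example.

```python
import hashlib


class ToyMirror:
    """Two in-memory 'disks' holding the same blocks, plus one checksum per block."""

    def __init__(self) -> None:
        self.disks = [{}, {}]
        self.checksums = {}  # stored separately from the data it covers

    def write(self, block_id: int, data: bytes) -> None:
        self.checksums[block_id] = hashlib.sha256(data).digest()
        for disk in self.disks:
            disk[block_id] = data

    def read(self, block_id: int) -> bytes:
        expected = self.checksums[block_id]
        for disk in self.disks:
            data = disk[block_id]
            if hashlib.sha256(data).digest() == expected:
                # Found a good copy: overwrite any copy that does not match it.
                for other in self.disks:
                    if other[block_id] != data:
                        other[block_id] = data
                return data
        raise IOError(f"block {block_id}: every copy fails its checksum")


mirror = ToyMirror()
mirror.write(0, b"chapter one")
mirror.disks[0][0] = b"chapter onf"  # silent corruption on the first disk
recovered = mirror.read(0)
print(recovered)           # b'chapter one'
print(mirror.disks[0][0])  # b'chapter one' (repaired from the good copy)
```

Without the checksum, a read from the first disk would have returned the damaged bytes and nothing would have noticed.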

You don't have to rely on expensive hardware to run a ZFS storage or backup server. I have managed perfectly fine for a very long period of time to run a mirror on two external USB disks on a Marvell Armada 510 - 800 MHz Cubox. It is generally not a good idea to run ZFS on external USB drives; however, neither ZFS nor the little Cubox has ever failed me. I actually only switched it out for something else in order to get better performance.

NOTE: ZFS without ECC memory is no worse than any other file system without ECC memory. Using ECC memory is recommended in situations where the strongest data integrity guarantees are required. Random bit flips caused by cosmic rays or by faulty memory can go undetected without ECC memory. Any filesystem will write the damaged data from memory to disk and be unable to automatically detect the corruption. Also note that ECC memory is often not supported by consumer-grade hardware, and it is more expensive. In any case, you can run ZFS without ECC memory; it's not a requirement. Just make sure to validate your data.

FreeBSD has great support for a lot of hardware and it has a solid ZFS implementation. If you have an old computer lying around, perhaps you can use that either as a storage server or as a backup server.

In any case, unless you truly don't care about your data, you should adopt a solid backup strategy before it's too late ;)