Drive failure and IO errors? ZFS to the rescue!

Published on 2022-09-29.

Hard drives fail in all kinds of strange ways, and when they do, you really appreciate all the hard work the programmers and engineers have put into ZFS, because it is truly amazing!

Unless you're dealing with a physical failure of a spinning hard drive that announces itself with strange clicking noises, nothing will alert you to the fact that your drive is experiencing read or write errors. That is, unless you sit with your face glued to the output of dmesg, and who does that anyway?

The funny thing is that you can move massive amounts of data with cp, rsync, Midnight Commander, or some other marvelous tool, without being aware of a disaster running rampant just below the surface of your otherwise peaceful existence.

Unless you have SMART monitoring running in the background, you will not notice data corruption until it is too late. Well, unless you're running ZFS, of course :)
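If you do want that kind of background monitoring, and you should, a minimal smartmontools setup could look something like this. It is only a sketch: the device name, mail address and test schedule are placeholders, and the service name differs between distributions.

# smartctl -H -A /dev/sdd    # one-off check of the drive's health and SMART attributes
# echo '/dev/sdd -a -m root@localhost -s (S/../.././02)' >> /etc/smartd.conf
# systemctl enable --now smartd

Here -a tells smartd to monitor all SMART attributes, -m mails you when something trips, and the -s expression schedules a short self-test every night at 02:00.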

I have previously written about ZFS in my write-up Battle testing ZFS, Btrfs and mdadm+dm-integrity, in which I compared the three, but what I perhaps failed to do was to describe the joy that ZFS can bring during an otherwise stressful moment.

It's funny, because it's actually difficult to put into words. I have dealt with many different filesystems over the past 35+ years, but one thing is clear: ZFS is unique! Absolutely nothing compares. And it's not just the fact that ZFS tries amazingly hard to protect your data, it also almost "holds your hand" while it does it.

ZFS is not only intuitive to use and very well documented, it also comes with the most pleasant user interface and some really nice features.

When you work with ZFS on a daily basis you can almost feel how spoiled you are, because ZFS takes really good care of you. With any other filesystem you will have to struggle! You will struggle with the documentation, you will struggle with the structure of commands, and sometimes you will even get gibberish output messages. Not with ZFS.

ZFS will notify you the moment it experiences any kind of error, and it will notify you in a way that makes you understand what is happening right away.

# zpool status -v
  pool: pool1
 state: DEGRADED
status: One or more devices are faulted in response to IO failures.
 ...

Let's take a look in the system log:

Sep 29 11:13:12 foo kernel: ata6.00: status: { DRDY }
Sep 29 11:13:12 foo kernel: ata6.00: cmd 60/00:28:60:a3:fb/08:00:1f:00:00/40 tag 5 ncq dma 1048576 in
                                       res 40/00:20:60:9b:fb/00:00:1f:00:00/40 Emask 0x10 (ATA bus error)
Sep 29 11:13:12 foo kernel: ata6.00: failed command: READ FPDMA QUEUED
Sep 29 11:13:12 foo kernel: ata6.00: status: { DRDY }
Sep 29 11:13:12 foo kernel: ata6.00: cmd 60/00:20:60:9b:fb/08:00:1f:00:00/40 tag 4 ncq dma 1048576 in
                                       res 40/00:20:60:9b:fb/00:00:1f:00:00/40 Emask 0x10 (ATA bus error)
Sep 29 11:13:12 foo kernel: ata6.00: failed command: READ FPDMA QUEUED
Sep 29 11:13:12 foo kernel: ata6.00: status: { DRDY }
Sep 29 11:13:12 foo kernel: ata6.00: cmd 60/00:18:60:93:fb/08:00:1f:00:00/40 tag 3 ncq dma 1048576 in
                                       res 40/00:20:60:9b:fb/00:00:1f:00:00/40 Emask 0x10 (ATA bus error)
Sep 29 11:13:12 foo kernel: ata6.00: failed command: READ FPDMA QUEUED
Sep 29 11:13:12 foo kernel: ata6: SError: { UnrecovData 10B8B BadCRC }
Sep 29 11:13:12 foo kernel: ata6.00: irq_stat 0x08000000, interface fatal error
Sep 29 11:13:12 foo kernel: ata6.00: exception Emask 0x10 SAct 0x38 SErr 0x280100 action 0x6 frozen
Sep 29 11:13:11 foo kernel: ata6: EH complete
Sep 29 11:13:11 foo kernel: zio pool=pool1 vdev=/dev/disk/by-id/ata-ST3500320NS_9QM5WDWG-part1 error=5 type=1 offset=274707300352 size=1048576 flags=40080cb0
Sep 29 11:13:11 foo kernel: I/O error, dev sdd, sector 536539744 op 0x0:(READ) flags 0x700 phys_seg 16 prio class 0
Sep 29 11:13:11 foo kernel: sd 5:0:0:0: [sdd] tag#18 CDB: Read(10) 28 00 1f fa f2 60 00 08 00 00
Sep 29 11:13:11 foo kernel: sd 5:0:0:0: [sdd] tag#18 Add. Sense: Unaligned write command
Sep 29 11:13:11 foo kernel: sd 5:0:0:0: [sdd] tag#18 Sense Key : Illegal Request [current]
Sep 29 11:13:11 foo kernel: sd 5:0:0:0: [sdd] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Sep 29 11:13:11 foo kernel: zio pool=pool1 vdev=/dev/disk/by-id/ata-ST3500320NS_9QM5WDWG-part1 error=5 type=1 offset=274706251776 size=1048576 flags=40080cb0
Sep 29 11:13:11 foo kernel: I/O error, dev sdd, sector 536537696 op 0x0:(READ) flags 0x700 phys_seg 16 prio class 0
Sep 29 11:13:11 foo kernel: sd 5:0:0:0: [sdd] tag#17 CDB: Read(10) 28 00 1f fa ea 60 00 08 00 00
Sep 29 11:13:11 foo kernel: sd 5:0:0:0: [sdd] tag#17 Add. Sense: Unaligned write command
Sep 29 11:13:11 foo kernel: sd 5:0:0:0: [sdd] tag#17 Sense Key : Illegal Request [current]
Sep 29 11:13:11 foo kernel: sd 5:0:0:0: [sdd] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Sep 29 11:13:11 foo kernel: zio pool=pool1 vdev=/dev/disk/by-id/ata-ST3500320NS_9QM5WDWG-part1 error=5 type=1 offset=274705203200 size=1048576 flags=40080cb0
Sep 29 11:13:11 foo kernel: I/O error, dev sdd, sector 536535648 op 0x0:(READ) flags 0x700 phys_seg 16 prio class 0

Of course, none of this is visible in the tool you're running to transfer the files, and you can even end up with the bad sectors simply "marked up" while you continue using the drive, unaware that you may be dealing with a cascading failure. But with ZFS you not only quickly get into the habit of always checking the status of your pools, which is very important, you will also see the pool enter a degraded state when something is not working right. Of course you should always set up SMART to monitor your disks, but with ZFS you'll get a very early warning as long as you run the zpool status -v command on a regular basis.
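A cheap way to build that habit is to let the machine nag you. The following is only a sketch; the paths, the schedule and the mail command are assumptions, and the mail part requires a working local MTA:

# zpool status -x    # prints "all pools are healthy" when nothing is wrong
all pools are healthy
# zpool scrub pool1    # reads and verifies every block and repairs from redundancy

And a root crontab line that mails you whenever zpool status -x reports anything other than healthy pools:

0 8 * * * /usr/sbin/zpool status -x | grep -qv 'all pools are healthy' && /usr/sbin/zpool status -v | mail -s "ZFS problem on $(hostname)" root

OpenZFS also ships the ZFS Event Daemon (zed), which can mail you about pool events if you set ZED_EMAIL_ADDR in /etc/zfs/zed.d/zed.rc.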

Because of the way ZFS is designed, with checksumming and parity data, it will do the best it can to protect your data and enable you to recreate corrupt data. Yet ZFS can only do so much, and if you're dealing with multiple failing drives, so many that you're losing even the parity data, you still need to know what NOT to do in such a situation. The most important thing NOT to do is to continue using the failed disks! If you can manage it, clone the disks sector by sector to healthy disks, ignoring the read errors, and then use the clones to recreate as much of the lost data as possible. That way you avoid taxing an already broken drive more than needed.
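As a sketch of that cloning step, GNU ddrescue is built for exactly this. The device names and map file are hypothetical, and the target disk must be at least as large as the source:

# ddrescue -d -r3 /dev/sdd /dev/sde sdd.map    # clone sdd onto sde, retry bad areas three times, track progress in sdd.map

The map file lets you stop and resume the copy, and once the clone is done you can import the pool from the healthy disks and let a scrub recreate whatever ZFS can from checksums and parity, instead of hammering the dying drive any further.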

Now, I am not saying that you should run ZFS on every system you own, but you will do yourself a major favor by running it on systems with important data. And no, Btrfs cannot compare, not by a long shot.

I personally prefer to always have a ZFS mirror that I can import and mount from anywhere, even if I run something else on my root drive. It is not only extremely convenient because of the snapshot feature of ZFS, but also because of ZFS's send/receive functionality, with which you can easily and incrementally send datasets from one ZFS pool to another, even encrypted datasets.
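As a small sketch, with made-up pool and dataset names, an incremental backup from one pool to another looks something like this; the -w flag sends encrypted datasets raw, so the data never has to be decrypted on the receiving side:

# zfs snapshot pool1/data@monday
# zfs send -w pool1/data@monday | zfs receive backup/data    # first full copy
# zfs snapshot pool1/data@tuesday
# zfs send -w -i @monday pool1/data@tuesday | zfs receive backup/data    # only the changes since monday

Pipe the stream through ssh and the receiving pool can just as easily live on another machine.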

Anyway, take a look at the video chat between Wendell from Level1 Techs and Allan Jude from Klara Inc., in which they talk about ZFS, drive failures, and restoring data from corrupted drives.

Oh, and remember! ZFS is NOT a replacement for a good backup strategy. I personally recommend something like the 3-2-1 rule - and then some!