3 Years of ZFS

2020-03-10

3 years and change ago, I decided to set up a FreeNAS box with a ZFS mirror to store long-term data. The OS changed to Debian when I decided to run a few web apps, but I kept using ZFS for the mirror. The main selling point of ZFS is that it periodically checks the data and can repair it as needed. After more than 3 years of running, it found its first data error: a few bad checksums. This wasn't surprising, since the drive had reported uncorrectable sectors for the last year. The first reallocated sectors showed up at the same time as the ZFS checksum errors, so I ordered a new drive, since both Backblaze's and Google's drive studies correlate reallocated sectors with drive failure.
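A minimal sketch of the SMART check that made the ZFS error redundant for me: parse `smartctl -A` output and flag the drive once Reallocated_Sector_Ct goes nonzero. The column positions ($2 = attribute name, $10 = raw value) follow smartmontools' ATA attribute table; the function name and the idea of feeding it a captured output file are my own convention, not anything standard.

```shell
#!/bin/sh
# Sketch: flag a drive once its reallocated-sector count is nonzero.
check_realloc() {
    # $1: file holding the output of `smartctl -A /dev/sdX`
    # Exits 0 when the raw Reallocated_Sector_Ct is 0, 1 otherwise.
    awk '$2 == "Reallocated_Sector_Ct" { exit ($10 > 0) ? 1 : 0 }' "$1"
}
```

The ZFS half of the same check is just reading the CKSUM column of `zpool status -v`; in my case both fired at once.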

I'm a big fan of keeping things simple, so I reevaluated whether ZFS is the right choice versus the standard mdadm+lvm+ext4 setup. My use case is a mirror (RAID 1) that serves as a media server and a backup of my desktop SSD. ZFS's checksum protection did nothing for me, since SMART data had already told me the drive should be monitored closely and replaced once sectors were reallocated or the uncorrectable sector count became worrying. ZFS's snapshots were turned on, and I used them once to recover a file I had accidentally deleted. The feature was unsettling to use, since a rollback reverts the entire filesystem to a previous point in time. You have to be sure you end up in a state where the data you want is correct and the rest of the filesystem is acceptable, which gets sketchy when you have thousands of files on a 4TB drive. It's also not the free feature many people portray it as: each snapshot retains every changed block for as long as the snapshot exists, which in my case meant a few hundred GB consumed on top of the current data. Compression is currently sitting at 0%, since the majority of my files aren't compressible. A minor pain point on Linux is that every ZFS update means recompiling the ZFS kernel modules, so updates take a good deal longer and need a couple of cores dedicated to the build.
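For the curious, the commands behind those numbers look roughly like this. The pool and dataset names are made up for illustration; yours will differ:

```shell
# Per-snapshot space: USED is the space held only by that snapshot's
# changed blocks, which is where the extra few hundred GB went.
zfs list -t snapshot -o name,used,referenced tank/media

# Overall compression ratio for the dataset (1.00x here, i.e. ~0%).
zfs get compressratio tank/media

# Rolling back discards everything written after the snapshot -- the
# all-or-nothing behavior that made me nervous. (-r is needed to roll
# past intermediate snapshots, destroying them too.)
zfs rollback tank/media@2020-03-01
```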

In the end, I'm not convinced a user without some serious hardware should use ZFS. It's got shiny features, but they don't seem to come into play unless you have a targeted use case or, by some very, extremely, ridiculously unlikely event, manage to get an error that isn't caught by ECC, SMART, etc., but is caught by ZFS. You might say those features could be nice, so why not use ZFS in case you want them later? Because ZFS lags behind common filesystems in one critical area: performance. You'd be giving up daily performance for that possibility. If I were to do it again, I would go with mdadm+lvm+ext4.
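For reference, the mdadm+lvm+ext4 setup I'd build instead is only a handful of commands. Device, volume group, and mount point names here are hypothetical, and these commands are destructive, so double-check the devices before running anything like this:

```shell
# RAID 1 mirror across two whole disks
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb

# LVM on top of the mirror; leave some free extents in the volume
# group as headroom for LVM snapshots if you want them later.
pvcreate /dev/md0
vgcreate tank /dev/md0
lvcreate -n media -l 90%FREE tank

# Plain ext4 on the logical volume
mkfs.ext4 /dev/tank/media
mount /dev/tank/media /mnt/media
```

The layers are independent: mdadm handles redundancy, LVM handles volumes and snapshots, ext4 handles files, and each can be swapped or debugged on its own.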