Linux storage & filesystems talk - Or why should I use/avoid this or that for each use case.

So recently the operator upgraded the storage and mentioned that it uses XFS in RAID, and I was wondering: what are the ups and downs of the different filesystems and volume management tools on Linux?

In no particular order:
  • Pros/Cons of XFS, EXT4, Btrfs, ZFS
  • Common filesystem use cases
    • Growing and shrinking
    • Snapshots
    • Recovery after a disk failure
    • ...
  • RAID vs LVM, and how it plays with all of the above
  • Horror stories about losing all your data
I am considering building a bulky storage server, so I'm taking a look at this side of Linux and want more information and first-hand experiences. So far the impression I'm getting is that, if I can live with ZFS being an out-of-tree filesystem, it is a pretty stable all-in-one solution.
 
XFS RAID in The Year of Our Lord 2025 is... a bit of an odd take. Same for EXT4. I'm not really sure what the reason would be for not using either BTRFS or ZFS, both of which have native support for RAID, inline checksumming, and snapshots.

My biggest complaint on BTRFS for data storage would be that they officially still don't consider RAID 5/6 ready. RAID 10 works, but RAID 10 is cringe. It's a perfectly serviceable FS for non-RAID though.

ZFS is generally regarded as higher quality, but has historically had high RAM consumption (I've heard this has gotten better but I haven't had a chance to test it), and is harder to use for boot drives on Linux. I've put ZFS through a lot, and I don't think I've ever had a failure that was because of the filesystem itself. I've done in-place replacement of drives with multiple rounds of remirroring and it worked fine. Pulling individual files out of snapshots is hilariously easy. I've had inline checksumming catch and repair errors a few times.
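
If it helps, the in-place drive swap and the snapshot file recovery look roughly like this - pool, device, and snapshot names here are made up, not from an actual box:
zpool status tank                       # see which disk is faulted or degraded
zpool replace tank /dev/sdb /dev/sdd    # swap the bad disk for the new one; the resilver starts on its own
zpool status tank                       # watch the resilver progress
cp /tank/data/.zfs/snapshot/daily-2025-01-01/file.txt /tank/data/   # snapshots are browsable under the hidden .zfs directory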

ZFS does have some limitations on adding drives to an existing RAID5/6. This feature was added VERY recently and I can't vouch for it. Otherwise, ZFS is great. Just don't use it on SMR drives, only CMR. That's good advice in general though. Same with the use of ECC memory. You don't HAVE to, but it's recommended.

I've been working on a new NAS, and the plan is mirrored BTRFS for the boot drives, ZFS for the data RAID.

Damn shame the bcachefs guy turned out to be a sped, because I was actually rather looking forward to that.
 
My biggest complaint on BTRFS for data storage would be that they officially still don't consider RAID 5/6 ready
Meaning that they don't do the RAID part themselves (like ZFS) and for now rely on LVM doing it for them?
I currently have some large-ish ZFS pool but I'm trying to look ahead and while I will probably stay on ZFS, I want to know if for a friend it would be more convenient.

It's a perfectly serviceable FS for non-RAID though
I use Btrfs for my boot/rootfs drive, something normal so that a ZFS driver fuckup won't make the machine unbootable.

ZFS does have some limitations on adding drives to an existing RAID5/6. This feature was added VERY recently and I can't vouch for it.
Adding new raid5/6 vdevs to an existing pool works just fine. As for adding new drives to an existing vdev, I don't know if that is possible at all.

ZFS is great. Just don't use it on SMR drives, only CMR. That's good advice in general though. Same with the use of ECC memory. You don't HAVE to, but it's recommended.
Any reason why, besides SMR being painfully slow during recoveries?

Damn shame the bcachefs guy turned out to be a sped, because I was actually rather looking forward to that
Looks promising despite the sped-ness but I wouldn't use it in production ever, or at least not this decade.
 
Meaning that they don't do the RAID part themselves (like ZFS) and for now rely on LVM doing it for them?
I currently have some large-ish ZFS pool but I'm trying to look ahead and while I will probably stay on ZFS, I want to know if for a friend it would be more convenient.
No, BTRFS has native RAID. But again, RAID 5/6 is not considered reliable, per the developers themselves.

Adding new raid5/6 vdevs to an existing pool works just fine. As for adding new drives to an existing vdev, I don't know if that is possible at all.
Right, but adding a vdev to a pool doesn't add it to the stripe. It doesn't actually make it part of the RAID. You're just duct taping multiple arrays together, which can have highly unpredictable effects on reliability/performance. There's a VERY recent feature (past few months) that allows you to add a physical disk to a RAID vdev.
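
To make the difference concrete, it's roughly this (pool, vdev, and device names invented):
zpool add tank raidz2 /dev/sde /dev/sdf /dev/sdg /dev/sdh   # bolts a whole new raidz2 vdev onto the pool; striped across, but not one big RAID
zpool attach tank raidz2-0 /dev/sdi                         # the new expansion path: grows an existing raidz vdev by one disk (OpenZFS 2.3-era, so treat with caution)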


Any reason why, besides SMR being painfully slow during recoveries?
SMR is absolutely non-functional with ZFS RAID. Back when WD got sued for false advertising on the Reds, I had ordered a larger set of SMRs as an upgrade, not realizing that WD was straight up lying about what they were selling. The resilver threw CONSTANT SMART errors and simply did not work. Took those fuckers an actual year to warranty them for CMRs.
 
I just use BTRFS for all the drives I use on Linux. While its native RAID does have write holes and isn't suitable for anything past mirroring and striping, using it as the underlying filesystem for an MDADM RAID is just fine from what I've heard, and is apparently what Synology does on all its devices.
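
If anyone wants to try that layout, a rough sketch (device names and mount point are placeholders): mdadm provides the redundancy underneath, and since btrfs only sees one device it can still detect corruption via checksums but can't self-heal it.
mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
mkfs.btrfs -d single -m dup /dev/md0    # one "device" from btrfs's point of view; keep metadata duplicated
mount /dev/md0 /mnt/storage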
 
Same for EXT4
The big distros still, I believe, set it as the default when you're going through the install.

ZFS does have some limitations on adding drives to an existing RAID5/6. This feature was added VERY recently and I can't vouch for it. Otherwise, ZFS is great. Just don't use it on SMR drives, only CMR. That's good advice in general though. Same with the use of ECC memory. You don't HAVE to, but it's recommended.
Does ZFS still have the limitation where you can't resize a pool? IIRC (from years ago) people ran into this when they had a ZFS partition on a drive, and wanted to increase or decrease its size. I believe the recommendation then was to just give ZFS the whole drive, which was probably the sensible thing to do anyway.
 
I just use BTRFS for all the drives I use on Linux. While its native RAID does have write holes and isn't suitable for anything past mirroring and striping, using it as the underlying filesystem for an MDADM RAID is just fine from what I've heard, and is apparently what Synology does on all its devices.
It can work, but if you're doing a RAID5/6, I'd generally just find ZFS simpler at that point.

Does ZFS still have the limitation where you can't resize a pool? IIRC (from years ago) people ran into this when they had a ZFS partition on a drive, and wanted to increase or decrease its size. I believe the recommendation then was to just give ZFS the whole drive, which was probably the sensible thing to do anyway.
Not that I'm aware of. I've done the trick where you replace the RAID drives in sequence, then expand the pool at the end. Worked fine. There might be certain circumstances where it's limited, and as a general rule, shrinking file systems is almost always harder than expanding them.
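
The replace-in-sequence trick, roughly (pool and device names are placeholders):
zpool set autoexpand=on tank
zpool replace tank /dev/sdb /dev/sdf    # repeat for each disk in the vdev, letting each resilver finish first
zpool online -e tank /dev/sdf           # only needed if autoexpand was off; claims the extra capacity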

I've also done a little research on XFS. I can see why Nool chose it for KF, because it seems to do exceptionally well with extremely large numbers of small, parallel writes. However, I would not recommend it for general usage, as it lacks a lot of modern features. And honestly I'm not sure it would be better in practice than ZFS even here, but I'm going to give him the benefit of the doubt on this one.
 
BTRFS on an LVM worked well for me. In particular I enjoy the tools BTRFS offers as part of btrfs-progs, such as btrfs-image and btrfs-convert (though btrfs-convert would be a lot more useful if it had support for more filesystems). The LVM is also useful for things like resizing the partitions and LUKS encryption. For the longest time I've been very conservative with all the filesystems and considered anything other than EXT4 unnecessary, but since I started using LVM+BTRFS I feel like trying it out was worth it. There never was any friction with those two options combined. As for ZFS, I've only tried it when I was using FreeBSD; it didn't get in my way and that was all I needed, so it seemed fine. However, I wouldn't use it as long as it isn't part of the kernel. I have this fear that a filesystem not embedded in the kernel may cause instabilities; I would be interested in knowing whether that can be the case.
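
For what it's worth, growing that stack online is roughly this - VG/LV/mapper names are made up, and this assumes the LUKS layer sits between the LV and btrfs:
lvextend -L +50G /dev/vg0/data           # grow the logical volume
cryptsetup resize data_crypt             # grow the dm-crypt mapping to fill the bigger LV
btrfs filesystem resize max /mnt/data    # then grow btrfs to fill it, while mounted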
 
Drives are so big these days that basically everything I own fits on one single 4TB drive. For that reason, I don't need RAID's drive-spanning capabilities. What I do need, though, is redundancy. So I have two 4TBs and use software (mdadm) to mirror them (RAID1).
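
That setup is about as simple as it gets, roughly (device names are placeholders, and the mdadm.conf path varies by distro):
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mkfs.ext4 /dev/md0
mdadm --detail --scan >> /etc/mdadm/mdadm.conf    # so the array gets assembled at boot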

I've had drives fail in the past. I actually have a google doc to keep track of this. My last failure was a 1TB WD blue WD10EALX that was purchased on 5/20/2011 for $60 and failed on 4/28/2019. The setup I use made recovery super, super easy. I've also had motherboard failures and it's the same deal - recovery is easy, just pop one of the drives in a new machine.

Long, long ago I used other RAID configurations and found them to be more trouble than they were worth for my needs. I've also used hardware RAID and had a failure that was unrecoverable. I lost everything since the last snapshot. Never again!

I'm still on ext4 but agree with others who have said that ZFS is the way forward. I bought my 4TBs in 2023, and use the previous generation (two 2TBs) for periodic snapshots, stored offsite.

So yeah, my setup is pretty simple but it works for me.
 
To save you the time: BTRFS is the superior Linux filesystem and the correct choice in basically every application.

In kernel:
A unique ZFS downside: it cannot be in the kernel. So you have the additional headache of building the DKMS module and needing to keep those versions in sync with your kernel version. Don't screw up or else your system won't boot :).
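
Roughly what that looks like in practice (package names vary by distro, these are the Debian/Ubuntu-ish ones):
apt install linux-headers-$(uname -r) zfs-dkms    # headers must match the running kernel or the module build fails
dkms status                                       # shows which kernel versions the zfs module is actually built for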

Checksumming: XFS, EXT4 don't have this. If you care, even a little bit, about your data or you want to use RAID at all you need checksumming. It will also help with detecting hardware/memory issues.

RAID: This is complicated but every filesystem except btrfs has unacceptable problems and btrfs raid5/6 problems are VASTLY overblown. So much so that other solutions are actually worse than the current state of raid (in general, not just 5/6) in btrfs.

-ZFS: Has checksumming, but each vdev needs to be composed of the same-sized disks. If you use larger disks as replacements, only when all of them have been replaced can you increase the size of the vdev. Additionally I don't think they can shrink the filesystem yet (that might've changed recently iirc). Between this and not being in the kernel, it's just a headache.

-XFS / EXT4+MDADM: "lol, lmao". Because these don't have checksumming, in RAID 1, for example, if one of the mirrors dies in any way other than instantly letting out the magic smoke, you are fucked. If one mirror starts returning bad data there's no way to tell which mirror is good. You also won't get any read errors, because it will just return data from whichever mirror it happens to read from. You could checksum the same file twice and get different results because it read from different mirrors, and the system won't even notice an issue because it doesn't even compare mirrors on read. Additionally, XFS still cannot shrink volumes. And no, losing 80% of your performance with dm-integrity is not a solution.
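
You can watch md be blind to this yourself: the kernel exposes a mirror-compare "scrub", but even when it finds a mismatch it has no idea which copy is the good one (md device name is a placeholder):
echo check > /sys/block/md0/md/sync_action    # compare the mirrors without rewriting anything
cat /sys/block/md0/md/mismatch_cnt            # nonzero means the mirrors disagree, and md can't tell you which side is right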

P.S. Have backups? Have fun finding out you have a flaky disk and all your backups are now maybe corrupted because it might've read from the flaky disk! I can only hope null has a robust backup solution for the farms given he is using XFS.

-BTRFS: Much ado about nothing has been made over the "write hole" in btrfs raid 5/6. That is, if you experience a power loss while writing data to a raid 5/6 array, and then it happens again before you run a scrub (which will correct the issue), then that extent might get corrupted. Compare this to EXT4 or XFS, where if you have a power loss while in the middle of writing some data... it's just immediately corrupt. Unfortunately redditors are willing hosts to retard shit and parrot "write hole!" "write hole!" to look smart. This also only affects raid 5/6, and I think (?) even ZFS has this ""issue"" but they simply acknowledge it and mark 5/6 stable anyway. The only real usability issue with raid 5/6 at the moment is slow(ish) scrub speeds.

BTRFS killer features: One of the killer features of BTRFS is that you can add any amount and any size of disk to an array and it will just work. You can even change between raid levels on a whim! Do you want 4x3TB + 1x8TB + 2x14TB + 1x20TB + 2x24TB + 1x320GB (for lols) in raid 5? Just werks. You can also add or remove them at will while the array is online. Another killer feature is subvolumes. Do you want to literally never have to mess with LVM or partitioning anything ever again? Do you want all your "partitions" to share all the available space transparently? Do you want to be able to take atomic snapshots of your "partitions" instantly at the cost of literally no space? Do you want to send delta updates of these snapshots to your backup server? Then subvolumes are for you.
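
A rough taste of all of that, with invented device, mount point, and subvolume names (the incremental send at the end assumes the previous snapshot already exists on the backup box):
mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc
mount /dev/sdb /mnt/pool
btrfs device add /dev/sdd /mnt/pool                               # any size disk, while the array is online
btrfs balance start -dconvert=raid5 -mconvert=raid1 /mnt/pool     # change RAID level on a whim
btrfs subvolume create /mnt/pool/home                             # a "partition" that shares all free space
btrfs subvolume snapshot -r /mnt/pool/home /mnt/pool/home-snap2   # atomic, instant, costs nothing up front
btrfs send -p /mnt/pool/home-snap1 /mnt/pool/home-snap2 | ssh backupbox btrfs receive /backup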

Compare this to adding disks in ZFS:
Right, but adding a vdev to a pool doesn't add it to the stripe. It doesn't actually make it part of the RAID. You're just duct taping multiple arrays together, which can have highly unpredictable effects on reliability/performance. There's a VERY recent feature (past few months) that allows you to add a physical disk to a RAID vdev.

BTRFS Downsides: In the interest of fairness there are a couple of downsides that I should mention. I already mentioned slow scrub for raid 5/6. Defragmenting when you have snapshots will cause them to take up more space. Performance is generally a bit behind other filesystems, but not enough for me to notice. I also think ZFS has more fine-grained control of stuff like performance profile per vdev. But compared to the headaches with ZFS or the lack of features / disqualifying drawbacks of EXT4 and XFS it's not really a contest.

tl;dr: BTRFS >> ZFS >>>>> XFS >> EXT4
At least until my main man Kent Overstreet releases bcachefs a/k/a btrfs 2.0 (benevolent dictator edition)

Any reason why, besides SMR being painfully slow during recoveries?
Never use SMR drives. The 10% or whatever savings are not worth the headache. Without getting into the technical stuff, in order to modify some block they need to rewrite a lot of data (think 100x), so in some workloads you will see them slow down to single-MiB/s write speeds.

Horror stories about losing all your data
RAID is not a backup. You will probably need to learn it the hard way once like most of us.

BTRFS on an LVM worked well for me. In particular I enjoy the tools BTRFS offers as part of btrfs-progs, such as btrfs-image and btrfs-convert (though btrfs-convert would be a lot more useful if it had support for more filesystems).
Don't do any of this. Subvolumes already handle "partitions" better than LVM and you might be hosing your auto-repair capabilities if you do it wrong. Also converting a filesystem is a huge risk when you could just remake it and copy stuff from your backup. You do have a backup right?
 
I just delineate old-gen filesystems like xfs and ext from new-gen filesystems like btrfs and ZFS.

There's nothing much to say about the former ones, except that they need crutches like LVM2 to have some of the functionality of the newer ones (this is what Stratis was all about). They're basic but they work.

Btrfs is/was shit, not because of the broken RAID 5/6, but because of how volatile it is. The way it handles snapshots is a mess. The way it handles free space and metadata is a mess. I posted about this in the linux thread some time ago, but I could reliably make it go "out of space" after making 1 or 2 snapshots, though in very extreme conditions. What it *does* have going for it is the malleability that ZFS lacks. You can non-destructively and on the fly convert a RAID10 to a RAID1 with 4 replicas if you so wished, or any other setup for that matter. Just set your desired setup, both for data and metadata, and rebalance. But that's the only upside to it. Performance-wise it's shit.
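
To be fair, that "set your desired setup and rebalance" bit really is just one command, e.g. RAID10 to RAID1 with 4 copies (mount point invented; raid1c4 needs at least four devices and kernel 5.5+):
btrfs balance start -dconvert=raid1c4 -mconvert=raid1c4 /mnt/pool
btrfs filesystem usage /mnt/pool    # confirm the new profiles once the balance finishes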

ZFS on the other hand is the Rolls-Royce Trent engine of filesystems, except it's made by a Sino-Russian alliance under US sanctions, and it's not plastic enough. If you for whatever reason decide on RAIDZ (don't do it btw, it's not worth it), you're fucked; the only way you're gonna expand is by multiplying drives. It will also NEVER EVER be in Linux mainline because of its Oracle heritage, unless Larry Ellison has a visit from the ghost of Christmas future and has a big change of heart AND the Linux devs stop being stuck-up stooges about it. Other than that, you can make ZFS fly - initially you're handicapped by the higher requirements of a modern filesystem, but you can work your way around every one of them and scale it up to workloads that Btrfs can only dream of achieving, without sacrificing functionality and with perfect data integrity.
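
The "work your way around them" part is mostly per-dataset tuning and a module knob or two, e.g. (pool/dataset names invented, ARC cap value is just an example):
zfs set compression=lz4 tank          # cheap and almost always a win
zfs set atime=off tank                # skip access-time writes
zfs set recordsize=1M tank/media      # big records for big sequential files
zfs set recordsize=16K tank/db        # small records for database-style random I/O
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf    # cap the ARC at 8 GiB if the RAM use bothers you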

At least until my main man Kent Overstreet releases bcachefs a/k/a btrfs 2.0 (benevolent dictator edition)

The way it's going now it's more likely that it'll die in a ditch. Kent's a fucking idiot who needs a tardwrangler, but his ego is too big to agree to have one. Which sucks, because I honestly thought we'd be out of the ZFS/Btrfs dilemma in a couple of years. I really rooted for it.
 
The way it's going now it's more likely that it'll die in a ditch. Kent's a fucking idiot who needs a tardwrangler, but his ego is too big to agree to have one. Which sucks, because I honestly thought we'd be out of the ZFS/Btrfs dilemma in a couple of years. I really rooted for it.
Gotta keep the faith.
 
Can't believe we had a major development in open source, one that promised to fix the FS situation on Linux, that against all odds was developed by a regular guy with no involvement from troons or child molesters, and he still managed to blow it up by being a stubborn prima donna who refused to ever be told how things are done.
 
Damn, all this talk convinced me to study OpenBSD filesystems a bit more, and I found out that the setup (softraid5 + encryption) I wanted for a server I'm building is not supported (shame on me for not checking!). Both btrfs and zfs seem to support it though, so I'll probably read more about both of them and go with Linux and one of these.
 
Damn, all this talk convinced me to study OpenBSD filesystems a bit more, and I found out that the setup (softraid5 + encryption) I wanted for a server I'm building is not supported (shame on me for not checking!). Both btrfs and zfs seem to support it though, so I'll probably read more about both of them and go with Linux and one of these.
I don't know about OpenBSD, but FreeBSD DOES support layering GELI/GEOM with ZFS.

I ran that way for years because native ZFS encryption didn't exist outside of Solaris, but switched pretty much the day I could. So that should tell you how it went.
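
For anyone curious, native encryption is now just a dataset property, roughly (pool/dataset names made up):
zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tank/secure
zfs load-key tank/secure && zfs mount tank/secure    # after a reboot or import, unlock and mount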
 
I don't know about OpenBSD, but FreeBSD DOES support layering GELI/GEOM with ZFS.

I ran that way for years because native ZFS encryption didn't exist outside of Solaris, but switched pretty much the day I could. So that should tell you how it went.
As far as I could find, OpenBSD only has DOS (FAT) and UFS with some relatives for the root, FFS2 being the most modern one. It can mount a few more, but not many:
/sbin/mount_cd9660 /sbin/mount_ext2fs /sbin/mount_ffs /sbin/mount_mfs /sbin/mount_msdos /sbin/mount_nfs /sbin/mount_ntfs /sbin/mount_tmpfs /sbin/mount_udf /sbin/mount_vnd
I remember reading something about FreeBSD that made me not want to use it in the past. I think I'll go with Gentoo Linux, I already paid for the processor cores so may as well use 'em! :lol:
 