Writing

Software, technology, sysadmin war stories, and more.

Sunday, October 7, 2012

Readers respond to my post about RAID

Wow. Yesterday's post about RAID not always being the answer has generated a good bit of feedback. Here's the summary.

Software RAID was mentioned a few times, as was ZFS. I was given a few examples where you could pull a disk from one machine, drop it into another, and run an "import" tool to pick it up and keep going. Elsewhere, I was informed about the unfortunate tendency for the ZFS version on an install disk to mismatch the one on a running copy, and apparently that means none of your data is visible. Ouch!

This seems to be another variant on the same basic pain. It also doesn't seem to help with the notion of splitting up things between different machines. Also, personally, there's no way I'm going to hitch my wagon to any horse drawn by (what's left of) Sun, or Oracle now, particularly because I'm running Linux. It would be different if I were already drinking the Solaris kool-aid, but I'm not.

I received a pointer to a few projects. ChironFS was one, but it seems to be relatively idle - no updates on its web page since 2008. This either means it's brilliant and amazingly stable, or, as is more often the case, abandoned and neglected. It also doesn't seem to be the sort of open-ended system I was talking about where machines can come and go.

Another reader commented about software RAID. I actually ran that once upon a time. I used it for my CD Tower project initially and made a huge virtual block device from two other disks which were also big for their time - 18 GB, woo! This turned out to be a poor idea, since it meant that losing either drive would hose me. It sure looked cool on paper since, wow, 36 GB partitions... but it was unsafe.
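The math behind that "poor idea" is simple enough to sketch: a stripe dies if either disk dies, while a mirror dies only if both do. (The 5% failure probability below is made up for illustration, not a real drive statistic.)

```python
# Back-of-the-envelope comparison of striping (RAID 0) vs. mirroring (RAID 1).
# p is the probability that one disk fails during some period.

def stripe_failure(p, disks=2):
    # The stripe is lost if ANY disk fails: 1 - P(all disks survive).
    return 1 - (1 - p) ** disks

def mirror_failure(p, disks=2):
    # The mirror is lost only if ALL disks fail (ignoring rebuild windows).
    return p ** disks

p = 0.05
print(f"stripe: {stripe_failure(p):.4f}")  # roughly 2p for small p
print(f"mirror: {mirror_failure(p):.4f}")  # roughly p squared
```

So striping two drives roughly doubles your odds of losing everything, in exchange for that nice big 36 GB number.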

I wound up redesigning the system. In the new scheme, only the OS and swap partitions were in fstab. Then I had a program which ran later in the boot process and used /proc/partitions to find everything else. It would run the library call equivalent of "tune2fs -l" on them to look for a magic volume label. If it found one, then it would fsck it, create a mount point for it (if necessary), and mount it. Then it would add all of the images on that filesystem to the global list. It worked pretty well.
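That boot-time scan might look something like this sketch. The names here (including the magic label) are mine, not the original program's, and where the original used library calls, this shells out to tune2fs, fsck, and mount for brevity:

```python
import os
import re
import subprocess

MAGIC_LABEL = "cdtower"  # hypothetical volume label the scanner looks for

def list_partitions(proc_partitions_text):
    """Pull partition names like 'hda1' out of /proc/partitions contents,
    skipping the two header lines and whole-disk entries (no trailing digit)."""
    devices = []
    for line in proc_partitions_text.splitlines()[2:]:
        fields = line.split()
        if len(fields) == 4 and re.search(r"\d$", fields[3]):
            devices.append(fields[3])
    return devices

def get_label(device):
    """Ask tune2fs -l for the volume label; returns None on any failure."""
    try:
        out = subprocess.run(["tune2fs", "-l", f"/dev/{device}"],
                             capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    m = re.search(r"^Filesystem volume name:\s+(\S+)", out, re.MULTILINE)
    return m.group(1) if m else None

def mount_tagged_volumes():
    with open("/proc/partitions") as f:
        devices = list_partitions(f.read())
    for dev in devices:
        if get_label(dev) != MAGIC_LABEL:
            continue
        mnt = f"/mnt/{dev}"
        os.makedirs(mnt, exist_ok=True)                # mount point, if needed
        subprocess.run(["fsck", "-p", f"/dev/{dev}"])  # check it first...
        subprocess.run(["mount", f"/dev/{dev}", mnt])  # ...then mount it
```

The nice part of this scheme is that fstab never has to know about the data disks: anything carrying the magic label just shows up wherever it happens to be plugged in.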

Another reader pointed me to GlusterFS about an hour ago, and that actually made for a fascinating read. They seem to have created an extensive toolbox where you can set up storage space on ordinary filesystems and then join them in varying ways to have volumes that are distributed, replicated, distributed replicated, striped, and so on. It's actually a little overwhelming.

While reading through it, something occurred to me: it is, in fact, a toolbox. While they provide a bunch of tools which can be used to make things happen to your cluster, it seems to demand a lot of hands-on manual administration. There are commands you need to run to rebalance things, for instance. I don't understand why you would want to live in a world where you have to keep messing with your storage system. It should be something where you give it a bunch of goals, and it strives for them with what it has. When it starts running out of resources, it asks you for more, you hook it up, and it goes on. In the meantime, it limps along as best it can.

It's possible I've missed some kind of "overlord process" which runs on top of everything I saw in the GlusterFS docs and then automatically runs things like rebalancing passes and log rotations for you.

This is really random, but in their documentation, they tell you how to shrink a volume in section 7.3 and then warn you that it will make your data inaccessible. That elicits a "what what what?" from me. It turns out that in section 7.4, they actually tell you how to migrate a volume, and thus get your data out of the way. This seems like the sort of thing you'd want people to find first.

It's hard to explain, but I get a really weird vibe from this sort of design. Maybe they're just aiming at a vastly different use case and I'm the one who doesn't get it. I just get the impression that it's going to be a nontrivial matter if a system decides to drop a disk or keel over entirely, and I want that to be No Big Deal.

Am I missing something? Let me know if you think so.