Thursday, February 14, 2013

Corporate IT and the Great White Whale

Have you ever heard about a project which one person at a company really wants to see happen, but it's completely crazy? It probably has no real justification for existing, nobody else in the company wants it, and maybe only one customer thinks they want it, but this person still keeps demanding it? I've seen this happen a couple of times, and I tend to refer to such a project as a "Great White Whale". It's something the project champion wants no matter what, and reality won't slow them down.

One of these had to do with metadata for a file backup system which was in use. A bunch of machines were running this software which basically acted like a glorified version of the old BSD dump/restore utilities. Instead of dumping to a tape, they dumped to a storage server over the network. Those backend servers had their own (huge) tape drives. This made it so your individual machines didn't need their own local tape drives.

Well, this guy who wound up in charge of this thing somehow decided he wanted to be able to get to every piece of metadata for every file which was backed up by every single machine across the entire company. Now, if these were all actually being used by a single company, that might start to make sense, but they weren't. It was a web hosting concern, so all of those machines were being leased by bunches of different customers. Most of them probably didn't care about all of the finer points of this, just so long as their data was safe and could be restored on demand.

When I say metadata, I mean all of the stuff which goes with a file. First, there's the actual filename. Then there's the backup time, the permissions, the owner, the group, and the size. I'd guess the biggest record might be about 1 KB. Now assume one of these records would exist for every single file on a given machine. What's more, it would also be created every time a file was backed up. If a system had nightly backups, that could be quite a bit of data. Multiply this by the thousands of machines running this thing and now you're talking about a rather large collection of stuff that's constantly changing, since backups basically run continuously to spread out the load.
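
To make that concrete, here's roughly what one of those records might have carried. The field names are my own sketch, not whatever their actual schema was:

  from dataclasses import dataclass

  @dataclass
  class BackupRecord:
      path: str         # full filename, e.g. /usr/local/bin/foo
      backup_time: int  # when this copy was taken (epoch seconds)
      mode: int         # permissions
      uid: int          # owner
      gid: int          # group
      size: int         # file size in bytes

  # Most of the record is the path itself, so ~1 KB is a plausible
  # worst case once you count long filenames and bookkeeping overhead.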

I figured the way to proceed with something like this would be to build just enough of a storage system that customers could be added as they requested it. If someone wanted to join, we could look at how big their "save set" was and figure out how much more metadata it would introduce. Then we would either add them to the system or add more space to it. This way, it would scale up according to actual need and not some hypothetical situation where everyone buys in.

Captain Ahab didn't want this. Oh no. He wanted the whole thing to go online with enough capacity for everyone on day one. That's a lot of data. Keep in mind this was many years ago, so you couldn't exactly walk into a store and pick up a 2 TB hard drive. I think the biggest disk you'd usually see was in the 20-40 GB range in those days.

Someone guessed it would take about 60 MB of space to store the metadata for one full backup for one machine. If you figured approximately 5000 machines were being backed up, then 60 MB * 5000 machines is about 300 GB. Then remember they did this (at least) once a week, and that all of this data had to stick around for at least a quarter. 300 GB * 13 weeks is 3900 GB, or almost 4 TB. Even today in 2013, a lot of people would probably balk at setting up a 4 TB SQL database!
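
Here's that back-of-the-envelope math again, just so it's easy to poke at. The 60 MB and 5000 machine figures are the same guesses from above, not measured numbers:

  per_machine_mb = 60                              # metadata for one full backup
  machines = 5000
  weeks_kept = 13                                  # hold a quarter of history

  per_week_gb = per_machine_mb * machines / 1000   # 300 GB per full pass
  total_tb = per_week_gb * weeks_kept / 1000       # 3.9 TB, call it 4
  print(per_week_gb, "GB per week,", total_tb, "TB retained")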

So, you need storage space for all of that data while it's at rest. How about moving it around? Shipping that data from one place to another means bandwidth, and again, there's only so much of that to go around. While a web hosting company has tons of bandwidth, most of that is for the customers. It can only push so much traffic between its far-flung locations, because those are linked together with VPNs. The company did not have its own fiber backbone or anything like that, so all inter-data center traffic happened through a tunnel over commodity transit providers. Adding this traffic to the tunnel would probably make life worse for all of the other people who had to rely on it.

I figured they could also keep the data localized. If a customer had a server in data center X, then they should store their metadata on a machine in that same location. At least that way it would keep all of that stuff off the VPN links. Sure, they'd have to write their report generation stuff to effectively (gasp) dereference a pointer to one of their storage nodes, but so what?
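
That pointer really could have been something as dumb as a lookup table. The hostnames below are invented for illustration, but this is about all the report code would have needed:

  # Hypothetical locator table: which local metadata store serves which
  # data center.
  METADATA_STORES = {
      "dc-x": "metadata1.dc-x.example.com",
      "dc-y": "metadata1.dc-y.example.com",
  }

  def metadata_store_for(datacenter):
      # The "pointer dereference": ask where a machine's metadata lives
      # instead of assuming one giant central database.
      return METADATA_STORES[datacenter]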

As this project started grinding along, I heard more and more strange things. They wanted to encode path elements so it would "take less space". Yes, instead of having /usr/local/bin/foo and /usr/local/bin/bar in their blobs of data for each file, they might try to reduce it like this: "12 = /usr/local/bin", "[12]/foo", "[12]/bar". I don't think the people in question knew anything about how DNS compresses names by reusing labels, but it almost sounded like they wanted something like that (RFC 1035, section 4.1.4, if you're curious).
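
Here's a minimal sketch of the kind of encoding it sounded like they were after. This is my guess at the scheme, not anything from their actual code:

  import os

  # Each unique directory gets a small integer token; a file is stored
  # as (token, basename) instead of the full path.
  prefix_to_token = {}   # "/usr/local/bin" -> 12
  token_to_prefix = {}   # 12 -> "/usr/local/bin"

  def encode(path):
      prefix, name = os.path.split(path)
      token = prefix_to_token.get(prefix)
      if token is None:
          token = len(prefix_to_token) + 1
          prefix_to_token[prefix] = token
          token_to_prefix[token] = prefix
      return token, name

  def decode(token, name):
      return os.path.join(token_to_prefix[token], name)

  # encode("/usr/local/bin/foo") and encode("/usr/local/bin/bar") end up
  # sharing a single dictionary entry; only the basenames differ.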

I came up with an interesting failure mode for that. What if you knew they were doing that and wanted to be deliberately annoying? All you have to do is create a bunch of files with crazy unique paths so that it pollutes the "dictionary" which says that [12] means "/usr/local/bin". Now the dictionary will have thousands of entries all from your machine. Then, you delete your files. Who goes back and cleans up the dictionary of tokens? Do they write database triggers to do this? Do they have some kind of "vacuum" equivalent to painstakingly look for unused tokens? Or, do they just let it grow and grow without bound?
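
Continuing that sketch, the only real way out is to track how many live file records still use each token and vacuum the dead ones. This is purely hypothetical, but somebody would have to write it, schedule it, and make sure it actually ran:

  # Unless something decrements a use count when files go away, the
  # dictionary only ever grows.
  token_to_prefix = {12: "/usr/local/bin"}
  prefix_to_token = {"/usr/local/bin": 12}
  token_refcount = {12: 2}          # live file records using each token

  def file_deleted(token):
      token_refcount[token] -= 1

  def vacuum():
      # The cleanup pass nobody seemed to have an answer for.
      for token in [t for t, n in token_refcount.items() if n == 0]:
          del prefix_to_token[token_to_prefix.pop(token)]
          del token_refcount[token]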

As far as I know, the Captain never landed his Whale. He could have had a little minnow version of it right away and could have grown it to be his full-on vision if the market had existed, but he wouldn't settle for that. It had to be all or nothing.

He got nothing.