Software, technology, sysadmin war stories, and more.
Tuesday, July 5, 2011

Build to interfaces or watch your critters die

Let's say you have to maintain a system which is in charge of providing water to a vast colony of little furry creatures. Maybe you have mice, and rats, and rabbits, and anything else which will drink from those little tubes. Further complicating matters is that you have thousands of them, and, oh, by the way, you're on the moon. You have to create drinkable water somehow, and there are many ways to do it.

If your critters do not drink at least some minimum amount of water per some unit of time, they will get sick and possibly die. Their activity levels vary depending on the time of day, the season, and just for the heck of it, the phase of the moon. That means their water needs vary, too. Sometimes you can get by with 1000 water tubes, but other times you need 5000. Mostly you're somewhere in between.

What do you do? One approach, one with parallels in the software industry, is to make a single watering tube. You call it the golden tube, and then you make an elaborate replication system where any changes applied to that golden tube are then distributed as quickly as possible. Let's say you test out a change on your development tubes with your "scratch" critters, and upon approval, you push it to your golden watering system. All of the other ones immediately notice and start changing themselves to match.

But oh no, something's wrong. Your testing environment wasn't perfect. Your watering tubes are failing left and right, and are starting to poison your animals! They're getting very sick. So you hit the panic button and stop the replication. Some systems are still okay, so not all of your animals are affected, but you're in a world of hurt. You start backing it out, and slowly, things recover.
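To make that failure mode concrete, here is a minimal sketch of golden-tube replication with a panic button. Everything in it is made up for illustration: the `Tube` class, the recipe strings, the `replicate` function and its `stop_after` parameter are assumptions, not any real system. The point is only that whatever got copied before you hit the button is already out there.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tube:
    """One watering tube; `recipe` is whatever config it currently runs."""
    tube_id: int
    recipe: str

    def water_is_safe(self) -> bool:
        # Hypothetical stand-in: pretend only the "v1" recipe is safe.
        return self.recipe == "v1"


def replicate(golden_recipe: str, tubes: List[Tube],
              stop_after: Optional[int] = None) -> int:
    """Copy the golden recipe to each tube in order; return how many changed.

    `stop_after` models the panic button: replication halts once that
    many tubes have been updated.
    """
    updated = 0
    for tube in tubes:
        if stop_after is not None and updated >= stop_after:
            break
        tube.recipe = golden_recipe
        updated += 1
    return updated


tubes = [Tube(i, "v1") for i in range(1000)]

# A bad change passed testing; someone hits the panic button after 600 tubes.
replicate("v2-bad", tubes, stop_after=600)
poisoned = sum(not t.water_is_safe() for t in tubes)
print(f"{poisoned} of {len(tubes)} tubes are serving bad water")
```

Backing it out is just another replication run with the old recipe, which is why recovery is slow: the fix travels down the same pipe the poison did.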

There are a few responses to this. The first time, people might just shrug and say, eh, whatever, it happens. After two or three more times, various critter manager types will probably start calling for "change management" policies. You must have this much approval before you push to production, blah blah blah. Lots and lots of craziness will go into trying to make this system perfect.

While all of this is going on, nobody will realize that they're really not in the business of making a single watering device. They might think that they are, but they aren't. They are really in the business of providing water to the animals. That is the actual interface: a metal tube with clean water and a little stopper ball which keeps it from leaking.

There is another way to go about doing this. Instead of trying to build one huge vertical system which is homogeneous, why not have three or four of them with different techniques? That way, no single error can knock out the entire system at once, along with all of those lovely fuzzy creatures who rely on it. It would also give teams some incentive to keep improving on their designs, since there would be friendly competition. Without that, your people are doing nothing but waiting for yet another fire to break out. Forget about innovation.
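Here is a rough sketch of that idea, assuming three made-up water-making techniques behind one interface. The class names (`WaterSource`, `IceMelter`, `Recycler`, `FuelCellCondenser`) and the round-robin `assign_sources` helper are hypothetical; a real colony would pick its own techniques and its own way of splitting up the tubes.

```python
from abc import ABC, abstractmethod
from typing import List


class WaterSource(ABC):
    """The interface: anything that can put water in a tube."""

    @abstractmethod
    def dispense(self) -> str:
        """Return "clean" or "tainted" for the water delivered."""


class IceMelter(WaterSource):
    def dispense(self) -> str:
        return "clean"


class Recycler(WaterSource):
    def dispense(self) -> str:
        return "clean"


class FuelCellCondenser(WaterSource):
    # Pretend this team's latest change was a bad one.
    def dispense(self) -> str:
        return "tainted"


def assign_sources(n_tubes: int, sources: List[WaterSource]) -> List[WaterSource]:
    """Spread tubes round-robin across the independent implementations."""
    return [sources[i % len(sources)] for i in range(n_tubes)]


sources = [IceMelter(), Recycler(), FuelCellCondenser()]
tubes = assign_sources(3000, sources)

# The critters only ever see dispense(); they never know which team built it.
tainted = sum(src.dispense() == "tainted" for src in tubes)
print(f"{tainted} of {len(tubes)} tubes affected")
```

With three independent implementations, one team's bad push hits a third of the tubes instead of all of them, and the other two keep the rest of the colony alive while it gets fixed.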

The trick is realizing that you're there to provide a service via a well-defined interface, and the exact implementation is not hugely important to your end users... until it breaks and kills them. If you don't have any alternative implementations and your single system breaks, it's game over.