Writing

Software, technology, sysadmin war stories, and more.
Sunday, February 10, 2013

The corporate "buckets of money" problem

Back in Friday's post about "downselling" bad products that I didn't want to support, I referred to the "buckets of money problem". I realized today that I probably haven't adequately explained what I meant by this. A trivial web search doesn't turn up anything useful, so it's probably something I picked up locally which hasn't been widely shared yet.

I learned this particular term in my most recent stint in corporate America. Generally, it refers to the practice of pushing problems around in order to fix your own budget problem while screwing over someone else. This seems to start happening once a company gets big enough that it is no longer a single entity with a shared goal but instead is more like a series of warring fiefdoms all jockeying to get their slice of the pie.

I'll try to provide some examples so this can make more sense. Imagine you're at a company which has a big process for performing some task in software. It might be calculating the company's financials, compiling source code into object files, or rendering the latest CGI movie with characters who raise one eyebrow.

Everything seems to be running fine. Then, one quarter, the people who run the processing pipeline decide to make a change. They are no longer going to run caching servers to maintain copies of intermediate files which are generated during a job. These cache files must now be stored somewhere else, or (more likely) they won't be cached at all. They will have to be rebuilt every time someone needs them.

By dropping this requirement in their systems, they no longer have to maintain a bunch of storage nodes and the personnel required to run them. They get to chop out a big on-call system and all of the other infrastructure which was propping it up. They might even get to sell off some assets which are no longer needed.

Of course, that's not the end of it. Right away, all of the people who run jobs on this platform find that their jobs take much longer to complete. In addition to that, their workstations now become noticeably slower under the load. Previously, you could go off and do something else like browsing the web or reading your mail while a job ran. Now that the workstation has to recalculate all of this stuff every time, it's starved for CPU time and this makes the web browsers and mail clients drag. The box is now basically unusable while a job is running.

One way to fix this would be for some other team to provide storage space for caching these results. This would still make someone pay the "lag" price on their workstation for initially calculating something, but then everyone else would benefit from it. That might seem okay, since it would "spread out the pain" and no one person would seem singled out.

Of course, the people who wind up hitting something like this the most are the ones who are out in front. They are the first to use the newest input files, and thus encounter the not-yet-cached situation all the time. Maybe they get to the office early, or they stay on top of their workload so they aren't always running several days behind. The point is, these people now suffer for being on the leading edge, and they wind up taking the hit for everyone else behind them.
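Here's a rough sketch of that effect in Python. It is only a toy model: the engineer names, the input file names, and the order of requests are all made up for illustration, and it assumes a shared cache where whoever touches an input file first pays for the rebuild and everyone after them gets a hit.

```python
from collections import Counter

def simulate(requests):
    """Replay (engineer, input_file) requests in arrival order against a shared cache.

    Returns a Counter mapping each engineer to the number of rebuilds
    (cache misses) they personally had to pay for.
    """
    cache = set()          # intermediate results somebody has already built
    rebuilds = Counter()   # engineer -> rebuilds paid for on their workstation
    for engineer, input_file in requests:
        if input_file not in cache:
            rebuilds[engineer] += 1   # first one to touch it pays the "lag" price
            cache.add(input_file)
        # otherwise it's a cache hit and costs this engineer nothing
    return rebuilds

# Hypothetical workload: "early" always picks up new input files first.
requests = [
    ("early", "a"), ("mid", "a"), ("late", "a"),
    ("early", "b"), ("mid", "b"), ("late", "b"),
    ("early", "c"), ("late", "c"),
]

rebuilds = simulate(requests)
for engineer in ("early", "mid", "late"):
    print(engineer, rebuilds[engineer])
```

In this made-up workload the pain isn't spread out at all. The engineer who is always first eats every rebuild, and the people behind them never pay for one.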

There's an obvious cost here, which is for this new storage team to provide oodles of disk space. Someone has to buy all of those "filer" boxes and wrangle Kerberized NFS and everything else so that these cached files can live somewhere. That sort of thing isn't free.

There's another cost here, though. Did you spot it?

I'm talking about the opportunity cost for the engineers. Their toolset has been degraded, and now they can't get things done nearly as quickly as they did before. They also now have to deal with a bunch of extra hassles like a workstation which gets so slow that they can watch their windows redraw themselves. If things get really bad, they can make abstract art by dragging things around because the machine is too slow to clean up after all of the old window positions.

This is a cognitive load which never existed before. It's one of those small things which might annoy someone just enough to knock them out of the zone more often than before. If a lot of "luck" with building successful things is being in the right place at the right time, then anything which kicks you out of that mental sweet spot is effectively the same as having really bad luck.

But hey, at least the pipeline processing team got to improve their numbers! Not having to run that whole caching infrastructure gave them a serious uptick in their numbers! How cool is that?

Who cares if some other groups are paying more as a direct result? They don't affect our numbers, and our numbers are awesome! Go us! Trips to Vegas for all of the managers!

So, there you have it: buckets of money in a corporate environment.