Software, technology, sysadmin war stories, and more.
Tuesday, October 27, 2020

Sharp tools for emergencies and the --clowntown flag

It's been my experience that companies tend to have bits of their own lingo. Some of it is just the usual business stuff, like when you find yourself in a ridesharing environment and have to learn what "bookings" are. Other parts are more cultural and sort of evolve organically as people work together over time.

One of them from Facebook was "clown town" or just "clowntown". While there might be several uses, I tended to think of it as an adjective sometimes, almost like calling something "goofy" without calling it straight-up "stupid". Of course, there was also "clowny", with its trailing "y" suggesting adjective use (downy, cushy, wobbly), but that's language for you.

In particular, "clowntown" made it out of the spoken realm and back into the computers in the form of command-line arguments you could pass to certain tools. By using one, you were affirming that whatever you were asking the tool to do was in fact broken, crazy, goofy, wacky, or maybe just plain stupid, but you needed it to happen anyway. It was a reminder to stop and think about what you were doing, and why you had to resort to that flag in the first place. Then, when the fire was out, you were supposed to go back and figure out what could be done to avoid ever having to do that again.

The tools built this way tended to be relatively safe for most uses, but they still had the sharp edges needed when you absolutely had to do something specific with the usual sanity checks removed. Many of them would log the fact they had been run in this fashion, and would then announce it to IRC or something else to let other people know what was going on.
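To make that pattern concrete, here is a minimal sketch of what such a tool might look like. It is not any real tool from Facebook: the name "pkgpush", the --hosts option, the MAX_SAFE_BATCH limit, and the announce() helper are all made up for illustration. Only the --clowntown flag name comes from the story above.

```python
import argparse
import getpass
import logging

log = logging.getLogger("pkgpush")  # hypothetical tool name

MAX_SAFE_BATCH = 50  # hypothetical safety limit, chosen for the example


def announce(message):
    """Stand-in for posting to IRC or chat; here it only logs a warning."""
    log.warning("ANNOUNCE: %s", message)


def build_parser():
    parser = argparse.ArgumentParser(prog="pkgpush")
    parser.add_argument("--hosts", type=int, required=True,
                        help="number of hosts to touch in one batch")
    parser.add_argument("--clowntown", action="store_true",
                        help="skip the batch-size check; use is announced")
    return parser


def plan_push(argv, user=None):
    """Parse arguments, enforce the safety check, and return the plan."""
    args = build_parser().parse_args(argv)
    user = user or getpass.getuser()
    over_limit = args.hosts > MAX_SAFE_BATCH
    if over_limit:
        if not args.clowntown:
            # Normal path: refuse, and say which flag would override.
            raise SystemExit(
                f"refusing to touch {args.hosts} hosts (limit "
                f"{MAX_SAFE_BATCH}); pass --clowntown if you must"
            )
        # Override path: proceed, but make sure other people find out.
        announce(f"{user} ran pkgpush --clowntown on {args.hosts} hosts")
    return {"hosts": args.hosts, "override": over_limit}
```

The point of the sketch is the shape, not the details: the safe path is the default, the override has to be typed out on purpose, and using it leaves a trail that somebody else will see.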

Basically, you COULD run with scissors, but other people would find out about it, and if the organization was doing its job, there would be follow-up to find out why. Then wheels would start turning to (try to) make it never have to happen again. This is what progress used to look like.

I should also mention that we had other flags which were even scarier than the sort that you might hide behind the --clowntown flag. I think one of them was something like --i-want-to-cause-a-sev. It was the kind of thing we put in place to let us deliberately do something ridiculous like "move every machine in production to a new version of a package RIGHT NOW", even though the system in question had been designed to never let you do that in normal operations. We'd rather write the tool, gate it strongly, and never use it than find ourselves in a situation where we actually needed it but couldn't make it happen, especially if the system we replaced would have allowed it (by being relatively unsafe overall).

When it comes to having very sharp tools around for emergencies, it's okay if you never have to use them. This means you've probably built a system in which everything your customers/users need can be fulfilled through ordinary means. That's great!

Along the same lines as "looking for the duct tape", looking for uses of the --yes-i-really-mean-it type flags is a good way to figure out where you are missing something from your system's model of the world. It may be that real life needs something far more often than anyone imagined, and maybe certain things are actually not an exceptional event. Maybe someone needs to spend the time on making it possible to do them more often without compromising on safety.
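If your tools log their override use somewhere, finding these spots can be as simple as counting. Here is a rough sketch, assuming an invented audit log format of "date user tool args..." on each line. The format, the function name, and the sample tool names are assumptions for the example, not anything from a real system.

```python
from collections import Counter

# Flags treated as safety overrides; both names appear in this post.
OVERRIDE_FLAGS = ("--clowntown", "--yes-i-really-mean-it")


def count_overrides(log_lines):
    """Count override-flag uses per (tool, flag), most frequent first.

    Assumes each line looks like: "<date> <user> <tool> <args...>".
    Lines with fewer than three fields are skipped.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        tool, args = fields[2], fields[3:]
        for flag in OVERRIDE_FLAGS:
            if flag in args:
                counts[(tool, flag)] += 1
    return counts.most_common()


sample = [
    "2020-10-01 alice pkgpush --hosts 500 --clowntown",
    "2020-10-03 bob pkgpush --hosts 900 --clowntown",
    "2020-10-04 carol dbtool drop --yes-i-really-mean-it",
    "2020-10-05 dave pkgpush --hosts 10",
]
report = count_overrides(sample)
```

Whatever lands at the top of that list is a good candidate for "not actually an exceptional event", and probably deserves a proper, safe way to do it.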

The last thing you want is to normalize the use of a safety override. Best practices in software aren't usually "written in blood" like they are with "real" engineering disciplines, but they still need to be considered. The number of outages, privacy leaks, data loss events and other terrible things could be greatly reduced if we could just learn from our own collective history.