Software, technology, sysadmin war stories, and more.
Thursday, March 22, 2012

Reader feedback: cargo cults, configs, and bad UX

It's feedback time once again.

Sunday's post about snarky comments in config files and having to restart the entire web server yielded a few comments. One reader latched onto the "restarting fixes it" notion and called it out. They call it "cargo cult problem solving", where if something doesn't work, you start by rebooting everything. They further note that this is step #1 for anyone who's suffered through Windows.

I agree with this comment in so many ways. First, I hate the notion of restarting things to fix them. It probably accomplishes little, and it might throw away valuable troubleshooting data. Worse, it tends to create a culture where such things come to be considered acceptable. My experience with checking out of a hotel and having the phone system's failure explained as "it does that" just infuriated me.

Next, I really love the "cargo cult" label, since it works so well for many of these behaviors. I've actually used it in a couple of posts here to describe escaping chars for in-band signalling and the kinds of things that certain unskilled techs do. Of course, I'm not innocent of this, either. When I had to hold my nose and write an NT service in Perl, I got it working and got the heck out of there. The last thing I wanted was for people to think that I was comfortable using that garbage.

[Image: squeezable toy penguin]

As for the Windows thing, I picked up a saying back in the '90s called the 4 "REs" of Windows administration: reboot, retry, reinstall, repeat. Sometimes I added a fifth pseudo-RE which was "Red Hat", but that wasn't entirely serious. In the days before RHEL, I wouldn't run that if I had a choice.

I used to taunt the "network engineers" who ran the NT machines by calling them the "reboot rangers". I also had a plastic squeaky toy penguin which I kept on top of my monitor. Any time I heard someone talking about rebooting, I gave it a squeeze. You could hear it from across the office.

Why a penguin? Well, because I was a Linux person, naturally! It was the least I could do.

Another reader wrote in to say that he once had a serious problem with a system which automatically refreshed itself when a config file changed. Apparently he was working on the network configuration files, and was changing the primary network address. As soon as he saved the file, the system picked up the new config and made it active immediately, killing his ssh session. He was able to get back in on the new address since the work had been done correctly, but that could have been very bad.

I would definitely worry about that kind of system. What I had in mind was a system for driving stuff above the OS level, like web applications and databases and things like that. If you can get it to sync up with your source control system, then all changes have to be reviewed and checked in, and you get a nice audit log of everything. It also means that if your server really needs a full restart on a config change, you can just add a post-sync hook of some sort to do that. That eliminates an opportunity for human forgetfulness.
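To make the post-sync hook idea a bit more concrete, here's a rough sketch of one way it might look. Everything in it is made up for illustration: the function names, the state file, and the restart command are my assumptions, not anything from the project in question. It hashes the synced config directory, compares that with the hash from the previous sync, and only runs the restart command when something actually changed.

```python
import hashlib
import subprocess
from pathlib import Path


def config_digest(config_dir):
    """Hash every file under config_dir (relative names plus contents)."""
    h = hashlib.sha256()
    for path in sorted(Path(config_dir).rglob("*")):
        if path.is_file():
            h.update(str(path.relative_to(config_dir)).encode())
            h.update(b"\0")
            h.update(path.read_bytes())
    return h.hexdigest()


def post_sync_hook(config_dir, state_file, restart_cmd, run=subprocess.run):
    """Hypothetical hook: restart only if the config changed since last sync.

    state_file remembers the previous digest and must live OUTSIDE
    config_dir, or it would change the digest itself.
    Returns True if a restart was triggered.
    """
    state = Path(state_file)
    new = config_digest(config_dir)
    old = state.read_text().strip() if state.exists() else None
    if new == old:
        return False  # nothing changed, so leave the service alone
    run(restart_cmd, check=True)  # raises if the restart command fails
    state.write_text(new)  # record the digest only after a good restart
    return True
```

You'd call something like `post_sync_hook("/srv/app/config", "/var/lib/app/config.sha256", ["systemctl", "restart", "mywebapp"])` from whatever your source control tool runs after a sync. Those paths and the service name are placeholders. The point is just that the restart happens because the tooling noticed a change, not because somebody remembered to do it.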

Now, if you worked at a place which already had libraries in place for doing all of this, and you just had to use them to get the benefits, you probably would, right? Not these guys on this project. They wanted to boil the oceans and re-invent all of this stuff by themselves. Talk about hubris.

Finally, there was a fair amount of chatter about last Friday's post on the Apple account situation. A lot of people took that opportunity on HN or Twitter to say how smart they were because they only had one account. To those people, I salute you. Truly, you are the pinnacle of the human race, and the rest of us owe you a debt of gratitude for your account-handling abilities.

It would be one thing if I was the only one suffering from it. That would just mean that I was too stupid to figure out how to fix it. But hey, it turns out that a bunch of other Apple-using folks are in the same boat as me. So that either means that we're all stupid, or maybe, just maybe, there's something broken in this system.

Hint: if a bunch of people all fail at using something in the same way, you can call them "stupid", or you can admit that it's going to happen for whatever reason and design around it. If you don't get this, go read a bunch of Donald Norman's books over and over until it sinks in. The tales about pull bars on doors you should push and decoy slots on subway turnstile ticket readers should get you started.

Full disclosure: I didn't get it at first, either.