
Software, technology, sysadmin war stories, and more.

Tuesday, October 25, 2011

You've heard of "yes men". Now learn about "no men".

It seems that some companies start accumulating people who say "no". All they do is stand in the way of the things other people want to do. Some of them are just lazy. Others want to show off their apparent power. I think they all need to get a clue.

One of these people is so notorious that someone created an extension to the Linux kernel which did nothing but emulate his responses. You could cat a file in /dev (or /proc, I forget which), and it would say either "no" or "it's too risky". The file was named after his username.

What's really amazing is that this person apparently approved the kernel change that added it! That is, despite the supposed commitment to quality, this completely obvious change slipped right past him.

This guy's reputation took on a life of its own, and it even affected me once. At one point we had a bunch of machines running database servers that would get stuck now and then. Their replication would just stop, and we'd get paged once it fell behind. The "fix", as conveyed to me by the people already on the team, was "kill it and force it to re-copy the data from another node".

I got tired of doing this stupid monkey stuff and started poking around one of the sick machines. I noticed it had no swap. Its physical memory was also nearly the same size as its database. I couldn't really quantify it, but the machine just felt wrong. It was like Linux needed some temp space to shove things around, but without any swap, it had nowhere to work.
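Nothing here was scripted at the time, but the two things that stood out on the sick box (no swap at all, and physical memory roughly the size of the database) are easy to check. What follows is a minimal Python sketch assuming the Linux /proc/meminfo format, where most values are in kB, and a database size you measure and supply yourself. The function names and the 90% threshold are hypothetical, picked for illustration. It's a smell test, not a diagnosis, and not what was actually run.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: first numeric value}.

    Most fields are in kB ("MemTotal:  16318020 kB"); a few, like
    HugePages_Total, are bare counts with no unit.
    """
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def swapless_and_tight(meminfo: Dict[str, int], db_size_kb: int,
                       ratio: float = 0.9) -> bool:
    """Hypothetical heuristic: True when the box has no swap configured
    AND the database is at least `ratio` of physical memory (both in kB)."""
    no_swap = meminfo.get("SwapTotal", 0) == 0
    tight = db_size_kb >= ratio * meminfo["MemTotal"]
    return no_swap and tight
```

On a real machine you would feed `parse_meminfo` the contents of /proc/meminfo and pass the on-disk size of the database, converted to kB. A True result just means "this looks like the boxes that kept getting stuck".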

In yet another example of people not believing my intuition, I finally had to just turn on swap to show them what would happen. That alone did it. Without my touching the database process at all, replication started moving again, and the machine cleaned up its entire backlog all by itself.

Here's where that guy's reputation enters the picture. Someone said "so-and-so says no swap in production". Did I care what he supposedly said? No. They repeated it anyway.

I finally just told them: "If he says that, then he can have my pager and deal with the pages. Until then, I'm turning on swap any time I get paged for one of these things". They wisely decided to leave me alone.

I later found out that all of our older machines had swap enabled. Only the ones that had been replaced or reinstalled were coming up without it, perhaps as the result of a policy change. In other words, running with swap was perfectly normal for us: most of the machines we owned were already doing it.

Those guys were willing to just parrot "Mr. No's" policy without understanding anything about what was actually going on. Then again, considering their answer to a problem was to restart it over and over instead of investigating, this should not have come as a surprise.