Monday, May 21, 2012

Refuse to be mediocre

This is a tale of infrastructure inspectors. It's an analogy.

Let's say there are these devices you can roll down railroad tracks to look for anomalies. Maybe they make sure the rails aren't too far apart, or too close for that matter. Maybe they look for welds which are going bad, or places where it's excessively bumpy. They might also make sure the crossing arms come down and go back up at the right times.

Anyway, you have a whole fleet of these things, and you regularly send them down your network of rails to check things out. Over time, you start noticing patterns: the reports tend to cluster. On any given span, most of your detectors report the same 7 or 8 anomalies. But, once in a while, there's a detector which will find 30 or 40 of them. This happens on the same section of track, at the same time of day, with the same weather, and so on.

You can even run one of the ordinary detectors and the outlier down the rails at the same time, and the one will always report 7 or 8 anomalies while the other will always report 30 or 40. The results are consistent. It doesn't matter what you do.

At this point, you might stop right there and say that one device must be broken. It's reporting far too many anomalies. The other ones aren't seeing these things, so they must be false positives. Indeed, if you stop there, that must be your conclusion, so you write it off and move on with life.

But let's say you actually get out there and walk the track yourself. You do this armed with a copy of the reports, showing both the kind of failure and the location. As you start checking each report against the track in front of you, you notice something: these anomalies actually exist. They are subtle, but they are genuine. Your "extra sensitive" detector, for whatever reason, is able to pick up on these things while the others cannot.

The problem with life is that few people go to the trouble of "walking the track" (to continue my analogy) to see whether those extra reports are real. Far too often, it's just assumed that they can't possibly be accurate and that the measurement device must be flawed. That one device might be labeled "hypersensitive" or "annoying" or just "frustrating", even though ... it's actually correct.

I am reminded of a story from science class about an experiment that was repeated over many years. Each time, people would quietly pick and choose their results, discarding the ones that strayed too far from what had been published before. It turned out the earlier values were off, and it took years for the measurements to slowly converge on the real one.

Just because something is an outlier doesn't mean it's automatically wrong. You have to do the verification work to be able to tell that.

Of course, this is an analogy. There probably are machines which report on the rail status, but I just made that up for my own purposes. What I'm actually talking about is the treatment of people who find more anomalies than others in production systems. They are frequently under-appreciated at best and actively resented at worst.

Sure, maybe they "make other people look bad" and "blow the curve", but if it's all rooted in fact, then who's actually wrong here?

Maybe this is why there is so much mediocre work out there: people are afraid of looking like an outlier, even if the results are genuine.

All I can say is this: stop being afraid to tell the truth.