Writing

Software, technology, sysadmin war stories, and more.

Friday, December 9, 2011

Replayers, investigators, and ramrods

I made a lot of observations about the world of tech support during my time in it. One of them pertains to exactly what kind of role a person will take while working on a problem. I suspect it applies to other places as well.

Let's say you're a customer at a web hosting company. You have a small business in which you resell access to your server to other companies. They host their web sites and run e-mail on your system and pay a small monthly fee for it. Your knowledge of the system doesn't run much beyond the actual control panel. You know it's running on some kind of Linux flavor, but anything which might go wrong down there is beyond what you can handle. That's when you call in tech support.

The question is: exactly what kind of tech support person are you going to get? My assertion is that there are three main types.

The first type, which is the most common in my opinion, is the "replayer". This is someone who knows about certain problems which can arise and has a cookbook or cheat sheet of solutions. Their approach to a trouble ticket is to first see if it looks like anything they know about. If they recognize it, then they try to apply their usual solution. If that seems to work, they call it done.

For simple problems, this works out fine. A banal request like "please reset my control panel password" can actually be translated into a tech logging in and running a single command. Granted, anything that simple should be handled in software without any people ever being involved, but that's a rant for another time.

My "encore" system, which associated old tickets with a few keywords, was essential for this kind of replay work.
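To make the idea concrete, here is a minimal sketch of a keyword-to-ticket lookup of that general kind. It is not the actual "encore" code: the `TicketIndex` class, its method names, and the sample tickets are all hypothetical, and the matching is deliberately naive (whole-word overlap, nothing smarter).

```python
import re
from collections import defaultdict


class TicketIndex:
    """Hypothetical keyword index over resolved tickets (illustration only)."""

    def __init__(self):
        self._by_keyword = defaultdict(set)  # keyword -> set of ticket ids
        self._fixes = {}                     # ticket id -> recorded fix

    def add(self, ticket_id, keywords, fix):
        """Record a resolved ticket under a handful of keywords."""
        self._fixes[ticket_id] = fix
        for keyword in keywords:
            self._by_keyword[keyword.lower()].add(ticket_id)

    def lookup(self, description):
        """Return (ticket_id, fix, hits) tuples, best keyword overlap first."""
        words = set(re.findall(r"[a-z0-9]+", description.lower()))
        hits = defaultdict(int)
        for word in words:
            for ticket_id in self._by_keyword.get(word, ()):
                hits[ticket_id] += 1
        # Most matching keywords first; ties broken by lowest ticket id.
        ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
        return [(tid, self._fixes[tid], count) for tid, count in ranked]


index = TicketIndex()
index.add(101, ["password", "reset"], "reset the control panel password")
index.add(102, ["webmail", "login"], "restart the webmail service")
matches = index.lookup("Please reset my control panel password.")
```

A lookup like this only tells a tech what worked on tickets that sounded similar. It says nothing about whether the new ticket has the same cause.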

If it's a complicated problem, odds are the canned fix is not going to work. Also, if the ticket only looks like a known problem and is actually something else, the fix could cover up the true problem or even make things worse. This tends to lead to repeat calls or tickets and angry customers.

Now for the second group of techs, which is much smaller than the first. They're the ones who wind up receiving the escalations when mere replaying fails to fix a problem. I call them the "investigators". An investigator might have to start with a vague description like "my web mail stopped working overnight". Other techs might have tried cargo-cult replay techniques like restarting Apache or rebooting the server, but none of that helped. An investigator has to go beyond that.

Instead of just trying random things, they have to start peeling away layers of software indirection to get to whatever's actually happening. This may involve a journey down through five or six layers of "why" just to find out that you ran out of (usable) file descriptors.
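As an illustration of what the bottom of that journey can look like, here is a small sketch that compares a process's open file descriptors against its soft limit. It assumes a Linux system with `/proc` mounted and permission to read the target process's entries. The function names `parse_soft_nofile` and `fd_usage` are made up for this example.

```python
import os


def parse_soft_nofile(limits_text):
    """Extract the soft 'Max open files' value from /proc/<pid>/limits text.

    Returns None if the line is missing or the limit is 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: "Max", "open", "files", <soft>, <hard>, "files"
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for a process on Linux.

    The count is a snapshot. When inspecting the current process it
    includes the descriptor used to list the directory itself.
    """
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as limits_file:
        soft_limit = parse_soft_nofile(limits_file.read())
    return open_fds, soft_limit
```

Calling `fd_usage(os.getpid())` returns the numbers for the current process. A count sitting right at the limit is a strong hint, though finding what is leaking the descriptors is still the investigator's job.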

Then, as they re-surface, they have to work out some kind of solution which will make sense both in the short term and the long term. Finally, it is important for them to leave documentation behind so that future replayer techs can come along and repeat the fix on their own.

Now, investigator types tend to be driven crazy by the amount of pointless "replay" work which comes in and try to automate it into oblivion. Anything you can do, the computer can do for you, or so the saying goes. They'd rather spend their cycles building something which would totally remove all "please reset my password" tickets from the queue for all time.

Finally, we come to the last major group of techs. These are the people who aren't willing to just sit there and replay things, and they aren't good enough to investigate issues properly. I call them the "ramrods". They are characterized by a certain amount of hubris which then leads them to rip into things without any real sense of what's going on.

A "solution" from such an individual tends to be completely unfounded. They will blame things with no evidence to support the accusation and will recommend courses of action which have no effect at best and a negative one if you're not lucky. All the while, they are running around thinking they are one of the chosen few.

Fortunately, these individuals seem to be the smallest group, but they are also the most dangerous. Just one of them hidden in your organization will create playbook entries which send your armies of replayers down the wrong road. Worse still, if they are allowed to persist and are rewarded, your actual investigators will get fed up and leave.

As you can see, this might actually lead to a vicious cycle in which the bad ones keep on thriving and the good ones burn out and bail out.

A final note: this list is not meant to be exhaustive. There may be a few other little niche categories which I have missed. Also, people may span worlds, particularly if their job is complex enough that they can't handle all of it at the same level. Finally, people change, and hopefully for the better. Mentoring can play a big part in this.

Ultimately, though, it's not about the tech. It's the people.