Writing

Software, technology, sysadmin war stories, and more.

Thursday, June 14, 2012

Try stuffed animals as level 1 support

I am a big fan of having a stuffed animal (or equivalent) as "level one" tech support. I picked this up from the Scary Devil Monastery at some point in the '90s, I think. Here's how it works, assuming a college lab helpdesk environment.

Let's say you're a student who is having a problem with something. You walk into the help desk area of the labs and there is a sign and a big stuffed bear. The sign instructs you to take a seat and explain your problem to the bear. You think this is silly and try to walk past it, but the first human you encounter (with a "level two" sign) says "Hey! You can't escalate until you talk to level one!"

You look at them a little strangely, but decide to humor them anyway. Maybe this bear is wired for sound or something, and some distant person will talk through it. Who knows. In any case, you walk back out front, sit down, and start talking to the bear.

"Bear, I have a problem. It seems that when I try to run my program, sometimes it just sits there like it's not reading anything, but it has to be reading something, because ... hmm, uh, wait a minute. Oh. OH! Never mind!"

The very act of explaining it to someone else jostled things in your head into a new arrangement even though it was just a stuffed animal. From this new perspective, you spotted something that hadn't been obvious before and hadn't been considered as a source of problems. Upon realizing that (around the "..." up there), you started pondering it and saw that it might just be to blame. This is when you dashed back to the lab to give it a shot.

This is very real. Talking through things means you have to turn around and lay down enough of a foundation for someone else to get it, assuming you possess those basic social skills (and I think most people do). This can challenge enough of your assumptions to give a few more leads.

I see this happen all the time. Sometimes, I am the "stuffed animal". It happens in chat sessions with friends in various other parts of the industry, who are usually in the middle of coding something.

My most recent example involves one of my friends and a crash report aggregator. He was looking through the list and found something that seemed really obnoxious. It didn't have much in the way of details, and even when he did manage to turn it into a dump which included symbols and a stack trace, nothing in there implicated his actual code.

There was the usual blob of library calls, but his code wasn't anywhere in the call stack like you would expect. That seemed impossible at first, but I asked if perhaps it might be happening in another thread. If that thread is kicked off by the system and/or libraries, then it'll just be out there doing its own thing. That thread's call stack will trace back to the clone() which gave birth to it, and nothing else.
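Here's a small sketch of that effect. It's in Python purely for illustration (I don't know what my friend's program was written in), and the names worker, start_worker and main_logic are made up. The point is only that the stack captured inside the thread starts at the thread's own entry point and says nothing about the code that launched it.

```python
import threading
import traceback


def worker(out):
    # Record the function names on *this* thread's call stack,
    # outermost frame first.
    out.extend(frame.name for frame in traceback.extract_stack())


def start_worker():
    # Launch the thread and wait for it to finish.
    frames = []
    t = threading.Thread(target=worker, args=(frames,))
    t.start()
    t.join()
    return frames


def main_logic():
    # Stand-in for "your actual code" which kicks things off.
    return start_worker()


frames = main_logic()
print(frames)
```

On CPython the list typically holds a few of the threading module's own bootstrap frames followed by worker. Neither main_logic nor start_worker shows up, even though they are the reason the thread exists.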

If this seems somewhat familiar, it's because I listed it as one of those things which can happen in a post about closures and callbacks last November. Your call stack ends at the point where the thread came to life.

My suggestion was that he was doing something in thread A that was causing thread B to die. Thread B was entirely owned by the system and was not running any of his code. It might be a library, or framework, or who knows what. You have to keep it happy but you don't have any real visibility into what it is.

That was enough to get him looking at that possibility, and it turned out to be right. There's a system service he uses which is not thread-safe, so you have to be very careful about when you call it. Apparently he had already corrected for this in one place but still had a secondary use which did not use the "call from main thread" convention. That was enough to trip it up at a bad point.
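I don't know what his fix actually looked like, but the general shape of a "call from main thread" convention can be sketched like this. Everything here is hypothetical: FragileService stands in for the non-thread-safe service, and call_on_owner and drain_queue are names I made up. The real service presumably just misbehaves when called from the wrong thread; this stand-in raises instead so the mistake is visible.

```python
import queue
import threading

# Calls waiting to be run by the owning ("main") thread.
pending_calls = queue.Queue()


class FragileService:
    """Hypothetical service which must only be used from the thread that created it."""

    def __init__(self):
        self._owner = threading.get_ident()
        self.calls = []

    def poke(self, value):
        # Loudly reject calls from any other thread.
        if threading.get_ident() != self._owner:
            raise RuntimeError("poke() called from the wrong thread")
        self.calls.append(value)


def call_on_owner(func, *args):
    # Safe from any thread: just queue the call for later.
    pending_calls.put((func, args))


def drain_queue():
    # Run everything queued so far. Only the owning thread should call this.
    while True:
        try:
            func, args = pending_calls.get_nowait()
        except queue.Empty:
            return
        func(*args)


def run_demo():
    service = FragileService()  # created here, so this thread owns it

    def background_job(n):
        # Calling service.poke(n) directly here would raise.
        call_on_owner(service.poke, n)

    threads = [threading.Thread(target=background_job, args=(i,)) for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    drain_queue()  # back on the owning thread
    return sorted(service.calls)


print(run_demo())
```

The catch is exactly the one my friend hit: the convention only helps if every caller goes through it. One leftover direct call from a background thread is enough to cause trouble.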

The interesting thing here is that I know next to nothing about the program in question, the operating system it's running on, or even the language in which it is written. All I really know is that it's kind of Unixy, with threads and libraries, and can provide a stack trace when things bomb. That was enough to dig into my cache of war stories and come up with a possibility. My friend's experience did the rest.

So there are stuffed animals, and then there's me. I'll listen to your story in the hopes it will help. Unlike the bear, however, if you don't twig onto something just by explaining it, I might have a few things to share which will get you the rest of the way.

Have a tough problem? I am available for consultation!