Writing

Software, technology, sysadmin war stories, and more.
Thursday, December 6, 2012

"On call" was really more like "on duty"

In my days of working as a pager monkey, it was common for teams like mine to have "on-call" rotations. We had three kinds of shifts for people on this side of the world: morning, evening, and weekend. Morning ran from 9 AM to 6 PM, evening from 6 PM to 1 AM, and weekend shifts were 24 hours starting at 1 AM.

You might think that such a life would be relatively unburdened. After all, you can be event-driven with a pager. You can go off and do other things and as long as you are within a reasonable distance of your computer with an Internet connection, you're free, right? Maybe, maybe not. It depends on just how bad the situation is when you walk into it.

If your team was like the one I joined, your pager would go off somewhere around 40 or 50 times during that morning shift. That's roughly one page every 11 to 14 minutes across the whole nine-hour shift. That means no matter what else you might have to do, you couldn't possibly get any flow going, since the pager would go off and interrupt you. This wasn't just a one-week anomaly, either. It went on like this for ages. You couldn't even go to lunch without bringing your laptop along, since you'd normally need it.
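As a quick sanity check on that page rate, here is a minimal sketch that spreads 40 to 50 pages evenly across the 9 AM to 6 PM morning shift. The function name and the even-spacing assumption are mine, purely for illustration; real pages tend to arrive in bursts, so the actual gaps would vary.

```python
def minutes_between_pages(shift_hours: float, pages: int) -> float:
    """Average gap between pages, in minutes, over one shift.

    Assumes pages are spread evenly, which is a simplification.
    """
    if pages <= 0:
        raise ValueError("pages must be positive")
    return shift_hours * 60 / pages


# Morning shift from the post: 9 AM to 6 PM, so nine hours.
MORNING_SHIFT_HOURS = 9

for pages in (40, 50):
    gap = minutes_between_pages(MORNING_SHIFT_HOURS, pages)
    print(f"{pages} pages in {MORNING_SHIFT_HOURS} hours: one every {gap:.1f} minutes")
```

At 50 pages the average gap comes out just under 11 minutes, and at 40 pages it is 13.5 minutes. Either way, it is nowhere near enough time to get into anything else.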

The other pager-holding types on other teams at that company used to make fun of my team. They said we had to bring the chargers for our Skytel pagers in because they were in constant use. It sounds mean and snarky, but it was actually true. In addition to schlepping my laptop and cellular data card around, I also tended to bring my pager's charger to work with me, and then brought it home at night. The pager wouldn't last very long the way we kept using those things.

All of this got me thinking about terms used to describe this. What we had wasn't "on call". That implies a state where normally nothing much happens, and a page is an unusual event. Instead, we had an "on duty" rotation where it would be your turn to be whipped senseless by annoying monitoring systems for several hours during your shift.

I took this pretty seriously, so when I had an AM shift, I made sure I was in the office by 9 and didn't leave until after 6, when it rolled over to the next poor sucker. This meant I got to drive in during the lovely 8-9 AM hour and got to drive home in the 6-7 PM hour. Both of those times on Highway 101 are absolutely miserable. It can take you 45 minutes to go 10 miles if things aren't just right.

Then there were the PM shifts. Since I could basically guarantee spending most of my night dealing with the crying baby that was our production service monitoring, I decided they weren't going to also get my day. I started putting in my working hours in the afternoons and evenings at home so I'd have at least a little freedom during one of these weeks. This had the interesting effect of letting me go out to shop during the day, but I'm sure my absence in the office during normal daylight hours had negative effects on my overall assessments.

Weekends were generally miserable since you really couldn't go and do anything for a whole day. You just had to know it was coming, stock up on supplies in advance, and wait for it to happen. Then, when 1 AM rolled around again, it became someone else's problem.

I'm pretty sure that getting fed up with this and attacking some of the problems head-on instead of writing them off as "transient" eventually lowered the page quantity. I don't have good numbers for what happened after I gained a little confidence at that job and started making changes whether other people liked it or not, such as turning swap back on or reconfiguring storage nodes to avoid contention for the primary disk.

Pages should be unusual. If you're drowning in them, you're doing something wrong.