Writing

Software, technology, sysadmin war stories, and more.
Thursday, July 12, 2012

Smart systems for smart users

I used to rely on an alerting system which was somewhat but not entirely integrated with a ticketing system. I didn't have any real input into the dev process, so problems which came up couldn't really be fixed. Support monkeys like me weren't considered important or worth querying about these things, after all.

One of the quirks of this system was how it dealt with attaching alerts to tickets. Basically, alerts were their own thing. They would appear on and disappear from the "alert console" based on what the monitoring systems found. If your machine stopped answering on port 80 and you had "HTTP monitoring", it would raise an alert. Whenever that port started answering again, the monitoring system would then close that alert.

Humans could not close an alert. The only way to close it was to either fix the problem (make the port answer again) or turn off that flavor of monitoring. This meant alerts would stay in the console until something conclusive was done about the problem.

People were expected to watch the alert console and, for each new alert, either create a new ticket or attach the alert to an existing open ticket on that customer's account. The problem is that this association was one way: alerts knew about tickets, but tickets didn't know about alerts. Can you tell that alerts were written later by a different group of people and were considered second-class by the core set of developers?

Since any given ticket had no idea about alerts which were linked to it, you could do things which would effectively orphan an alert. For instance, you could just close the ticket. It didn't know so it didn't care, and nothing would stop you.
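
To make the missing guard rail concrete, here is a minimal sketch of what a two-way association might look like. None of this comes from the real system: the class names, the fields, and the "open"/"closed" status strings are all made up for illustration. The only point is that once a ticket holds a list of its alerts, closing it can be refused while any of them is still active.

```python
from dataclasses import dataclass, field


class OpenAlertsError(Exception):
    """Raised when someone tries to close a ticket that still has live alerts."""


@dataclass
class Alert:
    # Hypothetical fields; the real alert schema is not described in the post.
    alert_id: int
    description: str
    active: bool = True  # only the monitoring system is supposed to flip this


@dataclass
class Ticket:
    ticket_id: int
    status: str = "open"  # assumed status values: "open" or "closed"
    alerts: list = field(default_factory=list)  # the back-reference tickets lacked

    def attach(self, alert):
        """Record the alert on the ticket so the link works in both directions."""
        self.alerts.append(alert)

    def close(self):
        """Refuse to close while any attached alert is still active."""
        live = [a for a in self.alerts if a.active]
        if live:
            raise OpenAlertsError(
                f"ticket {self.ticket_id} still has {len(live)} active alert(s)"
            )
        self.status = "closed"


if __name__ == "__main__":
    ticket = Ticket(ticket_id=1)
    alert = Alert(alert_id=10, description="HTTP not answering on port 80")
    ticket.attach(alert)
    try:
        ticket.close()
    except OpenAlertsError as err:
        print("refused:", err)
    alert.active = False  # monitoring sees the port answering again
    ticket.close()
    print(ticket.status)
```

Whether a hard refusal is the right behavior is debatable. A warning, or automatically detaching the alert so it shows up as unowned again, would also keep it from being silently orphaned.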

So now you had an alert console which showed an alert, but it showed a ticket attached, so naturally there's no reason to worry about it, right? Whoever owns that ticket must be working on it.

Of course, then you'd go and look and find out that the actual ticket is sitting there in some dead state and the customer's service has been down the whole time. Sometimes this would go on for hours, or even days, if I had been off for a while.

Why do I mention myself in this? Well, as it turned out, nobody else really ever looked at these things to find alerts attached to closed tickets. When I was busy with other things or otherwise wasn't available to watch things, they just rotted.
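
That manual sweep is the kind of thing a small script could have done on a schedule. The following is only a sketch under assumptions: the dictionary keys, the status strings, and the idea that alerts and tickets could be pulled out as plain data are all hypothetical, since nothing about the real system's interfaces is known here.

```python
def find_orphaned_alerts(alerts, ticket_status):
    """Return active alerts whose attached ticket is no longer open.

    alerts: iterable of dicts with the (assumed) keys "id", "active" and
        "ticket_id", where "ticket_id" is None for an unattached alert.
    ticket_status: dict mapping ticket id to a status string; "open" is
        the only status treated as someone actually working the problem.
    """
    orphans = []
    for alert in alerts:
        if not alert["active"]:
            continue  # the problem went away, nothing to chase
        if alert["ticket_id"] is None:
            continue  # unattached alerts are already visible as unowned
        # A ticket that is missing entirely is treated the same as a dead one.
        if ticket_status.get(alert["ticket_id"]) != "open":
            orphans.append(alert)
    return orphans


if __name__ == "__main__":
    alerts = [
        {"id": 1, "active": True, "ticket_id": 100},
        {"id": 2, "active": True, "ticket_id": 101},
        {"id": 3, "active": True, "ticket_id": None},
        {"id": 4, "active": False, "ticket_id": 101},
    ]
    ticket_status = {100: "open", 101: "closed"}
    for alert in find_orphaned_alerts(alerts, ticket_status):
        print("alert", alert["id"], "is attached to dead ticket", alert["ticket_id"])
```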

After enough of this, I realized that something had to change. The system was designed to be insanely flexible, and you could tell it to do lots of things. The problem is that it would do exactly what you told it to, which meant it could get into a bunch of nonsensical states. There were hardly any guard rails to keep you from going off the road.

Then it hit me: the system was designed by smart people for smart people. The average intelligence at the company had been nose-diving for a while now, and more and more problems of this sort were becoming apparent. I figured that the proposals being kicked around to replace the in-house systems with some horrible shrinkwrap that would only work in IE on Windows might actually make sense.

Sure, those systems can be horribly restrictive, but when you're purposely going out of your way to bring in cheaper people who can't handle things by themselves, what else are you going to do? Limit their options and you can probably limit the damage they can do. Of course, you might also create more messes by doing this, but as they say, "that's someone else's problem".

I'm not convinced this saves any money, but it does let a bunch of silly business people check off a few goals every quarter and look useful.