Writing

Software, technology, sysadmin war stories, and more.
Monday, September 19, 2011

Rendering database snapshots after the fact

I found a way to add "time traveling" to a support ticket system which had no concept of previous views and did some fun things with it. Here's how.

Here's a brain-twister for you: given a database with a bunch of logged events happening to tickets, render an "active ticket" view. Okay, you think, that's not so bad, right? SELECT ... WHERE status = active? Who cares? The ticketing system does that by definition already.

No, the challenge with this kind of data is to be able to create that same sort of view for some point in the past. Basically, if you had been working on support tickets at some arbitrary point in time, what did the queue look like?

Maybe you're thinking "take snapshots", and you might be right, but nobody thought to do that at the time. Someone wants to see what things looked like 9 months ago, before anyone ever thought looking back in time might be useful. Oops. Now what?

So you have your tables which will tell you when various things happened to a ticket. Maybe you have one which is all of your status changes: new, unsolved, closed, that sort of thing. First, find every ticket which changed status in the 48 hours leading up to the point you care about. That means if you want to look at the mess after the Slammer worm hit late on Saturday, January 25, 2003, you need to start looking for status changes on the 23rd and work forward.
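That first scan might look something like this sketch. The table and column names are borrowed from the query shown later in this post, and are assumptions about the real schema:

```sql
-- Every ticket which changed status in the window. The two
-- placeholders are (48 hours prior) and (the target time).
SELECT DISTINCT ticket_id
FROM status_log
WHERE event_time BETWEEN ? AND ?;
```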

So now you have a list of tickets which could have appeared on the ticketing system's screen at the time you care about. But how do you turn that into meaningful data? My approach was to build a bunch of time bucket arrays: one per ticket. Then I'd ask the database to spit out all of the status log entries for that 48-hour period for just those tickets from my earlier list, ordered by ticket id and time.
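That second request could be sketched like this, again assuming the status_log schema from the query later in this post:

```sql
-- All status changes in the window for just the interesting tickets,
-- ordered so each ticket's events arrive together and in time order.
SELECT ticket_id, event_time, status
FROM status_log
WHERE event_time BETWEEN ? AND ?
  AND ticket_id IN (...)   -- the list from the first scan
ORDER BY ticket_id, event_time;
```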

This would give me things like this for a single ticket:

    02:30  New
    03:01  Unsolved
    05:41  Closed

My code would handle one result from the database at a time, so it would start at the 02:30 bucket and populate everything from there to the end with "New". Then it would get the next one, go to 03:01, and populate that and everything past it with "Unsolved", and so on with 05:41 and "Closed".

Think of it as replaying the log to build a timeline of statuses:

----------NNNNNUUUUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCC

Obviously, there were some inefficiencies in this, like rewriting values over and over for the later times, but it did work. Once it finished reading from the database, I had a whole bunch of arrays which summarized the time leading up to my target. This meant all I had to do was compute an offset and read from every array at that offset to see what was happening at that moment.

This technique was also crafted to remove a lot of back-and-forth between me and the database. Our reporting system wasn't the most powerful thing in the world, and while it could do a decent linear read of something like that events table, constantly asking it to run queries would take forever. This meant my approach wound up being faster than the sort of thing you might come up with at first.

I imagine most people who know SQL would want to try something like this:

SELECT status FROM status_log WHERE ticket_id = ? AND event_time BETWEEN ? AND ? ORDER BY event_time DESC LIMIT 1

Assuming your times were (48 hours prior) and (the target time), then sure, that would in fact get you the last status change for your ticket before the target time. That value is the one you display. Trouble is, you now have to issue that query for every single ticket from your original list. That's a lot of round-trips and slow queries! Our reporting database was far too slow for that kind of (ab)use.

Maybe some more clever folks would pull out the "WHERE ticket_id = ?" clause so the query would find every ticket which changed state in that window. Trouble is, then you have to rework your ORDER BY, since as written, it's going to give you the last status change out of the entire set of tickets, and that's not what you want, either.

Now you have to wrangle GROUP BYs and HAVINGs and that ORDER BY ... DESC so that you get the last one per ticket... and you have to figure out how to make the LIMIT work so you only get one per ticket, and ... and ...
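For what it's worth, one way that wrangling might end up looking is a correlated subquery that picks each ticket's last change in the window. This is a hypothetical sketch, not anything from the real system:

```sql
-- Last status change per ticket before the target time.
SELECT s.ticket_id, s.status
FROM status_log s
WHERE s.event_time BETWEEN ? AND ?
  AND s.event_time = (SELECT MAX(s2.event_time)
                      FROM status_log s2
                      WHERE s2.ticket_id = s.ticket_id
                        AND s2.event_time BETWEEN ? AND ?);
```

It can be made to work, but it hammers the table, which is exactly what a slow reporting database doesn't want.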

Sometimes, all you really need is a sequential read or two. SQL is nice for some things and is horrible for others. In this case, rather than trying to abuse it, I just had it give me the results of two filtered scans and did the rest by myself in C. It worked just fine.

Maybe this is why it was no big deal when I started working with the so-called "NoSQL" systems like Bigtable. For some data, it's the right way to go.

Incidentally, it looked a bit like this. I've altered a few details to remove identifying data, but otherwise that's how it looked and worked. You could see just about anything and cruise through time quite easily.

[Screenshot: historical queue view]