Writing

Feed Software, technology, sysadmin war stories, and more.

Wednesday, December 21, 2011

Stick a stateful server in front of that database

I'd like to suggest an alternative to the traditional model of using a big database as the rendezvous point for data in a system. I'm primarily thinking about ticketing here, but this applies to plenty of other things. A system which drives a bunch of kernel tests would also apply here.

I see how these systems come to be with the "database rules all" approach, but sometimes you need to think about moving beyond it. It starts innocently enough. You figure out a schema which will let you store your information, then you write a bunch of code to read and write from that database. Then this maybe turns into a bunch of CGI programs, or $interpreted_language scripts, or whatever.

The point is, this model works like this: you wake up, you parse your parameters, then you poke the database. It tells you something, you emit something back to the client, then you go to sleep again or quit. There's nothing running between requests. Your data lives in the database and does not have a life of its own.

Think about what this means for a ticketing system. Tech #1 loads up the list of unassigned tickets. Then #2 loads it around the same time. #1 grabs a ticket. #2 sees it as unassigned and tries to grab that same ticket a minute later, and gets a warning: "this ticket is already assigned to #1".

This used to happen to us all the time, particularly since I was unforgiving about hitting the reload button in my browser instead of waiting for it to auto-refresh at the lowest permissible interval (2 minutes). When something popped up, I'd snag it. Then usually I'd hear something like "aww, I wanted that one!". Too slow!

Why did this happen? Easy. The ticket list page is static. Why is it static? Well, it's probably because there's no way to get dynamic updates from it. Their whole model is based around "run a query, build a page, emit it, and die". Nothing stays around to keep tabs on things.

So now let's try a different model. There's some kind of server process running which is keeping track of the active tickets. Those queue list requests from tech #1 and #2 come in to that server, and it just serves them out of RAM. Maybe it still throws back HTML, or maybe it gives them JSON and the clients render it. It works out the same.

Now tech #1 grabs the ticket. This happens via an AJAX request back to that server, and it updates both its internal storage and the backing database (you didn't think I was talking about all RAM and no disk, right?), then it sends an acknowledgement.

This is the point where the magic can happen. The server knows who's looking at a ticket list which includes the one which was just changed. It can now deliver some kind of lightweight update to those clients. Whether that happens over some kind of long-poll scheme, or if it queues up until they check in isn't too important here.

The important part here is that tech #2's browser finds out the ticket has just been assigned to someone and it can update the display. Tech #2 will see this and pick another one instead. The window in which they can conflict with each other has now gone from the 2-5 minute refresh interval on the ticket list down to (hopefully) the order of tens of milliseconds.

All you're doing here is moving the aggregation of data up one level from the database into a smart application/server. Instead of having to issue tons of dumb redundant SELECTs every time someone reloads a ticket, just serve it out of memory! You can then drop any ticket out of memory after it hasn't been touched in a while.

Doesn't this sound so much nicer than having dozens of URL end points, each for a slightly different action that can happen through a web page? Oh, THIS one is for changing the subject of a ticket, while THAT one is for changing the assignee! Imagine a tree full of little Python programs, or (gag) PHP scripts all over the place. It happens!

Really, what you want to do is send well-defined commands with an argument or two from the client (web browser) to the server (ticket tracker) and get properly-formatted data back. All of this HTTP business in the middle is just fluff.

Will I still build the simple stateless URL-to-database stuff from time to time? Of course. It's easy, and it's a fast way to start messing with a new idea. Still, there's a point here: before such a system gets to the point where it assaults my database, I'll have already started on the stateful server.

I've seen the alternative. They just kept trying to make this one poor little box running Postgres scale up and up and up. At some point, you have to try something different.

When it takes 45 seconds to load a list of tickets, you have failed.


December 24, 2011: This post has an update.