Saturday, July 27, 2013

Good programmer with good libraries got something done

There's a slide deck going around describing how a C++ program written in 2007 did a number of bad things, and didn't hold up under load in 2012. The replacement, written in Go in 2012, eliminated a great many problems and simplified operations. It also brought the code back under the "maintained" umbrella, whereas previously this service apparently had gone unmaintained and unloved, despite having countless active external users.

I never heard of it before the other day, for the record - not the original implementation and certainly not the replacement. I don't even know who wrote the original version.

The C++ vs. Go part of it doesn't seem that interesting to me. What interested me more was that the original code seems to have gone to lengths to handle many things by itself. It's almost like there was no HTTP server code available as a library. Of course, that would be quite a claim, considering all of the other things which are built on top of HTTP and are built in C++ from that same code base.

Why did the original author or authors do that? Who knows.

The first part which really grabbed me was one bullet point: "C++ single-threaded event-based callback spaghetti". Ah yes. I've seen that before.

This makes me wonder: just what sort of problem is this, exactly? On the surface, it sounds like something that just takes a bunch of HTTP requests and pumps out data. I imagine something which figured out the path from a GET against its local mapping of paths to files and then just threw bytes at those file descriptors would be a good place to start.

If you started from absolutely nothing and had no help from libraries and wanted to do this the fairly old-school way without things like epoll, what would this look like? It seems like you'd probably write something which does the usual socket + bind + listen thing to create a listening file descriptor on some TCP port. Then, you'd have a main loop where you create a fd_set and put that listening fd in there before calling select. (More will be added here, but sit tight.)
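As a rough sketch of that socket + bind + listen step, assuming IPv4 on the loopback address: the name make_listener and the backlog of 128 are picked for illustration here, not taken from the original program.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a listening TCP fd bound to 127.0.0.1:port,
 * or -1 on failure. Passing port 0 lets the kernel pick a free port. */
int make_listener(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Lets a restarted server rebind without waiting out TIME_WAIT. */
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 128) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The fd that comes back is the one that goes into the fd_set at the top of the main loop.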

The return code of select dictates what happens next: if it's less than 0, something potentially bad happened, but it might just be an interruption from a signal (EINTR). You might need to bail out, but odds are you just need to "continue" and restart the loop. If it returned 0, then nothing was active, so you restart the loop. Otherwise, it's time to see if the listening fd is active.

If that's the case, then you get to use accept (or similar - see accept4) to turn that incoming connection into a fd, and sock it away in some kind of client structure. This means the first part of the loop now grows to add the client fds from this structure to the fd_set, and the middle now grows to check those same fds for read activity much as the listen fd is checked already.

If one of those client fds should have activity, then you have to read from it and see what happens. If read returns less than 1, something failed (a negative value) or the client disconnected (zero), so you get to deal with that - call shutdown, close the fd, and clean up the client state info. Otherwise, it's time to figure out what they sent us. This means accumulating enough characters to be able to parse it all out into something usable. That might mean a state machine of some kind - more flags and buffers in that per-client structure, I guess. It might have to hang onto these bytes across multiple read calls, since you might get "GET /" and "foobar.zip HTTP/1.0" in two different calls. That's supposed to be an HTTP/1.0 request for "/foobar.zip", not an HTTP/0.9 request for "/", after all.
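To make the split-read problem concrete, here's a small sketch of a per-client buffer that refuses to parse until a full request line has arrived. Everything named here (request_buf, request_line, feed_bytes, the 1024 and 256 byte limits) is an assumption for illustration, and it only looks at the request line of a GET, not headers.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-client accumulation buffer. */
struct request_buf {
    char data[1024];
    size_t len;
};

struct request_line {
    char path[256];
    int major, minor;   /* 0 and 9 when the client sent no version */
};

enum parse_status { NEED_MORE, GOT_REQUEST, BAD_REQUEST };

/* Appends whatever one read() returned, then parses only once a complete
 * line (terminated by '\n') is sitting in the buffer. */
enum parse_status feed_bytes(struct request_buf *rb, const char *bytes,
                             size_t n, struct request_line *out) {
    if (n > sizeof(rb->data) - 1 - rb->len)
        return BAD_REQUEST;              /* request line too long */
    memcpy(rb->data + rb->len, bytes, n);
    rb->len += n;

    char *nl = memchr(rb->data, '\n', rb->len);
    if (nl == NULL)
        return NEED_MORE;                /* "GET /" alone lands here */
    *nl = '\0';
    if (nl > rb->data && nl[-1] == '\r')
        nl[-1] = '\0';

    int after_path = -1, end = -1;
    int got = sscanf(rb->data, "GET %255s%n HTTP/%d.%d%n",
                     out->path, &after_path, &out->major, &out->minor, &end);
    if (got == 3 && end >= 0 && rb->data[end] == '\0')
        return GOT_REQUEST;              /* "GET /path HTTP/x.y" */
    if (got == 1 && after_path >= 0 && rb->data[after_path] == '\0') {
        out->major = 0;                  /* bare "GET /path": HTTP/0.9 style */
        out->minor = 9;
        return GOT_REQUEST;
    }
    return BAD_REQUEST;
}
```

Feeding it "GET /" yields NEED_MORE, and only the second chunk with the newline in it produces a request for "/foobar.zip". Waiting for the line terminator is what keeps the first half from being mistaken for a complete request.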

Let's say all of this is figured out, and it's understood what the client wants, and it maps onto something which can actually be fulfilled. Now it's time to start sending data to the client. Okay, so you open a file descriptor and start copying bytes, right? Well, yes and no. Unless it's a tiny file, odds are a single call to write isn't going to cut it. There are buffers for network I/O, but they're finite, and a big enough file will fill them. At that point the write comes back short, and the next attempt fails with EAGAIN or similar, assuming you set nonblocking mode -- you did remember to do that, right?
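A sketch of what that looks like in practice, assuming the data to send is already in memory: set_nonblocking and push_some are hypothetical helper names, and the offset is the bit of state that has to live in the per-client structure between calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Turns on O_NONBLOCK without disturbing the other file status flags. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Pushes as much of buf[*off .. len) as the kernel will take right now.
 * Returns 1 when everything has been written, 0 when the fd would block
 * (come back once it is writable), and -1 on a real error. */
int push_some(int fd, const char *buf, size_t len, size_t *off) {
    while (*off < len) {
        ssize_t n = write(fd, buf + *off, len - *off);
        if (n > 0) {
            *off += (size_t)n;           /* possibly a short write */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                    /* interrupted, just retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;                    /* buffer full for now */
        return -1;
    }
    return 1;
}
```

A return of 0 with *off somewhere in the middle is the normal case for anything bigger than the buffers, and that's exactly the state the server has to remember.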

Now you have to remember that this particular client connection is in "push file mode" so you can come back to it a bit later. This also means changing up how you call select() in that main loop, since you probably want to have it tell you when those fds for clients in the "file sending" mode become writable. Otherwise, you'll end up making a bunch of pointless "would block" write calls over and over, and that would mean using select with a zero delay value so you can get back to those write attempts. That sucks a serious amount of CPU, so that's bad.
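Here's a sketch of how the set-building part of the loop might change, assuming each connection carries a mode flag. The names conn, conn_mode and build_sets are made up for illustration, and the listening fd is left out to keep it short.

```c
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical connection states for this sketch. */
enum conn_mode { MODE_FREE, MODE_READING_REQUEST, MODE_SENDING_FILE };

struct conn {
    int fd;
    enum conn_mode mode;
};

/* Connections still sending a request go in the read set; connections in
 * "push file mode" go in the write set, so select only wakes us for them
 * when there is room to write. Returns the highest fd placed in either
 * set, or -1 if there were none. */
int build_sets(const struct conn *conns, size_t n, fd_set *rd, fd_set *wr) {
    int max_fd = -1;
    FD_ZERO(rd);
    FD_ZERO(wr);
    for (size_t i = 0; i < n; i++) {
        if (conns[i].mode == MODE_FREE)
            continue;
        FD_SET(conns[i].fd,
               conns[i].mode == MODE_SENDING_FILE ? wr : rd);
        if (conns[i].fd > max_fd)
            max_fd = conns[i].fd;
    }
    return max_fd;
}
```

The write set then goes in as the third argument to select, which can block with a real timeout (or none at all) instead of spinning.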

Anyway, this means another way for select to return a positive count, and now you have to also check all of your "file send mode" client fds for membership in the writable set. If any of them come up, then you get to (attempt to) push another block, and see if you hit EOF while doing this. If so, then you get to make another decision: do you hang up on them now, or do you somehow reset for another request? That is, do you support things like HTTP/1.1 pipelining? If so, how do you deal with bytes coming in from the client while you were still pushing that file to them? Even if you don't do pipelining, you have to watch out for what you wind up doing with that input side of this connection. Closing it down early might have suboptimal results.

(This might be a non-issue depending on what sort of frontends are sitting between you and the user. Cough.)

At this point, in theory, you have this big round-robin thing going on. It looks for new connections and adds them. It looks for bytes waiting to be read from clients and reads them and feeds them to a parser of some kind. It also potentially pushes bytes out if those connections have gotten that far, and handles whatever happens when it finishes sending. It also cleans up when the work is done or the connection goes away abnormally.

Let's say you do all of this perfectly, and it manages to support all of the specs which are required for this server project. With all of this work done, there's still something out there, taunting you. This thing is only ever going to run on a single CPU. It might eat that entire CPU if you give it enough work (or program it badly), but it can't possibly spread out to use others. If you want to use the other processors on your machine, you're going to need to run multiple instances of the same code somehow.

What now? Well, threads are one way, but you can't just throw threading into this situation without creating a big mess. Threads by their very nature share memory because they're all in the same space, so you have to be very careful about data updates. How do you delete a client from the state structure without interrupting someone else who might be iterating over it? You get the idea.

Forking might give you some options. If the program starts up, creates the listening file descriptor, and then forks, I imagine all of the children would end up getting different clients. That is, only one child will manage to "win" the accept() call for a new incoming connection. Of course, the others might end up waking up in select(), only to have accept() fail since someone else got it. It's the "thundering herd" thing all over again, and you'd have to figure out exactly how your system and implementation would handle this.

So now you start thinking about having just one process run accept(). It sits there and waits for new incoming connections. Then, once it has a viable file descriptor, it can pass the file descriptor to one of its children. Yep, this is actually possible, if you use a Unix domain socket and do some ancillary data magic on a system which supports it. This changes the children so that instead of having the real listener fd, they now look for activity from their parent, and read those messages to acquire new clients.
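The ancillary data magic is the SCM_RIGHTS control message. Here's a minimal sketch, assuming a connected Unix domain socket between parent and child (socketpair before the fork is one way to get one). The names send_fd, recv_fd and fd_ctl are made up for this example.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer sized for exactly one fd; the union forces alignment
 * suitable for struct cmsghdr. */
union fd_ctl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Sends fd_to_send across chan along with one dummy payload byte.
 * Returns 0 on success, -1 on failure. */
int send_fd(int chan, int fd_to_send) {
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    memset(&ctl, 0, sizeof(ctl));

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd_to_send, sizeof(int));

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* Receives one fd from chan. Returns the new fd (a fresh number in the
 * receiving process), or -1 if no fd arrived. */
int recv_fd(int chan) {
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    memset(&ctl, 0, sizeof(ctl));

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(chan, &msg, 0) <= 0)
        return -1;

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS ||
        cm->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cm), sizeof(fd));
    return fd;
}
```

The parent would call send_fd right after accept and then close its own copy, while each child adds its end of the channel to its read set and calls recv_fd when it lights up.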

If you did all of this correctly and balance it sanely, then you have a decent chance of fanning out your incoming connections to different processes, and you might actually get to use more than one CPU for your serving duties.

It also means you have duplicated the efforts of a great many people who have come before you, possibly even at the same company. Their code might even be right there in the depot in a sufficiently generic form. Unless your needs are extraordinary, odds are you can probably get things done by using those libraries. If it's not a direct match, you might even be able to add those features by submitting a patch or just by wrapping it with some of your own code.

They've probably already solved the parallel-HTTP-serving problem. I mean, if you work at a company where the web is their lifeblood, you'd think there would be a good solution to this already. Otherwise, how else is everyone else getting anything done?

That's really what this is all about. It's about having solid, useful, and proven libraries, knowing about them, and using them. The replacement project here had a single person at the helm who knew about a good environment and went and used it to good effect. The original project may have had multiple programmers who may or may not have needlessly reinvented the wheel -- it's not clear exactly what happened there.

Without a lot of internal proprietary information, it's impossible to say how the original program came to be or why. I've done post-mortems for such things in the days when I had access, and it's no small task.

Headlines: Smart programmer who gets things done got something done. Good libraries trump one-off half-baked local implementations. Language X comes with good libraries to do these tasks. Language Y does not.

Not the headline: language X rules, language Y drools!