Writing

Software, technology, sysadmin war stories, and more.

Monday, October 12, 2020

Binpacking little things onto big things with PAE

This is a story about something that we used to do a long time ago to make the most of our hardware. I certainly hope it's not affecting anyone these days, but you never really know. It has to do with memory addresses, bit counts, CPU architectures, and operating system details.

This was over 10 years ago, when 64 bit Intel and AMD machines were becoming a thing. The place where I worked hadn't quite gotten to full saturation with those things yet, and besides, the OS wasn't ready for it in any case.

As a result, we had server boxes which had 8 gigabytes of memory in them, but which were still running a 32 bit environment, and so no process could exceed 4 GB of memory. It was actually a little worse than that, since the kernel needed a slice of that address space for itself, so in practice we couldn't use more than 3072 MB in the process.

This could have been a huge waste of resources: all of that memory and us only able to use less than half of it? Fortunately, there was something more which could be done. These newer systems had picked up the PAE (Physical Address Extension) feature, so they could instead use 36 bit physical addresses, which is enough to reach up to 64 GB of physical memory.

Now, the actual code running was still very much a 32 bit situation, so any one process could only address 4 GB of address space, but now we could do something sneaky: we could run multiple processes!
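To make the numbers concrete, here is a rough sketch of the arithmetic in Python. The 1 GB kernel slice is an assumption inferred from the 3072 MB figure above (the common 3G/1G split), and the function and constant names are made up for illustration.

```python
# Back-of-the-envelope address space math for 32 bit processes under PAE.
# Assumption: the kernel reserves 1 GiB of each process's virtual address
# space, which matches the 3072 MB per-process limit described above.

GIB = 1 << 30
MIB = 1 << 20


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with this many bits."""
    return 1 << address_bits


virtual_per_process = addressable_bytes(32)   # what one 32 bit process sees
physical_with_pae = addressable_bytes(36)     # what the machine can address
kernel_slice = 1 * GIB                        # assumed 3G/1G split

usable_per_process_mb = (virtual_per_process - kernel_slice) // MIB

print(virtual_per_process // GIB)   # 4
print(physical_with_pae // GIB)     # 64
print(usable_per_process_mb)        # 3072
```

The point is that the 36 bit limit applies to the machine as a whole, while each process is still stuck behind its own 32 bit window. More processes means more windows.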

And so, that's what we did. Typically, we'd load two 3072 MB processes on the box, and then might cram in another gig-and-a-half process to use up the rest of the space. We didn't try to hand the entire 8 gigs to userspace, since the kernel needs memory too! All of that caching, buffering, network I/O and everything else has to go somewhere. If you (try to) eat all of it with your program, something will have to lose.
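The packing described above can be sketched as a simple greedy split. This is only an illustration, not the actual job configuration: the 512 MB kernel reserve is an assumption (it is simply what is left over after 3072 + 3072 + 1536 MB on an 8 GB box), and `plan_processes` is a hypothetical helper name.

```python
# Greedy sketch: carve a box's memory into processes that each fit under
# the 32 bit per-process limit, leaving some memory for the kernel.
# Assumption: 512 MB is held back for kernel caches, buffers, and so on.

BOX_MB = 8 * 1024          # 8 GB of physical memory
PER_PROCESS_CAP_MB = 3072  # most a single 32 bit process could use
KERNEL_RESERVE_MB = 512    # assumed headroom, not a measured value


def plan_processes(box_mb: int, cap_mb: int, reserve_mb: int) -> list[int]:
    """Return process sizes in MB, largest first, that fill the budget."""
    if cap_mb <= 0:
        raise ValueError("cap_mb must be positive")
    budget = box_mb - reserve_mb
    sizes = []
    while budget > 0:
        size = min(cap_mb, budget)  # fill up to the cap, then the remainder
        sizes.append(size)
        budget -= size
    return sizes


plan = plan_processes(BOX_MB, PER_PROCESS_CAP_MB, KERNEL_RESERVE_MB)
print(plan)       # [3072, 3072, 1536]
print(sum(plan))  # 7680
```

With these assumed numbers the result is the same shape as the layout above: two full-size processes plus one gig-and-a-half leftover.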

This wasn't the most efficient way to use the boxes, because it meant three times the overhead of our memory-hungry process, three sets of TCP ports for monitoring and RPC methods, and a bunch of complicated job configs to keep all of this stuff running and "binpacked" as well as we could.

Later, when we got actual 64 bit setups (kernel + userspace), the PAE hacks could go away, and now we could just run a 6 or 7 GB process all by itself. The overhead dropped off, and now it was just a lean, mean, caching machine. The configuration also lost a whole bunch of special case hacks, and we could go back to the notion of "one per box" again.

I really hope nobody in 2020 is still having to do this.