
Wednesday, January 4, 2012

Outgrowing local storage and moving to MySQL

Last time, I talked about the very beginning of my "mailserv" project. Picking up from there, I want to clarify one bit: it was written in plain old C. It didn't have the luxury of the STL that I would use now in C++. That means the linked lists and hash tables and binary trees were happening the hard way: I wrote them myself. All of this data was living right there in my process.

I had just improved the "dump" scheme so that it took a fraction of a second to write things to disk. This was great, but it didn't help startup times. When the program started up, it had to bring all of this stuff back in from a glorified text file. It had since grown to about 14,000 entries.

Now, this file with its 14K entries was only consuming about 1 MB of disk space, which was small enough to fit in the disk cache during my restart/reload tests. In other words, the slowness wasn't coming from the disk: my parsing code was CPU-bound, and something needed to be done.

Did I mention the machine in question was a Pentium 150 with 128 megabytes of memory? It was also running BIND and Squid in addition to sendmail. Yeah. Life was interesting that way. It was the same system which starred in my "life without BGP" post from a couple of months ago, for those who remember it.

While I pondered ways to clean this up, some other dumb things happened. It turned out that Lotus Groupwise treated a 451 (a temporary failure, as far as SMTP is concerned) as a permanent failure, and we had been bouncing mail from a particular state agency. I added a config file directive for them which meant "never delay mail from this host", and the problem went away. Their lack of SMTP compliance was not worth fighting over.

Meanwhile, things got slower and slower. The list of quads grew to 25,000 entries, or about 2.7 MB on disk. Startup time had stretched to 17 seconds, and the process was now pushing 8 MB of resident memory. It kept going. It got up to 34,000 entries, 3.8 MB on disk, and about 10 MB resident. It was now taking nearly a minute to start.

The problem was that my server was dead to the world during this startup window, and my milter would (properly) fail open the whole time. That meant my mail exchangers would let garbage mail in, because the alternative would have been far worse: denying all mail until my experimental daemon came back up.

I decided to write "partial load support". It would set a "loading" flag at startup, read the config file to get its access controls, and start listening on its TCP port right away. Only then would it start fetching data from the dump file.

If a request came in while that flag was set, I'd throw back a temp failure unless it had been marked as "nodelay" in the config file. I'd store nothing about the request.

Otherwise, I'd keep loading data. Once that finished, it would clear the loading flag and operations would continue as usual. I think this code worked by passing a zero timeval to select() while in loading mode, so it would drop straight through if there was no network activity going on. Then it would read a chunk of the file, parse it, and go back to select().

Normally you wouldn't use select() with a zero timeout due to the CPU load, but in this situation, it actually made sense. As soon as loading finished, it went back to the usual long timeval thing to not melt my poor little P150 down. This was a stupid hack on my part, but it did work!
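
For the curious, the shape of that loop was roughly this. The helper names below are illustrative rather than lifted from the real source, but the zero-versus-long timeval trick is the whole idea:

/* Sketch of the loading-mode event loop.  While "loading" is set,
 * select() gets a zero timeout so we fall straight through and parse
 * another chunk of the dump file; once loading is done, we go back
 * to a long timeout.  Helper names are illustrative. */
#include <sys/select.h>

extern int listen_fd;                 /* TCP listener, already set up */
extern int loading;                   /* set at startup, cleared when done */
extern void handle_client(int fd);    /* temp-fails requests while loading */
extern int load_next_chunk(void);     /* returns 0 once the file is done */

void event_loop(void)
{
    for (;;) {
        fd_set rfds;
        struct timeval tv;

        FD_ZERO(&rfds);
        FD_SET(listen_fd, &rfds);

        tv.tv_sec = loading ? 0 : 30;   /* 0 = drop straight through */
        tv.tv_usec = 0;

        if (select(listen_fd + 1, &rfds, NULL, NULL, &tv) > 0)
            handle_client(listen_fd);

        if (loading && load_next_chunk() == 0)
            loading = 0;
    }
}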

Next, I started thinking about shrinking this mess. Many quads were absolute garbage and would never show up again. I decided to add "atime" to each entry in the style of Unix filesystems. Accessing an entry would update that time. Not long after that, I wrote something which would expire quads after being idle for long enough.

This one was fun. It waited until select() dropped out with no activity, then went into the current hash table bucket and checked the access times on the entries there. Then it just advanced the bucket pointer and went back to sleep. This way, it would check one bucket per pass. It was a slow process, but this kind of housekeeping doesn't need to be quick.
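
The expiry pass boiled down to something like this (again, the names and the idle limit here are illustrative, not the original code):

/* Sketch of the idle-time expirer: every time select() times out with
 * nothing to do, sweep exactly one hash bucket, then advance the
 * cursor so the next idle pass checks the next bucket. */
#include <stdlib.h>
#include <time.h>

#define NUM_BUCKETS 1024
#define MAX_IDLE (14 * 24 * 60 * 60)   /* e.g. two weeks of no accesses */

struct quad {
    struct quad *next;
    time_t atime;                      /* updated on every lookup */
    /* ip, helo, from, to ... */
};

extern struct quad *buckets[NUM_BUCKETS];
extern void free_quad(struct quad *q);

static unsigned int expire_cursor;

void expire_one_bucket(time_t now)
{
    struct quad **qp = &buckets[expire_cursor];

    while (*qp) {
        struct quad *q = *qp;
        if (now - q->atime > MAX_IDLE) {
            *qp = q->next;             /* unlink the stale entry */
            free_quad(q);
        } else {
            qp = &q->next;
        }
    }

    expire_cursor = (expire_cursor + 1) % NUM_BUCKETS;
}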

I was thinking about new ideas for how to store the data, like using the filesystem as a database. I figured this expiration stuff would buy me enough time to go and develop it. The idea was to create a directory tree like quads/192/168/123/45 with the helo/from/to data under that. That way, the kernel itself would get to figure out caching and all of this through normal filesystem accesses.
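
In other words, a quad from 192.168.123.45 would have turned into a path like this. It's just a sketch of the layout, since this version never got built:

/* Hypothetical path builder for the filesystem-as-database idea:
 * 192.168.123.45 -> "quads/192/168/123/45", with the helo/from/to
 * data stored in files under that directory. */
#include <stdio.h>

void quad_dir(char *buf, size_t len, const unsigned char ip[4])
{
    snprintf(buf, len, "quads/%u/%u/%u/%u", ip[0], ip[1], ip[2], ip[3]);
}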

Finally, I realized all of this was insanity. I had never done anything with a "real" database before, but I could no longer ignore it. I was spending far too much time dealing with data storage problems, and I wanted an escape. It was time to move to MySQL. My original design was the first thing I'd ever done with that sort of system, and it was wonderfully naive.

I figured that each quad would become a row. A row would have a unique ID which was automatically provided by the database. There would be an IP column, a HELO column, a FROM column, and a TO column. Then there would be the creation time and access time. I also added a pair of counters: one for references and one for when it actually generated an OK.
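
In schema terms, it would have looked something like this. The mip/mhelo/mfrom/mto column names come from the query I'll show in a moment; the rest of the names here are just for illustration:

CREATE TABLE quads (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  mip   VARCHAR(15)  NOT NULL,   -- dotted-quad source IP
  mhelo VARCHAR(255) NOT NULL,
  mfrom VARCHAR(255) NOT NULL,
  mto   VARCHAR(255) NOT NULL,
  ctime INT UNSIGNED NOT NULL,   -- creation time
  atime INT UNSIGNED NOT NULL,   -- last access time
  refs  INT UNSIGNED NOT NULL DEFAULT 0,   -- times referenced
  oks   INT UNSIGNED NOT NULL DEFAULT 0    -- times it produced an OK
);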

I decided to start slowly. Instead of moving the quad storage first, I put the quarantines out there. These were just IPs, creation times, and some notes for my own purposes. The existing interface was abstracted enough that I could shunt those lookup and creation calls over to new functions which talked to MySQL instead. This worked well, so I proceeded to the quad conversion.
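
In practice, those new functions amounted to thin wrappers around the MySQL C API. Here's a stripped-down sketch of what one of those shunted lookups can look like; the function, table, and column names are made up for illustration:

#include <stdio.h>
#include <string.h>
#include <mysql/mysql.h>

/* Illustrative sketch of a lookup shunted over to MySQL: returns 1 if
 * the IP is quarantined, 0 if not, -1 on error. */
int quarantine_lookup(MYSQL *db, const char *ip)
{
    char esc[64], query[256];
    MYSQL_RES *res;
    int hit;

    mysql_real_escape_string(db, esc, ip, strlen(ip));
    snprintf(query, sizeof(query),
             "SELECT id FROM quarantines WHERE qip = '%s'", esc);

    if (mysql_query(db, query) != 0)
        return -1;

    res = mysql_store_result(db);
    if (res == NULL)
        return -1;

    hit = (mysql_fetch_row(res) != NULL);
    mysql_free_result(res);
    return hit;
}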

The quad code looked easy. I'd just write a query like this:

SELECT (whatever) FROM quads WHERE mip = "..." AND mhelo = "..." AND mfrom = "..." AND mto = "..."

Easy, right? Well, sure. That's how I got started. It worked, but it seemed slow. Considering how quickly it had run before, this made no sense to me.

Unfortunately, my lack of knowledge at the time about SQL and database design in general came back to bite me. I'll get into what happened in the next post.


January 5, 2012: This post has an update.