This morning's downtime brought to you by...
A desire to reduce cruft on your web server may actually make things much worse. As with most of these things, a certain sequence of events will set the stage for something breaking. This particular one bit me this morning, and it's due to some changes I made several weeks ago.
Not long after the new year, I took a look at all of the random garbage my web server was running as part of a stock install. It had modules for auth_mysql, mod_perl, mod_php, mod_python, and bunches of other things I hope to never have to run on my poor little machine.
My web server machine happens to run a distribution which uses the /etc/httpd/conf.d/* approach to add-on modules. This gives you a bunch of little files in there, each with the gunk needed to have Apache load a .so file or two, maybe twiddle some MIME types, and add handlers. I decided to turn these things off by moving those files to another directory. That way, turning them back on, if absolutely necessary, would be trivial.
That was a mistake which would set me up for failure down the road -- just this morning, in fact. Understanding what happened requires a little backtracking.
Earlier this year, a bunch of libraries and web frameworks ran into issues with the way they store data received from the client. These are the arguments from POST and GET. If a malicious person knows something about how they are stored (like in a hash table), they can rig a request to make life interesting for that web server.
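For a rough feel of why the storage detail matters, here is a toy sketch in Python. It is not PHP's real hash function or table: weak_hash, ToyTable and the key generators are all made up for illustration. The only point is that keys chosen to collide pile into one bucket, so every insert has to walk past everything inserted before it.

```python
# Toy illustration only: this is NOT PHP's real hash function or hash table.

def weak_hash(key: str) -> int:
    """A deliberately weak hash: the sum of the character codes."""
    return sum(ord(c) for c in key)


class ToyTable:
    """A chained hash table that counts key comparisons made during inserts."""

    def __init__(self, buckets: int = 64) -> None:
        self.buckets = [[] for _ in range(buckets)]
        self.comparisons = 0

    def insert(self, key: str, value: str) -> None:
        chain = self.buckets[weak_hash(key) % len(self.buckets)]
        for i, (existing, _) in enumerate(chain):
            self.comparisons += 1
            if existing == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        chain.append((key, value))


def colliding_keys(n: int) -> list:
    """n distinct keys with the same character sum, hence the same weak_hash."""
    # Each key is n characters long: a single 'b' at position i, 'a' elsewhere.
    return ["a" * i + "b" + "a" * (n - 1 - i) for i in range(n)]


def ordinary_keys(n: int) -> list:
    """n unremarkable argument names: arg0, arg1, ..."""
    return ["arg%d" % i for i in range(n)]


def cost(keys) -> int:
    """Total comparisons needed to insert all keys into a fresh ToyTable."""
    table = ToyTable()
    for k in keys:
        table.insert(k, "x")
    return table.comparisons
```

Comparing cost(colliding_keys(200)) with cost(ordinary_keys(200)) shows the gap: the colliding set needs n*(n-1)/2 comparisons because every key lands in the same chain, while the ordinary names spread across buckets.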
PHP was affected by this, and so a new package went out through the usual distribution update channels right after the new year. I must have done my little httpd/conf.d cleanup immediately after that, since nothing bad happened when it was auto-applied.
The problem is that PHP's little fix introduced a nice little security hole, so they had to push out another package. This one was auto-applied by my server during a normal update run, and thus the time bomb was planted.
This morning, during normal log rotation, all of the Apache children were told to "reload" as usual, and they did so. Then they all started crashing over and over and over. It went on like this until I got back online, saw the disaster (and a heads-up from a friend) and started investigating.
Every one of my httpd children had started segfaulting upon trying to serve content. The only hits which were actually managing to succeed were the 301s and 403s, for some reason. strace showed it bombing right after doing the usual .htaccess type checks, but I couldn't get a good handle on where it was actually crashing.
Then I did something relatively stupid and stopped the whole web server (service httpd stop) so I could bring it up in single-process mode or similar. It stopped crashing. I put it back in the usual mode. It was still fine.
Oh great. Now I had a heisenbug on my hands. Trying to analyze my problem had hidden it, and it wasn't coming back.
Long-dormant neurons from my days of wrangling servers for other people in a web hosting context started waking up. I wasn't going to leave this there, since it would probably just come back to bite me again. Oh no. This one was going to be discovered and nailed to the wall as a warning to other bugs: do not trifle with me.
Somehow, I thought to look at my system's package manager stuff and specifically looked at the install dates. Only a couple of things had been installed in 2012, and only one set of packages had been installed since the last log rotation event: PHP. I still couldn't figure out how that was related, but at least now I had some idea of what could have happened.
While talking about this, I made a random comment that "I didn't even think I had PHP enabled on that box, because, well, me? PHP? No way.", and yet, there it was in /etc/httpd/conf.d: php.conf, with all of the usual gunk which makes it start up. Huh? Why did I leave that there?
I looked in my "disabled" directory where I had moved all of that junk some time before. php.conf was still there. I hadn't left a copy in conf.d where it would be active. Something else put it back. It must have been the package update!
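That by-eye comparison could be sketched as a small Python check. The function name and both directory arguments are hypothetical; it only assumes the disabled files were moved to some other directory under their original names.

```python
import os


def resurrected(conf_dir: str, disabled_dir: str) -> list:
    """List .conf files that were moved to disabled_dir but are active again.

    "Active" here means a non-empty file of the same name exists in conf_dir.
    Both directory paths are supplied by the caller; nothing is hard-coded.
    """
    found = []
    for name in sorted(os.listdir(disabled_dir)):
        if not name.endswith(".conf"):
            continue
        active = os.path.join(conf_dir, name)
        # A zero-byte file loads nothing, so only non-empty files count.
        if os.path.isfile(active) and os.path.getsize(active) > 0:
            found.append(name)
    return found
```

Called as resurrected("/etc/httpd/conf.d", "/path/to/disabled"), with the second path being wherever the files were moved, it would have returned ["php.conf"] in my case.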
So now it was starting to fall into place, and I came up with a hypothesis. First, bring up Apache with no php.conf directives. Second, slip one in there. Third, restart only the children, not the whole thing, by using 'reload' rather than 'restart'. Fourth, wallow in the tears of a thousand segfaults.
I decided to run the experiment by emptying the file and bringing Apache back up from scratch. Then I put the contents back in and did a reload. It promptly started segfaulting. Then I cleared that file out, reloaded one more time, and all was well.
That's about the time I started writing this post. The world must know not to do what I have done in the name of reducing cruft, since it will just add more entropy to what should be a nice quiet Sunday morning.
Incidentally, I've since made zero-byte versions of those .conf files so my "helpful" package manager won't bring them back and make this happen again. I'm tempted to make them immutable or something else fun like that, but that would probably just break something else down the road.
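A minimal sketch of that placeholder trick in Python, assuming the disabled files live in a separate directory. The function name and both directory arguments are made up for illustration. Whether a package manager really leaves an existing file alone depends on the package manager and on how the package treats its config files, so take that part as my expectation rather than a guarantee.

```python
import os


def make_placeholders(conf_dir: str, disabled_dir: str) -> list:
    """Create zero-byte stand-ins in conf_dir for each .conf in disabled_dir.

    An existing file of the same name is truncated to zero bytes, so it no
    longer loads anything. Returns the names that were written.
    """
    written = []
    for name in sorted(os.listdir(disabled_dir)):
        if not name.endswith(".conf"):
            continue
        # Opening in "w" mode creates the file, or empties it if present.
        with open(os.path.join(conf_dir, name), "w"):
            pass
        written.append(name)
    return written
```

After running something like this, a reload is still needed for Apache to notice the change, and the real contents stay safe in the disabled directory.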
As for the exact nature of the segfault and what's going on when you mix and match Apache configs between the root parent and the httpd children, I haven't bothered to dig into that yet. It's shaping up to be a nice day, and it's no time to be geeking out on the computer. The specifics of badness from dynamically loaded library files can wait.