Writing

Software, technology, sysadmin war stories, and more.

Wednesday, July 13, 2011

Sector editing revived to solve web hosting problems

Way back in the Commodore 64 BBS scene, people used to write things like BBS software in BASIC and then compile it. This would make it practically impossible to see the true source code, but we could still get at all of those tasty strings. It became possible to change those strings to make programs say something different as long as you could fit within the same space. We used to call it "sector editing".
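The "fit within the same space" rule is the whole trick, so here is a minimal Python sketch of it. The function name, the padding choice, and the sample bytes are all made up for illustration; the point is only that the patched image stays exactly as long as the original, so everything after the string keeps its offset.

```python
def patch_same_length(data: bytes, old: bytes, new: bytes, pad: bytes = b" ") -> bytes:
    """Replace every occurrence of old with new without changing len(data)."""
    if len(pad) != 1:
        raise ValueError("pad must be exactly one byte")
    if not old:
        raise ValueError("old must not be empty")
    if len(new) > len(old):
        raise ValueError("replacement does not fit in the original space")
    # Pad a shorter replacement so every later byte keeps its offset.
    padded = new + pad * (len(old) - len(new))
    return data.replace(old, padded)


# Hypothetical fragment of a compiled program with an embedded prompt string.
image = b"\x01\x08\x0b\x08ENTER PASSWORD:\x00\x00\x1f"
patched = patch_same_length(image, b"ENTER PASSWORD:", b"WHO GOES THERE?")
```

A longer replacement is rejected outright rather than truncated, since silently chopping a string is usually worse than refusing to patch.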

Time might have passed, but those techniques are still useful. I got to use them again more recently in a web hosting support situation. This was one of those problems where people tried a bunch of things which should have worked and kept failing. I was asked to dig more deeply.

Our customer's problem came from an interaction between a pair of things they wanted to use together. First, they had Plesk, which is built to let people run virtual web hosting companies. It has a big database of settings and writes out Apache config files, including one top-level file called httpd.include which gets pulled into the main configuration. Editing it by hand was pointless, since it would just be rewritten shortly thereafter.

Their second product was Urchin, specifically the pre-Google version of it which actually ran on the local server logs. In order for it to do certain kinds of analytics reporting, you had to rig your pages to set up cookies, and then you had to configure Apache to log them. This usually meant a LogFormat directive to say "hey Apache, make all of these log entries include the cookies, too".

That's where it got complicated. Plesk would set its own directives for logging, and due to the way the order of things worked out, no amount of CustomLog or LogFormat mangling up in the top-level httpd.conf (which was safe) would override the stuff it did in httpd.include and below (which were effectively dynamic). It was possible to set up logging to another file, but that was complicated and messy. They just wanted it to go to the usual access_log and Just Work.

So I determined that Plesk's httpd.include generator had the whole LogFormat thing hard-coded into it, and sure enough, it was visible with 'strings'. I just had to twiddle it enough to not override the definition the individual sites would later use. Time for some perl -- probably the only thing I use it for:

perl -pi -e "s/combined/c0mbined/g" binary_name

With that change in place, I ran it again, and it generated a file with "c0mbined" instead of "combined" for its LogFormat directive. That let my Urchin-friendly LogFormat with cookies stay intact, and the logs were the right format.
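One caveat with that one-liner is the /g: it rewrites every "combined" in the binary, not just the one next to LogFormat. Here is a rough Python sketch of checking that first, using a `strings`-style scan. The sample bytes are a made-up stand-in (I am not reproducing the real generator's contents), and the helper names are hypothetical.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each run of printable ASCII, roughly like `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


def hits(data: bytes, needle: str):
    """Return (offset, text) for every printable string containing needle."""
    return [(off, text) for off, text in printable_strings(data) if needle in text]


# Hypothetical stand-in for the generator binary: one LogFormat string,
# plus an unrelated string that also happens to contain "combined".
blob = (
    b"\x7fELF\x02\x01\x01\x00"
    b'LogFormat "%h %l %u %t" combined\x00'
    b"\x00\x13\x01"
    b"logs combined into one file\x00"
)

# Look before patching: every hit listed here will be changed.
for offset, text in hits(blob, "combined"):
    print(offset, text)

# Same substitution as the perl one-liner; both names are 8 bytes long.
patched = blob.replace(b"combined", b"c0mbined")
```

In this toy blob both strings get rewritten, which is harmless here but is the kind of thing worth knowing before touching a real binary. Keeping a copy of the original (perl's -pi.bak will do it) is cheap insurance.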

Trouble is, this would die upon the next Plesk upgrade, and there was no way to keep it in place. We had no way to maintain this kind of one-off hack, and it would just lead to trouble down the road.

Having shown other techs that it was possible, we all decided to leave it in the realm of "technically possible but not worth supporting", and advised the customer. Sometimes that's the only way to go.