Thursday, May 31, 2012

I created a software monster

I have perpetrated some awful hacks in my time. Some of them escaped my grasp and are probably still out there making someone miserable right now. Here's a story of yet another one from my school district days.

One year, the school district decided to hire a webmaster. The first person they got for this job was quite a character. He decided that our main page should have a world-writable guest book, along with a bunch of other useless things. I later found his resume online and realized that all of that "useless" stuff had turned into bullet points, and was probably used to score another job when he moved on less than a year later.

At any rate, by virtue of building the Unix boxes some years earlier, I had also put up a number of very simple web pages. I've always been big on content and light on style when it comes to such things, so they were basic. There was also a tree of directories under /schools for each of the district's locations. Each one had a simple placeholder in preparation for some day when someone would come along and take ownership of it.

This went on for a while, but then that new webmaster came up with something brilliant: let's install IIS on an NT machine and serve the school web pages over remote mounts! Yes, instead of having the web pages physically live on a box nearer the edge, the web server actually had to slurp the contents "live" over a fractional T1 from each school.

Realize that the "network engineers" were not applying patches, so their machines were vulnerable to all of the new "pup tools" which were going around: winnuke, teardrop, syndrop, land, nestea, bonk, and so on. This would knock their machines offline randomly, and not just when I was demonstrating something for them.

Some months after this, the first webmaster moved on and was replaced by a much more reasonable person. He thought the IIS situation was abominable and had to go. He would much rather deal with some flavor of Unix and Apache that I was running than with Windows and IIS from people who did not maintain their machines.

This, at last, is where the evil hack truly begins. By this point, the users were accustomed to editing their web pages as a series of local files on their school servers. It was too late to put those horses back in the barn, so we had to just roll with it.

So now, I had a decree: the web pages will now be served by Apache on one of my machines, but they will still be physically stored on the NT machines at each school.

Oh, great. Lovely.

Somehow, I had learned about Samba and smbfs in Linux during this interval, and decided to give it a shot on my workstation. It worked, and soon I had a bunch of //school-server/blahblah mounts on my machine, with virtual host document roots pointing at each. It was horrible, but it did do the job.
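The shape of it was something like this, with one mount per school and an Apache virtual host pointed at each one (the share names, paths, and mount options here are invented for illustration):

    # One smbfs mount per school (names and options invented):
    mount -t smbfs -o guest,ro //school-a/web /var/www/schools/school-a
    mount -t smbfs -o guest,ro //school-b/web /var/www/schools/school-b

    # ...and each virtual host's DocumentRoot pointed at its mount,
    # e.g. DocumentRoot /var/www/schools/school-a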

I was given the go-ahead to take this to "production", so I took one of my BSD/OS boxes down for the better part of a day to reinstall it as Linux so I could use smbfs on it. I figured it would be a matter of putting things in fstab, mounting them once via rc.local or similar, and that would be it.

Naturally, I was wrong yet again.

Expecting reliability from wonky links and wobbly servers was stupid in retrospect, but that's where I was. I wound up writing something which would run via cron every 5 minutes to stat() a file on each of the mounts. The idea was that it would keep resetting whatever idle timer kept disconnecting my mounts.

It was simple enough: build a list of UNC paths and local directories, then pick a file that's guaranteed to exist on each one. For each local directory, try to stat() that file. If it fails, unmount and remount the share. This was usually enough to keep the link up.
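A rough sketch of that keepalive job, reconstructed from memory (the shares, mount points, and sentinel file are all invented here):

    #!/bin/sh
    # Run from cron every 5 minutes: poke a file which should always
    # exist on each mount, and remount anything that has gone away.
    for mnt in /var/www/schools/school-a /var/www/schools/school-b; do
        share="//${mnt##*/}/web"
        # [ -e ... ] does a stat() under the hood.
        if ! [ -e "$mnt/index.html" ]; then
            umount "$mnt" 2>/dev/null
            mount -t smbfs -o guest,ro "$share" "$mnt"
        fi
    done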

Even with that, it would still fail, and when it failed, it blew up spectacularly. My poor machine would start scrolling tons of garbage in its dmesg ring buffer, and it was obviously in distress. Running "df" or anything else which touched the ailing filesystem would just block. httpd processes would get stuck. "State D" (uninterruptible sleep, where you can't even kill the process) became my enemy.

What's really crazy is that I managed to find a way out of this as well. It turned out that attempting to access a dead mount would start generating SMB traffic. All you had to do was make it fail with a RST or similar. Well, if the far end NT machine was down, or if that school's T1 went out, that wasn't going to happen. How else can you create a RST?

Oh, right, you can just alias the school server's IP address onto the web server. It'll try to connect to the school, but will actually wind up talking to itself, and since it's not running anything on port 139, it'll RST. This is enough to make the mount hard-fail at last, and then it will finally let all of those stuck calls return.
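In concrete terms, it was something like this (the interface name and address are invented):

    # Alias the dead school server's IP onto the web server itself:
    ifconfig eth0:0 10.1.2.3 netmask 255.255.255.255 up

    # The kernel's SMB retries now connect to the local machine, and
    # since nothing is listening on port 139, the TCP stack answers
    # with a RST.  The mount hard-fails and the blocked calls return.

    # Remove the alias once the mess is cleaned up:
    ifconfig eth0:0 down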

At that point, you can actually umount the thing and let the "sorry" page which has been hiding "underneath" (in the mount directory) be served to visitors.

I should note this was well before the days of iptables, when you could just add a rule to OUTPUT which would --reject-with tcp-reset or whatever. I had to do this the ugliest way imaginable.
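Nowadays it would be a one-liner along these lines (school server address invented):

    # Reject outbound SMB traffic to the dead server with a TCP RST:
    iptables -A OUTPUT -p tcp -d 10.1.2.3 --dport 139 \
        -j REJECT --reject-with tcp-reset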

Why didn't we just run Samba or similar on the Linux web server and store the files right there as local resources? Oh, that one was great. There had been a decree: "the kids (students) may be the ones maintaining these sites, and they aren't allowed to have Unix accounts". I am not exaggerating. That was the policy.

Would you believe this horrible mass of stupidity was still an improvement over IIS? As broken as they both were, my approach still worked better than that original setup.

So then, from the "fixing problems you had 15 years ago" department, what would I do about it today, given similar restrictions but with my current technology and knowledge? Just off the top of my head, we absolutely would not serve straight off a network mount. It's a ridiculous idea, and is just asking for trouble.

Somewhere, there would be SMBFS (or now CIFS, I guess) mounts of those shares, or some other kind of SMB/CIFS client action, but it wouldn't be directly in line with the web server. Instead, there would be a process (perhaps rsync) which picked up data from the cursed mount points and brought it over to local storage on the web server.
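Something like this from cron, say (paths invented):

    # Pull each school's pages from the SMB mount onto local disk.
    # If the mount is dead, rsync just fails, and the last good copy
    # keeps getting served.
    rsync -a --delete /mnt/school-a/ /var/www/schools/school-a/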

When those mounts were unavailable for whatever reason, or even if the replication had its own issues, there would still be a local copy of everything. The school web sites would remain up, even if they were slightly older than what someone may have just saved to disk on their local server.

I guess this is why people say hindsight is 20/20.