Software, technology, sysadmin war stories, and more.
Thursday, July 28, 2011

BBS lockups, and what I used to do about them

I used to put up with a lot of badness from software. One particular example was the combination of DOS, DESQview (used for rudimentary multitasking, back when that was a big deal), and the collection of programs that constituted my BBS. It had a way of getting stuck when doing one thing or the other, and it would require manual intervention before it would get going again.

This really started annoying me when I would go on vacation and it would lock up while I was gone. Sometimes, I'd be looking forward to showing off my system to relatives or friends in some distant city, only to find it ringing forever with no answer. I'd come home to find it just sitting there, frozen in some dumb script or other program. One poke of the reset button would "fix" things, but only for a little while.

I finally got tired of this and started resorting to some drastic measures. I knew we had another family holiday coming up and did not want this to happen again. While I had no way to do any sort of watchdog timer back in those days, there was just enough technology in our house to let me cut my losses.

We had already set up these Radio Shack "Plug n' Power" branded modules (which was their label for X-10) to make our house look occupied while we were gone. It was just a small step to buy one more appliance module and park my BBS machine behind it. The only thing left was to make sure it didn't make things worse.

Back in those days, you had to worry about dropping power during a write to the disk, or not parking the heads on your disks, and all of these other crazy things. I "solved" that with some carefully coordinated timing. At midnight, the BBS would busy out its line by taking the modem off-hook. Then it would do whatever maintenance stuff it needed to do, and then it would park the disks and just sit there in a loop for 30 minutes. The trick was that it never expected to reach the end of that loop.

Over on our lighting controller, I had added two commands. At 12:15 AM, it would turn my machine off. Then it would power it back up at 12:18 AM. It would do this every single day whether the BBS needed it or not. That way, the worst thing that could happen was that it would stay down for most of a day, but it would be back up after the next 12:18 AM power-on.
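For anyone who wants to see why the timing held together, here is a rough sketch of the schedule as arithmetic. It is only an illustration, not what actually ran on the BBS: the function names are made up, and the 10 minute maintenance figure is an assumption, since all I really needed was for maintenance to finish before 12:15 AM.

```python
# All times are minutes after midnight.
# The 30 minute loop and the 12:15 / 12:18 commands come from the story.
# MAINTENANCE_MINUTES is an assumed figure used only for illustration.
MAINTENANCE_MINUTES = 10
IDLE_LOOP_MINUTES = 30
POWER_OFF_AT = 15
POWER_ON_AT = 18


def idle_window(maintenance_minutes, loop_minutes):
    """Return (start, end) of the period when the disks are parked
    and the machine is just spinning in its loop."""
    start = maintenance_minutes
    return start, start + loop_minutes


def power_cut_is_safe(power_off_at, maintenance_minutes, loop_minutes):
    """True if the power drops while the machine is idle with parked disks."""
    start, end = idle_window(maintenance_minutes, loop_minutes)
    return start <= power_off_at < end


if __name__ == "__main__":
    # Maintenance done by 12:10, power cut at 12:15: inside the idle window.
    print(power_cut_is_safe(POWER_OFF_AT, MAINTENANCE_MINUTES, IDLE_LOOP_MINUTES))
    # If maintenance ran 20 minutes, the cut would land mid-maintenance.
    print(power_cut_is_safe(POWER_OFF_AT, 20, IDLE_LOOP_MINUTES))
```

The whole scheme depends on that one inequality: the power-off time has to fall after maintenance ends and before the loop would run out. The 30 minute loop was just generous padding around the 12:15 AM cut.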

The really sick thing about this story is that it actually worked.

Years later, upon thinking back to this, I realized that I could have saved a lot of stress on my hardware by wiring something across the reset button. They made these things called "X-10 universal modules" which were basically relays you could flip with the same lighting controller mentioned above. It would have been trivial to have it virtually poke that button on some schedule.

So, now, let's break the fourth wall.

By this point in the story, are you thinking about better ways to reboot the box, or have you realized that I shouldn't have needed to do that in the first place? I see far too many people get stuck in the moment, buried deep in the wrong problem, trying to optimize their "reboot" when they should be working on why the system locks up in the first place.

Do you see the forest for the trees? Can you?