Software, technology, sysadmin war stories, and more.
Tuesday, October 4, 2011

Rescuing birthday parties from file corruption monsters

About fifteen years ago, I was hanging out on IRC and a friend needed help with a corrupt data problem. I was feeling relatively bored and was looking for something to do, so I decided to poke my nose into that conversation to see if I could help. It turned out well.

Somewhere on the east coast, there was a place in a mall where kids went to have parties. They had pizza, miniature golf, "lazer" tag, soda, rides, and that sort of thing. I imagine there were arcade games and pinball, too. Anyway, they apparently did most of their booking in this program called Deskmate on a creaky old Tandy laptop.

One day, it stopped working, and all of that data disappeared. All of their booking details had vanished. They had no idea who would be coming into the store, whether they had paid in advance or not, what food and what games to set up ahead of time, and so on. It was a complete disaster and they were freaking out.

I asked for a copy of the file. Running "strings" on it gave a fair amount of plainly visible text. You could see things like "$7.95 PACKAGE PLUS PIZZA & JUICE FOR $7 + TAX", but that was all. The times and dates were buried in some binary format and were unreadable. Without those, they had no way to know which bookings had already happened and which ones were coming up.

Needless to say, there was no file format documentation. It was a crusty old system even then, and this one was going to call for some serious reverse-engineering. I started by having my friend create a new database file in the system. He then entered a bunch of records and gave them known values which I requested: different dates, different times (hours and minutes), then he sent that file to me.

Don't ask me how I figured it out, but somehow it started making sense. I wrote a small program in Turbo Pascal (!) to grovel around in that file and turn that mess back into usable data. He was happy, his boss was super excited, and I got to feel good by helping someone. Plus, their business continued unabated and no kids had their birthdays ruined. It was a good situation all the way around.

I actually still have the source code from back then. My code from that night is terrible, but staring at it now seems to reveal a few things. First, the year is just the first byte minus 17, the month is the second byte minus 97, and the day is the third byte, also minus 97. I'm not sure what's going on with the 17 offset, but the 97 thing means they've basically shoved the month and day into the lowercase letter area of ASCII.
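This is not the original Turbo Pascal, just a minimal Python sketch of the date arithmetic described above. The function name decode_date is made up for illustration, and it assumes the "first byte" is offset 0 of a record. It returns the raw numbers without guessing what the year is counted from.

```python
from typing import Tuple


def decode_date(record: bytes) -> Tuple[int, int, int]:
    """Return (year, month, day) using the offsets described in the post.

    Assumption: the date occupies the first three bytes of the record.
    """
    if len(record) < 3:
        raise ValueError("need at least 3 bytes to decode a date")
    year = record[0] - 17   # unexplained offset of 17
    month = record[1] - 97  # 97 is ASCII 'a'
    day = record[2] - 97    # same 'a' offset as the month
    return year, month, day
```

For example, the three bytes "qdn" (113, 100, 110) would come out as year 96, month 3, day 13.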

At byte 8, you have your start time ORed with 128, and byte 9 is the stop time, also ORed with 128. Given that each one is just a byte, further decoding is needed to make it human-readable. Once you strip off the 128, they're in units of 15 minutes. Divide by 4 and slice the result into an integer part and a fractional part, and it's just "int:(frac * 60)".
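Here is a small Python sketch of that time decoding, assuming the value left after clearing the 128 bit is a count of quarter hours since midnight, so dividing by 4 gives the hour. The name decode_time is hypothetical.

```python
from typing import Tuple


def decode_time(raw: int) -> Tuple[int, int]:
    """Turn one time byte into (hours, minutes).

    Assumption: after clearing the high bit (the OR with 128), the value
    counts 15-minute units since midnight.
    """
    quarters = raw & 0x7F                # strip the 128 bit
    hours, rest = divmod(quarters, 4)    # four quarter hours per hour
    return hours, rest * 15


def format_time(raw: int) -> str:
    """Render a time byte as HH:MM."""
    hours, minutes = decode_time(raw)
    return f"{hours:02d}:{minutes:02d}"
```

Using divmod avoids floating point entirely: the remainder times 15 is the same thing as the fractional part times 60.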

After that, you just start reading characters for the description. It terminates with a ^A -- 0x01. Three bytes past this, a new record starts, and you do it all over again. You keep doing this until you hit EOF.
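The record walk can be sketched in Python roughly like this. It leans on a few guesses that the description above doesn't settle: byte positions are counted from 0, the description starts right after byte 9, and "three bytes past" means the next record begins at the terminator's offset plus 3. The names Booking and parse_records are invented for the example.

```python
from typing import Iterator, NamedTuple

TERMINATOR = b"\x01"  # ^A ends the description
HEADER_LEN = 10       # assumption: bytes 0..9 precede the description
RECORD_GAP = 3        # assumption: next record starts 3 bytes past ^A


class Booking(NamedTuple):
    year: int
    month: int
    day: int
    start_quarters: int  # start time in 15-minute units
    stop_quarters: int   # stop time in 15-minute units
    description: str


def parse_records(data: bytes) -> Iterator[Booking]:
    """Yield one Booking per record until the data runs out."""
    pos = 0
    while pos + HEADER_LEN <= len(data):
        end = data.find(TERMINATOR, pos + HEADER_LEN)
        if end == -1:
            break  # no terminator left: treat as EOF
        head = data[pos:pos + HEADER_LEN]
        text = data[pos + HEADER_LEN:end].decode("ascii", errors="replace")
        yield Booking(head[0] - 17, head[1] - 97, head[2] - 97,
                      head[8] & 0x7F, head[9] & 0x7F, text)
        pos = end + RECORD_GAP
```

This version trusts every record it sees, so it only works on a clean file.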

It's a little more complicated than this because there are weird unexplained gaps between data in the file, but you can get through that by just having it attempt to decode everything and throw out stuff with bogus dates. Given that it can re-sync on a ^A, it's not too hard to get the actual data out.
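A rough Python sketch of that "decode everything, discard the junk" approach follows. The plausibility ranges are my own guesses (month 1 to 12, day 1 to 31, year 0 to 99, both time bytes with the 128 bit set), as are the 0-based offsets and the names looks_valid and scan. It is not what the original Turbo Pascal did line for line.

```python
from typing import Iterator, Tuple

HEADER_LEN = 10  # assumption: bytes 0..9 precede the description


def looks_valid(head: bytes, years: range = range(0, 100)) -> bool:
    """Guess whether 10 bytes look like a real record header."""
    year, month, day = head[0] - 17, head[1] - 97, head[2] - 97
    has_time_bits = bool(head[8] & 0x80) and bool(head[9] & 0x80)
    return year in years and 1 <= month <= 12 and 1 <= day <= 31 and has_time_bits


def scan(data: bytes) -> Iterator[Tuple[int, str]]:
    """Yield (offset, description) for every plausible record.

    On a bogus header, slide forward one byte and try again; on a good
    one, jump past its ^A terminator to re-sync.
    """
    pos = 0
    while pos + HEADER_LEN <= len(data):
        end = data.find(b"\x01", pos + HEADER_LEN)
        if end == -1:
            break  # nothing left that could terminate a record
        if looks_valid(data[pos:pos + HEADER_LEN]):
            yield pos, data[pos + HEADER_LEN:end].decode("ascii", errors="replace")
            pos = end + 3  # assumption: next record is 3 bytes past ^A
        else:
            pos += 1
```

The tighter the plausibility checks, the fewer false records slip through from the gaps, at the cost of possibly rejecting odd but real entries.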

Hopefully nobody ever has to try to extract data from a Tandy Deskmate CLN file, but if they do, this might just help a little.