Writing

Software, technology, sysadmin war stories, and more.

Saturday, July 23, 2011

Data exchange methods seemingly optimized for pain

One thing I have never been able to figure out is why some people keep insisting on pushing things into human-readable formats when the data will only ever be consumed by another program. It winds up creating really nasty problems down the road, and it pollutes the minds of many who have to work on it. They frequently wind up losing the ability to see the forest for the trees, as the saying goes, and their resulting design and code show it.

An example which comes to mind is a testing suite I encountered not too long ago. It had a lot of nesting going on: a test could start another test, which could in turn start sub-processes, and so on. The way it kept track of everything was to print stuff to a file, one line at a time. It was a little like a demented syslog file at a glance, but it went far beyond that when you dug into it.

I guess Python's "whitespace is meaningful" philosophy leaked from the code itself into the design of this thing, since it told the sections apart by indenting them. Yep. It would actually indent lines by 0, 2, 4, or however many spaces the current nesting depth called for.

BEGIN test1
  BEGIN test2
    some_result
    BEGIN test3
      did_something
    ABORT
  END
END
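To make the cost concrete, here is a minimal sketch of what a reader for a log like the one above has to do. It is hypothetical: the names (parse_status_log, Node, INDENT) are made up, and it assumes two spaces per level with END/ABORT sitting at the same indentation as the BEGIN they close. The only thing carrying the structure is a count of leading spaces.

```python
# Hypothetical reader for the indentation-based status log shown above.
# Assumptions (not from the original suite): two spaces per nesting level,
# and END/ABORT sit at the same indentation as the BEGIN they close.
from dataclasses import dataclass, field
from typing import List, Optional

INDENT = 2  # assumed width of one nesting level


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    outcome: Optional[str] = None  # "END" or "ABORT" once a block closes


def parse_status_log(text: str) -> Node:
    root = Node("<root>")
    stack = [root]  # stack[-1] is the innermost block still open
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue
        word = raw.strip()
        # The only structural signal is the number of leading spaces.
        depth, extra = divmod(len(raw) - len(raw.lstrip(" ")), INDENT)
        if word in ("END", "ABORT"):
            # A closing line sits one level out from the block's contents.
            if extra or len(stack) < 2 or depth != len(stack) - 2:
                raise ValueError(f"line {lineno}: stray {word} at depth {depth}")
            stack.pop().outcome = word
        elif extra or depth != len(stack) - 1:
            raise ValueError(f"line {lineno}: unexpected indentation")
        elif word.startswith("BEGIN "):
            node = Node(word[len("BEGIN "):])
            stack[-1].children.append(node)
            stack.append(node)  # its contents must now be one level deeper
        else:
            stack[-1].children.append(Node(word))  # a plain result line
    if len(stack) != 1:
        raise ValueError("log ended with unclosed blocks")
    return root


SAMPLE = """\
BEGIN test1
  BEGIN test2
    some_result
    BEGIN test3
      did_something
    ABORT
  END
END
"""

test1 = parse_status_log(SAMPLE).children[0]
print(test1.name, test1.outcome)  # test1 END
```

A line that is off by a single space makes this reader raise ValueError. A more forgiving reader would instead quietly attach that line to the wrong block, which is arguably worse.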

I learned about this one day when overhearing that someone was having to retool the parser to deal with something or other. That's when I snapped to attention: "parser"? What are we parsing? What is the format? And where is it coming from?

The answers were: we're parsing status log output, it has its own strange format, and we're the ones generating it. So, yep, we had problems dealing with the things we were generating. Lovely!

Now, for the record, I've created several ASCII-based network protocols which looked like everything else out there: SMTP, HTTP, etc. You could telnet in and talk to the daemon and all of this gunk. I'll just rationalize that by saying it was a long time ago, and debug-by-telnet was pretty useful at times. However, if I had to do it today, it would probably look very different because there are better tools and I've learned how to use them.

Speaking of tools, these guys were sitting on top of a gold mine of libraries which would happily batch up any data you wanted and ship it around for you. There were entire groups of people inside the company who were taking care of that kind of stuff. Better still, everything else ran over top of it. You could be sure that whatever you'd use from that pile of libraries was getting plenty of real-world testing, thousands of times per second, the world over.

But no. They had to keep inventing things to go from binary to plain text to binary again, even though nobody ever read the plain text until the parsing broke (or was insufficient). Instead of making it Just Work and removing the need to ever grovel through those stupid text files, they just kept piling on more hacks.
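For contrast, here is a rough sketch of the "Just Work" direction: each event is a record with an explicit parent id, and a serialization library does all the writing and reading. This is an assumption-laden illustration, not what that team had. JSON Lines from Python's standard library is only a stand-in for the in-house batching libraries mentioned above (which the post doesn't name), and a binary format would serve just as well. EventLog and group_by_parent are hypothetical names.

```python
# Hypothetical alternative: every event is a record with an explicit parent
# id, written and read by a serialization library instead of a hand-rolled
# format. JSON Lines is only a stand-in for whatever shared library exists.
import io
import itertools
import json
from typing import Dict, List, Optional, TextIO


class EventLog:
    def __init__(self, sink: TextIO) -> None:
        self.sink = sink
        self._ids = itertools.count(1)

    def emit(self, kind: str, parent: Optional[int], name: str = "") -> int:
        """Write one record; begin events return the id their children use."""
        event_id = next(self._ids)
        record = {"id": event_id, "parent": parent, "kind": kind, "name": name}
        self.sink.write(json.dumps(record) + "\n")
        return event_id


def group_by_parent(lines) -> Dict[Optional[int], List[dict]]:
    """Rebuild the nesting from parent ids; whitespace plays no part."""
    children: Dict[Optional[int], List[dict]] = {}
    for line in lines:
        record = json.loads(line)
        children.setdefault(record["parent"], []).append(record)
    return children


# The same run as the sample log; end/abort records point at the block
# they close.
buf = io.StringIO()
log = EventLog(buf)
t1 = log.emit("begin", None, "test1")
t2 = log.emit("begin", t1, "test2")
log.emit("result", t2, "some_result")
t3 = log.emit("begin", t2, "test3")
log.emit("result", t3, "did_something")
log.emit("abort", t3)
log.emit("end", t2)
log.emit("end", t1)

tree = group_by_parent(buf.getvalue().splitlines())
print([r["kind"] for r in tree[t2]])  # ['result', 'begin', 'end']
```

Because the nesting travels as ids rather than column positions, a new event kind or another layer of sub-processes changes nothing on the reading side, and nobody has to retool a parser.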

No amount of prodding from me ever got that high-centered truck off the concrete median, so it just kept spinning its wheels. I gave up after two years of this and moved on to something else.