Software, technology, sysadmin war stories, and more.
Thursday, December 15, 2011

Object oriented programming by pragmatic accident

It seems like there's always some "new" technology that makes certain people completely irrational, as if they were rabid dogs. They almost foam at the mouth over things just because they are new, not because of any particular use case.

For me, back in the early-mid '90s, that thing was object-oriented programming, or OOP. I wasn't one of the foaming masses, though. I was actually looking at it with suspicion because of all of that noise. The whole mess seemed ridiculous to me and I vowed to stay away from it. In my case, that just meant avoiding C++ and the loopier parts of Turbo Pascal. I had plain old C and was heading away from Turbo Pascal as a result of moving away from DOS, so that worked out just fine.

Imagine my surprise years later when I looked back on some of my code from back then and realized it was actually using a primitive sense of objects. It had happened completely organically as a matter of solving some problem and was not part of any mission to be buzzword-compliant or just "shiny". Here's how it evolved.

I had written a parser for certain kinds of text files. Originally, it was pretty obnoxious. It did simple strncmp() calls to look for keywords which had to start at the very beginning of a line. It also was extremely rigid in terms of other constraints, like only allowing exactly one space between a keyword and its argument.

Worse, since it used strncmp(), I had to do evil hacks for similar words. That is, "FOO" and "FOOBAR" had to be handled in a specific order so I wouldn't pick up the "FOO" in "FOOBAR". It was a mess.
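To make the ordering problem concrete, here is a minimal sketch of prefix matching with strncmp(). The function name first_prefix_match and the two keyword tables are made up for illustration and are not from the original parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the first keyword in `table` that is a prefix of `line`,
   or NULL if none matches. The order of `table` decides which keyword
   wins when one keyword is a prefix of another. */
const char *first_prefix_match(const char *line,
                               const char *const *table, size_t n) {
    assert(line != NULL && table != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strncmp(line, table[i], strlen(table[i])) == 0)
            return table[i];
    }
    return NULL;
}

/* Hypothetical keyword tables: same keywords, different order. */
const char *const SHORT_FIRST[] = { "FOO", "FOOBAR" };
const char *const LONG_FIRST[]  = { "FOOBAR", "FOO" };
```

With SHORT_FIRST, the line "FOOBAR 123" is reported as "FOO" because the shorter keyword is tested first. With LONG_FIRST it is reported as "FOOBAR". That is the kind of order dependence the hacks had to paper over.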

Later, this code evolved to something which would take the char array and would create a sequence of char* pointers into it. It would take the input line like "FOOBAR 123 456" and would drop \0 characters into it to split it into different C-style strings. You'd wind up with a[0] pointing at "FOOBAR", a[1] pointing at "123" and a[2] pointing at "456". This destroyed the original string, so you couldn't use it for debugging purposes later. It was still strict about spaces, but at least it didn't have the substring matching issue.
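A rough sketch of that in-place splitting might look like the following. The name split_in_place and its signature are assumptions for illustration. Every single space is treated as a separator, which is what makes it strict about spacing.

```c
#include <assert.h>
#include <stddef.h>

/* Split `line` in place by overwriting each space with '\0'.
   Stores up to `max` pointers into `out` and returns how many were stored.
   The original line is destroyed in the process. */
size_t split_in_place(char *line, char **out, size_t max) {
    size_t n = 0;

    assert(line != NULL && out != NULL);
    if (max == 0)
        return 0;

    out[n++] = line;                 /* first chunk starts at the line start */
    for (char *p = line; *p != '\0'; p++) {
        if (*p == ' ') {
            *p = '\0';               /* terminate the previous chunk */
            if (n == max)
                break;               /* no room left for more pointers */
            out[n++] = p + 1;        /* next chunk starts after the space */
        }
    }
    return n;
}
```

Given a writable buffer holding "FOOBAR 123 456", this yields three pointers, and the buffer itself now reads as just "FOOBAR" when printed as a string. Two spaces in a row would produce an empty chunk, so the input still has to be formatted carefully.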

Next, the code changed again to use a state machine. This let it ignore things like multiple spaces between arguments, and finally your input files could start looking friendly for humans. If you wanted to push things around to add comments or otherwise align columns, now you could.

This one had its own limitations, though: it had a strict requirement on the number of "chunks" it would extract from a line. That is, you could have "FOOBAR 123 456" but you couldn't also have "FOOBAR 123 456 789". It had no way to deal with that fourth chunk (789) just showing up out of nowhere. This was less than ideal for where I needed to go with my code.
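As an illustration of both the state machine and its fixed-size limitation, here is a small sketch. The names scan_chunks, MAX_CHUNKS and CHUNK_LEN are hypothetical, and returning -1 for an extra chunk is an assumption about how such a limit might surface, not a description of the original code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHUNKS 3    /* hypothetical fixed limit on chunks per line */
#define CHUNK_LEN  32   /* hypothetical size of each chunk buffer */

enum scan_state { IN_GAP, IN_CHUNK };

/* Copy whitespace-separated chunks of `line` into `out`.
   Returns the chunk count, or -1 if the line has more than MAX_CHUNKS
   chunks or a chunk is longer than CHUNK_LEN - 1 characters. */
int scan_chunks(const char *line, char out[MAX_CHUNKS][CHUNK_LEN]) {
    enum scan_state state = IN_GAP;
    int count = 0;
    size_t len = 0;

    assert(line != NULL && out != NULL);
    for (const char *p = line; *p != '\0'; p++) {
        int is_space = (*p == ' ' || *p == '\t');

        switch (state) {
        case IN_GAP:
            if (!is_space) {
                if (count == MAX_CHUNKS)
                    return -1;           /* a chunk too many */
                out[count][0] = *p;      /* start a new chunk */
                len = 1;
                state = IN_CHUNK;
            }
            break;                       /* extra spaces are simply skipped */
        case IN_CHUNK:
            if (is_space) {
                out[count][len] = '\0';  /* close the current chunk */
                count++;
                state = IN_GAP;
            } else {
                if (len == CHUNK_LEN - 1)
                    return -1;           /* chunk does not fit */
                out[count][len++] = *p;
            }
            break;
        }
    }
    if (state == IN_CHUNK) {             /* line ended inside a chunk */
        out[count][len] = '\0';
        count++;
    }
    return count;
}
```

Runs of spaces or tabs no longer matter, and the input line is left untouched because chunks are copied out. The fourth chunk in "FOOBAR 123 456 789" has nowhere to go, though, so the sketch can only give up on that line.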

The version after that twisted things around to allow for dynamic chunk handling. This version needed to keep a bunch of local variables as state, so it operated in a "callback" sense. It was pretty amateur stuff on my part, though, so it would always invoke a function with a given name -- it didn't take a function pointer. You had to make sure you linked in a function with that name and the right signature to receive the args from my parser.

While this worked well enough, it was a pain. Anything you did by way of that parser was trapped "behind" it due to the callback scheme. There were ways to work around it, but they were annoying. I also needed more flexibility for where things were going.
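The fixed-name callback arrangement can be sketched roughly as follows. The names parse_line, handle_line and seen_argc are invented for this example, and growing the argument array with realloc() is an assumption about what "dynamic" might have meant in practice.

```c
#include <assert.h>
#include <stdlib.h>

/* The parser only declares this. Whoever links against the parser must
   define a function with exactly this name and signature. */
void handle_line(int argc, char **argv);

/* Split `line` in place on spaces and tabs, growing the argv array as
   needed, then hand the chunks to handle_line().
   Returns 0 on success, -1 if memory could not be allocated. */
int parse_line(char *line) {
    char **argv = NULL;
    int argc = 0, cap = 0;
    int in_chunk = 0;

    assert(line != NULL);
    for (char *p = line; *p != '\0'; p++) {
        if (*p == ' ' || *p == '\t') {
            *p = '\0';
            in_chunk = 0;
        } else if (!in_chunk) {
            if (argc == cap) {           /* out of slots: double the array */
                int new_cap = cap ? cap * 2 : 4;
                char **grown = realloc(argv, (size_t)new_cap * sizeof *grown);
                if (grown == NULL) {
                    free(argv);
                    return -1;
                }
                argv = grown;
                cap = new_cap;
            }
            argv[argc++] = p;
            in_chunk = 1;
        }
    }
    handle_line(argc, argv);             /* argv is only valid during the call */
    free(argv);
    return 0;
}

/* Caller's side: results have to escape through something like a global. */
int seen_argc;

void handle_line(int argc, char **argv) {
    (void)argv;
    seen_argc = argc;
}
```

Any number of chunks now works, but the caller never gets the arguments back directly. Everything of interest happens inside handle_line(), and anything it wants to keep has to be copied out before the parser frees its array.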

I finally understood the problem: I needed something which was flexible in terms of digesting characters and building up argument lists. It needed to use a state machine instead of simple strtok() type stuff, and it needed some place to store its temporary data while working on things.

Finally, I needed the option of having multiple instances of this thing going at the same time. It was now going to be used to parse multiple input sources in parallel, and they had to all keep their own state without walking on each other. In other words, static variables would not do.

My solution was to create a context buffer. This would be initialized before you did anything else, and then you'd pass it in to all subsequent calls. It looked a little like this:

libctx ctx;
lib_file_start(&ctx, "/path/to/data/file");
while (lib_file_next(&ctx)) {
  parse_something(lib_keyword(&ctx), lib_args(&ctx));
}

See the pattern yet? Every single call to that thing was handed the pointer to my context buffer. The actual code in those functions was identical, but all of the variables containing the state came from that "ctx" area. This meant I could have multiple "ctx" variables and the actual lib_* functions would be perfectly happy with them.
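For a feel of what lives inside such a context, here is a self-contained sketch. It is not the original library: it reads from an in-memory string instead of a file to stay short, and the names textctx, text_start, text_next and text_keyword, along with the buffer sizes, are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define CTX_MAX_ARGS 8     /* hypothetical limit; extra chunks are dropped */
#define CTX_LINE_LEN 128   /* hypothetical limit; longer lines are truncated */

typedef struct {
    const char *cursor;           /* next unread character of the input */
    char line[CTX_LINE_LEN];      /* this context's own copy of the line */
    char *args[CTX_MAX_ARGS];     /* pointers into line[] */
    int argc;
} textctx;

/* Prepare a context to read from `text`. */
void text_start(textctx *ctx, const char *text) {
    assert(ctx != NULL && text != NULL);
    ctx->cursor = text;
    ctx->argc = 0;
}

/* Advance to the next line. Returns 1 if a line was read, 0 at the end. */
int text_next(textctx *ctx) {
    size_t len = 0;
    int in_chunk = 0;

    assert(ctx != NULL);
    if (*ctx->cursor == '\0')
        return 0;

    /* Copy one line into the context, leaving the input untouched. */
    while (*ctx->cursor != '\0' && *ctx->cursor != '\n') {
        if (len < CTX_LINE_LEN - 1)
            ctx->line[len++] = *ctx->cursor;
        ctx->cursor++;
    }
    if (*ctx->cursor == '\n')
        ctx->cursor++;
    ctx->line[len] = '\0';

    /* Split the private copy into chunks. */
    ctx->argc = 0;
    for (char *p = ctx->line; *p != '\0'; p++) {
        if (*p == ' ' || *p == '\t') {
            *p = '\0';
            in_chunk = 0;
        } else if (!in_chunk) {
            if (ctx->argc < CTX_MAX_ARGS)
                ctx->args[ctx->argc++] = p;
            in_chunk = 1;
        }
    }
    return 1;
}

/* First chunk of the current line, or "" for a blank line. */
const char *text_keyword(const textctx *ctx) {
    assert(ctx != NULL);
    return ctx->argc > 0 ? ctx->args[0] : "";
}
```

Two textctx variables can be advanced in any interleaving without disturbing each other, since none of the functions touch anything outside the context they are handed.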

So now let's jump back to the present day and C++. You write a class. It has some functions which do work. It also has some variables which keep track of things it needs to remember. You can have multiple instances of that class, and while they'll all execute the same code, the variables the code sees are the ones associated with that instance. Other instances are off the radar, so to speak.

In other words, it's the same thing I had with my ctx buffer.

It might look like this (and don't be too critical with my return types here... it's only for demonstrating this mapping):

class Lib {
 public:
  virtual ~Lib();
  void Init();
  void FileStart(const string& fn);
  bool FileNext();
  void End();
  string Keyword();
  vector<string>* Args();

 private:
  int state_num_;
  // ... other stuff ...
};

So, if you're familiar with the "this" pointer in C++ objects, you can kind of see that it's essentially that same "ctx" thing being passed to you every time. The difference is that while in my C code, I'd have to refer to "ctx->state_num", in a C++ class, you can just use "state_num_" for the same effect.
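Spelled out on the C side, the explicit version of that hidden pointer might look like this tiny sketch. The type libctx_sketch and both function names are hypothetical stand-ins, reduced to a single field.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int state_num;   /* plays the role of the C++ member state_num_ */
} libctx_sketch;

/* The instance is an explicit first parameter here. A C++ member
   function receives the same thing implicitly as `this`. */
void lib_sketch_init(libctx_sketch *ctx) {
    assert(ctx != NULL);
    ctx->state_num = 0;
}

/* Bump this instance's state and return the new value. In a C++ class
   the body would just say ++state_num_ with no ctx-> prefix. */
int lib_sketch_advance(libctx_sketch *ctx) {
    assert(ctx != NULL);
    return ++ctx->state_num;
}
```

Calling lib_sketch_advance() on one variable leaves any other libctx_sketch untouched, which is the same isolation two instances of a class give you.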

I consider this pragmatism in action. The fact that it just happened to line up with some named scheme afterward is just a coincidence.

Imagine the mess I would have had if I had chosen one or more buzzwords (or "design patterns") and then tried to implement them instead of going for an actual solution based on my needs. It might have looked a lot like... well, a lot of code you see today.

December 16, 2011: This post has an update.