Writing

Feed Software, technology, sysadmin war stories, and more.

Thursday, January 24, 2013

Let's learn from my old code by revising it

Yesterday, I reached into my bottomless pit of a home directory and pulled out some of the cruftiest of the C code I have ever written. I said it was bad but didn't bother going into why. Today, I'll talk a little about what that code was supposed to do and then show how I'd handle the same job today.

The main block of insanity yesterday showed a strcmp() call which was looking for "EXEC" in something called execstr:

if (!strcmp (execstr, "EXEC")) {

There are obvious differences in my style from 1996 to now. First, if I really had to deal with C char* buffers (aka strings) and absolutely had to use strcmp, it would probably look more like this:

if (strcmp(execstr, "EXEC") == 0) {

It's a subtle difference, but I think it's easier to read. The "!strcmp" always makes me think "not-string-compare" when it's actually just a quick way to test for zero, and it can be confusing at a glance.

That's if I was going to use C. Odds are, I wouldn't. If I was doing this today, it would be in C++, and I'd be using an actual string implementation which would let me do something like this:

if (execstr == "EXEC") {

... or even this ...

static const char* kExecCommand = "EXEC";
 
[...]
 
if (execstr == kExecCommand) {

... to move a magic literal out of the code.

This, too, is better, but it's also missing the point. To get closer to that, we need to take a few steps back. What is this code really trying to do?

Well, the original code was parsing a config file, and it was looking for things it should do when it sees certain changes happen on the power line. For instance, when it sees someone send an "ON" command to the device with address "C 2", I might want it to play a MP3 of a doorbell. It looked a bit like this:

ON C 2 1 EXEC mpg123 ~/doorbell.mp3

This config file was parsed and stored in memory in a bunch of little buckets organized by address, so when something came in for "C 2", it could look through, see if anything matched the new state ("OFF" was 0, "ON" was 1), and if so, do whatever it found.

So that's the actual problem. I want to specify some rules and have them get stored in memory sanely. If I had to do this again from scratch right now, I certainly wouldn't write my own config file format or parser. I'd leave that up to some other library which would do most of the ugly grunt work for me.

As it turns out, Protocol Buffers make excellent config files when expressed in ASCII format. Someone told me this tip a long time ago and I don't know if I ever thanked him for it. It makes so many things so easy to do. To get started, I'd need to craft a message format for the config file. First, let's make a way to specify which address and event (OFF or ON) I'm talking about.

message Config {
  message EventHandler {
    enum HouseCode {
      A = 0;
      B = 1;
      C = 2;
      // ... and so on ...
    }
 
    optional HouseCode house_code = 1;
    optional uint32 unit_code = 2;
 
    enum PowerEvent {
      OFF = 0;
      ON = 1;
    }
 
    optional PowerEvent power_event = 3;
    // Note #1
  }
 
  repeated EventHandler event_handler = 1;
}

If you're not familiar with the protobuf language, this might not make sense at first, but look at the kind of config file the above definition lets you have:

event_handler {
  house_code: C
  unit_code: 2
  power_event: ON
}
 
event_handler {
  house_code: C
  unit_code: 9
  power_event: OFF
}

Now I need to have some way to tell it about the external commands (like playing the doorbell MP3) I want it to run when events happen. For that, I just add one more piece right where it says "Note #1".

    repeated bytes run_command = 4;

With that in place, my config file can now look like this:

event_handler {
  house_code: C
  unit_code: 2
  power_event: ON
 
  run_command: "mpg123 ~/doorbell.mp3"
  run_command: "echo Doorbell pressed | mail pager@example.com"
}
 
event_handler {
  house_code: C
  unit_code: 9
  power_event: OFF
 
  run_command: "mpg123 ~/garagedoor.mp3"
}

See that? Because I used "repeated", I can actually stack in multiple commands and they'll all become part of the data structure once the config file is read by my program.

(Here's the doorbell-pager hookup story, in case this sounded familiar to long-time readers.)

Now the question is: what does this look like in terms of code? Let's write a quick function which looks at the contents of that event handler. It's being called because "C 2" just turned ON, and it needs to do all of the run_command items.

void ProcessHandler(const Config_EventHandler& handler) {
  for (unsigned int i = 0; i < handler.run_command_size(); ++i) {
    RunCommand(handler.run_command(i));
  }
}
 
void RunCommand(const string& command) {
  // system()... or popen() ... or whatever might apply ...
}

As you can see, all of the raw pointer mangling and other stuff is gone. It's actually rather boring now. Getting to the data is just a matter of taking a leisurely walk through the data structure, and using it is similarly dull.

Dull is actually good. It means there are no sharp edges. It's been worn down to something relatively safe which just works.

All of this just shows how the data might be used once it's been loaded into the program. The actual code excerpt from yesterday was about loading it, not using it. Here's what the equivalent code to load this ASCII protobuf might look like now:

  string raw;
  if (!ReadFileToString("config.file", &raw)) {
    printf("Can't read your config file.\n");
    return false;
  }
 
  Config conf;
  if (!google::protobuf::TextFormat::ParseFromString(raw, &conf)) {
    printf("Can't parse your config file.\n");
    return false;
  }

For the curious, ReadFileToString is a boring little utility function I wrote which amounts to this: open a file for reading, then stick every byte from that file into a C++ string (via a pointer you were given), and return true if you succeed, else return false.

ParseFromString is another utility function, but this one lives within the actual protobuf libs, and its only purpose in life is to turn the contents of that file into an actual usable data structure.

If your program makes it past those two calls, you now have this thing called "conf" which holds all of your configuration data. You can keep it in that form and refer to it when needed, or you can flip through it and use it to create structures and objects and populate settings in your program. The possibilities are endless.

Isn't that so much better than my original C code?

...

If you enjoyed this and want more of this kind of learning, you might be interested in signing up for my "let's clone a popular web site" project. In it, I'll build something from (nearly) scratch in order to show how I approach certain programming tasks. If you've ever wondered what one of these sites looks like when it's still in "larval form", this would be an excellent way to find out.

Send me a note if you're interested. Thanks!