Writing

Software, technology, sysadmin war stories, and more. Feed
Sunday, September 1, 2013

Policy vs implementation in source code

Imagine encountering some new code. It's in a language which you know well, but there is no limit to the way to do things. Sure, there are certain techniques which show up more than others, but really, to know what's going on, you have to run the code in your own head.

It might look like this:

if (some_strxxx_function(x, y, ...) == some_number) {
  data_[x].len = -1;
}

(The "some_strxxx_function" stuff is probably actually much more complicated than just one line. It might be several calls, with values being passed from one to the next, and with a bunch of things glued together with logical operators. It depends on how clever and/or knowledgeable the original programmer may have been at the time.)

If you actually know what all of that stuff does, you might be able to figure it out. There are no unit tests, and there are no comments (which isn't necessarily bad, since it would be a "what" comment and not a "why"), so you get to play computer and think it through.

"Okay, so this is a standard C library function that operates on a char* and it looks for foo or bar, and we're handing it these values, so it should return X when this thing happens, or Y otherwise".

Depending on how noisy and/or distracting your work area is, getting this all down may take a while. You might be taken down a bunch of weird dead-end paths and need to accumulate a bunch of mental state. Keeping this straight while dealing with other bits of real life can be challenging in a normal office environment.

Ultimately, you figure out what the code does, and then get to unravel what the code in the if-block actually does. Oh sure, it puts some value at data[x].bar, but what does that mean, really?

You get to dig around some more and find out what -1 means. It's a magic value and we all supposedly know those are evil, but here it is. It's been used, and it's been there for years, and now it's your problem. It turns out that -1 means something special here, and -1 is also used to mean something else in another context. -1 in this context means "this thing is really a directory".

Eventually you wrap your head around this, and now you know both what it means and when that gets set. Hopefully, by now, the "why" is evident.

Imagine how it might have been if the code looked more like this.

if (EndsWith(x, "/")) {
  data_[x].SetDirectory();
}

"EndsWith" probably does the same weird str* function calls, and "SetDirectory" is probably setting "len" to -1, but it reads completely differently, doesn't it?

All I did was move the actual implementation into some other location and left this caller as a collection of policy: when a string ends a certain way, we go flip a flag in a labeled way.

So enough theory. How about some code that's closer to reality.

string do_stuff(const char* input) {
  assert(input);
 
  string temp = input;
  unsigned int globallen = global_string.size();
  if (len > 0 && temp.size() > globallen &&
      strncmp(temp.data(), global_string.c_str(), globallen) == 0) {
    temp = temp.substr(globallen);
  }
  if (!temp.empty() && temp[temp.length() - 1] == '/'] {
    temp = temp.substr(0, temp.length() - 1);
  }
  return temp;
}

Context: this is C++ code, despite the "const char*" and str* function use all over the place - note the "string".

Do you have any idea what this is doing?

Put it this way - would it be easier to follow the general policy if this function used stuff like "starts_with", "ends_with", and "erase"?

I think so.

If this code winds up being a hot spot because it gets called thousands of times per second, okay, fine. Unroll it again and go crazy with your optimizations and see if you can beat the compiler. Then document the hell out of why you did it so nobody tries to roll it back up again.

Until then, be kind to your future maintenance programmers.