Wednesday, January 30, 2013

Connecting bug fixing to automated testing

I've worked on a fair number of projects, each with their own ways of doing things: different programmers, styles, languages, goals, and all of that. One thing which seems to vary greatly is how a bug report is handled. I had a chance to look at how a report had been handled on a project long after the fact, and found it had been fixed in a rather peculiar way.

The person who wound up taking the bug and writing a patch essentially added a new check to make sure nothing bad could happen. They tested for the specific bad scenario and did some magic to map that bad data over to good data. With that in place, it would no longer trigger the bug further down the line.

What seemed strange to me was that there was no test for this "bad data" scenario. In other words, they didn't have something which purposely fed the wrong data to the program to see what would happen. Sure, at this point, it shouldn't do anything bad, but that's not the only reason such a test should exist.

Many years from now, someone is going to make a feature request for the program to handle more variety than it did before. It's part of scope creep, or "featuritis" if you like, and it happens. When they go to implement this, they're going to run smack into the "bad data to good data" mapper. It's going to map their new flavor of data into something else, and that's going to screw things up.

They'll go looking for something that's breaking their new feature, and they'll find the patch:

if ((data - kMagicValue) < kThreshold) {
  data += kThreshold;
}

(This is just a contrived example, but hopefully you get the idea.)

As far as they can tell, this is just needless clamping of the bottom end, and so they remove it. In doing that, they have just opened up the original hole somewhere else in the program.

One attempt to keep this from being casually removed would be to add a comment right above the "if" line:

// Fixes bug #123.

It might even say something about "bad data out of window" or some other short, cryptic blurb about the actual problem. If we're lucky, that comment won't get mangled or lost over time, and it will actually be noticed before someone just goes and throws out that patch.

If we're not lucky, the patch will be tossed and the hole will return.

Now let's imagine someone had written a test for this scenario. First, acting from the bug report, they write a quick little thing which will pass in the bad data, but they code it to expect a proper result. That is, they code it for how things should have worked all along. At first, the test fails, and that's actually the whole point.

Then they go off and write their patch as before, and submit both the patch and the new test to the tree. The bug tracking system is updated with the details and then that bug is closed. Life goes on.
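
To make that concrete, here's a rough sketch of what the patch-plus-test pair might look like, sticking with the contrived snippet from earlier. ProcessData(), the constants, and the "bad data" value are all made up for illustration, and a real project would use whatever test harness it already has rather than a bare assert.

#include <cassert>

const int kMagicValue = 100;
const int kThreshold = 10;

// Stand-in for the code path from the bug report. The "if" is the patch
// that maps bad data back into the valid range.
int ProcessData(int data) {
  if ((data - kMagicValue) < kThreshold) {
    data += kThreshold;
  }
  return data;
}

// Regression test written from the bug report: feed in the bad data and
// expect the result a correct program should have produced all along.
// It fails before the patch lands, passes afterward, and fails again if
// someone later rips the patch out.
int main() {
  const int kBadInput = 105;  // the "bad data" from the (made-up) report
  assert(ProcessData(kBadInput) - kMagicValue >= kThreshold);
  return 0;
}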

Then, when that other person comes along to add the feature and runs into the patch, they still remove it, but now something new happens: that test from back then starts failing. It was written to deliberately poke at that old hole, and with the hole back open, the bad data breaks the system yet again.

This forces the programmer to have a look at the test. Unlike a comment buried in the production code, the test is much less likely to have changed over time, so it's probably still well-commented and linked to a bug number. Upon opening that bug, the programmer can now see all of the context from before.

So, now, this person knows they have to juggle two things. One, they have to maintain some kind of protection from bad data in another part of the system. Two, they have to get their new feature working. They can now go away for a little while and think about this to arrive at a solution which handles both at the same time.
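
In the contrived example, the eventual answer might look something like the fragment below. IsLegacyData() is entirely hypothetical; the point is just that the protection stays in place, but its intent becomes explicit enough that the new feature's data can flow past it.

// Keep the protection from bug #123, but only clamp the legacy-style data
// that actually needs it, so the new feature's data passes through untouched.
if (IsLegacyData(data) && (data - kMagicValue) < kThreshold) {
  data += kThreshold;
}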

Without this kind of protection from a test, you can imagine a slow oscillation between states, where one hole is patched and another opens up. Then that second hole is patched and the first one opens up again. If enough time elapses between swings and it doesn't seem oddly familiar to anyone (perhaps due to programmer turnover), the cycle may go unnoticed.

Call it "test driven development" or whatever buzzword you want to apply. I just think it's a good idea for making sure bugs stay fixed. I always feel weird when I fix a problem without writing a test. Maybe it's too difficult to test this particular thing. If it's something I designed, that always makes me reconsider my design choices to make it easier to test things in the future.

I can't say this would be appropriate for every sort of bug, but there are a bunch of weird ones which would probably benefit from a little more test coverage. Some future maintenance programmer would probably appreciate the effort.