Writing

Software, technology, sysadmin war stories, and more.

Wednesday, August 31, 2011

Unit tests with ridiculously bad dependencies

Let's talk about unit tests and how they can turn into little code disasters of their own. When you see insanity in a code base, you assume it's in the actual program, and you probably don't think it could creep into the tests. Well, if you're lucky, perhaps that will be the case. Otherwise, odds are that whoever wrote the main code also wrote the tests, so you may have to deal with some serious badness.

Here's an example. Let's say you have a piece of code which tries to call out to your "master controller" for a given area. In real life, it'll try to connect to the replicas in Houston, Colorado Springs, and Salt Lake City. Odds are, you don't want your test to touch production, so what can you do about it?

The insane way to avoid connecting to a real replica is to point it at a name which does not exist. Instead of targeting HOU, COS and SLC, give it XYZ. I've seen this happen far too many times. This has all kinds of bad side-effects.

First, whenever you run this, it's going to generate a bunch of DNS lookups, or whatever you use to resolve things. Odds are they will either fail or time out, but either way they add a lot of time and fragility to your test.

Second, it will eventually misbehave on you. What happens when a real location called XYZ is added? Or, alternatively, what if your DNS information changes (like adding a wildcard), and it starts resolving to something?

When this sort of thing happens, it tends to show up by way of a continuous build and testing environment. You can't trace it to any particular change in your code, since the cause was something in the test's environment. Less-clued developers will blow a bunch of time trying to bisect the change history to nail down a particular patch, revision, changelist, or whatever. Having external dependencies like this is a recipe for stress.

I've seen all of this and more in the same project. There were connections which purposely targeted locations which did not exist any more to "see what happens when something goes down". There were also other connections which purposely targeted locations which *did* exist, or at least, they did when it was written.

I became aware of these because things changed. A previously-dead location name came back to life, and another one which had been alive went away. I wound up injecting a mock DNS resolver class which let me fake the responses, so every lookup returned exactly what the test needed. This stopped the insanity of having it talk to the real world.
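Here is a minimal sketch of that kind of resolver injection in Python. The post doesn't show its actual code, so every name below (`find_master_controller`, `FakeResolver`, `system_resolver`, the lowercase location names) is hypothetical, and the 192.0.2.x addresses come from the TEST-NET-1 block reserved for documentation. The lookup function is passed in as a parameter: production callers get the real resolver by default, while tests hand in a fake that answers from a fixed table. Any name the fake doesn't know, or one explicitly marked as down, fails immediately instead of going out to DNS.

```python
import socket
import unittest
from typing import Callable, Dict, List, Optional

# Hypothetical resolver contract: take a hostname, return an IP string,
# raise socket.gaierror if the name does not resolve.
Resolver = Callable[[str], str]


def system_resolver(name: str) -> str:
    """Production resolver: a real lookup through the OS (DNS, hosts file, etc.)."""
    return socket.gethostbyname(name)


class FakeResolver:
    """Test double: answers only from a fixed table and never touches the network."""

    def __init__(self, table: Dict[str, Optional[str]]):
        # A value of None means "this location is down"; unknown names fail too.
        self.table = table
        self.lookups: List[str] = []  # record of names the code tried, in order

    def __call__(self, name: str) -> str:
        self.lookups.append(name)
        addr = self.table.get(name)
        if addr is None:
            raise socket.gaierror(f"fake resolver: {name} does not resolve")
        return addr


def find_master_controller(
    locations: List[str], resolver: Resolver = system_resolver
) -> Optional[str]:
    """Return the address of the first location that resolves, or None if all fail."""
    for loc in locations:
        try:
            return resolver(loc)
        except socket.gaierror:
            continue  # treat an unresolvable location as down and try the next one
    return None


class MasterControllerTest(unittest.TestCase):
    def test_skips_dead_location(self):
        fake = FakeResolver({"hou": None, "cos": "192.0.2.20", "slc": "192.0.2.30"})
        self.assertEqual(find_master_controller(["hou", "cos", "slc"], fake), "192.0.2.20")
        # hou was tried and failed; slc was never needed.
        self.assertEqual(fake.lookups, ["hou", "cos"])

    def test_everything_down(self):
        fake = FakeResolver({"hou": None, "cos": None, "slc": None})
        self.assertIsNone(find_master_controller(["hou", "cos", "slc"], fake))
```

Run it with `python -m unittest` in the usual way. Because the fake never consults real DNS, a "dead" location stays dead and a live one stays live no matter what happens to the outside world, and there are no lookups left to time out.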

There was another gain, too: all of that DNS gunk is slow. These tests went from a runtime of tens of seconds down to a couple hundred milliseconds. Ridiculous speedups of 100x or more sometimes resulted from this change.

When you can get that kind of win from a simple tweak, you can be sure that the original code was bad.