Writing

Software, technology, sysadmin war stories, and more.

Sunday, September 8, 2013

Spend the time to make it simple

Here's an observation about a system that I'm sure a lot of people have said, heard, or thought at some point:

Wow, that's really complicated. It must have taken a long time to write.

My question is simple enough: did it need to be complicated? If the answer is "no" or "I don't know", then there's a piece missing from that observation.

It might have taken a long time, but it still didn't take long enough.

If all other things are equal, I'd prefer a simpler solution to a complicated one. Complicated solutions have a way of creating all sorts of weird effects down the road.

If you were around for the big multi-vendor, multi-product SNMP security disaster around 2002, you might remember some of this. SNMP uses this encoding called ASN.1 which is pretty big and scary. I bet a bunch of engineers took one look at that spec, decided it was just too much work to write their own implementation, and punted on it. Given there was a product (ucd/net-snmp) with a compatible license, it shouldn't be surprising that so many were built around it (officially or otherwise). This effective monoculture meant one vulnerability worked on all of them.

I'm not joking. If you never saw this, or if it's faded from your memory in the past 11 years, you need to look at the advisory report again. Notice how long the page is, all due to the list of affected products. It's nuts!

Think about this another way. Let's say you're going to build a network much like the IPv4 Internet. You want to make it so it can have millions or billions of nodes, with some as end stations and others as routers. There will be any number of paths through this network, and you're worried about routing loops.

This can happen if router A sends traffic for target X to router B, while router B sends that traffic to A. Pretty soon, you have an electronic food fight, and the pipe becomes saturated as packets check in and never check out. Even if there are useful routes for other traffic which cross that pipe, they become useless, because the pipe itself is full.

What now? Do you come up with some elaborate packet tracking system so that all routers remember every single packet they forward so they can check for duplicates later? Wouldn't that take up an enormous amount of memory and CPU time? Wouldn't it also delay every single packet going through the system, instead of just the ones which were genuinely in a loop? Yeah, that's bad.
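To make that cost concrete, here's a rough sketch of what one such router might look like. Everything in it (the TrackingRouter class, hashing the whole payload with SHA-256, keeping the digests in a set) is made up for illustration and isn't taken from any real product. The point is just that the table grows by one entry for every packet forwarded, loop or no loop.

```python
import hashlib


class TrackingRouter:
    """Hypothetical router that remembers every packet it has forwarded."""

    def __init__(self):
        # One digest per packet ever forwarded. Nothing here expires entries,
        # so this set only grows.
        self.seen = set()

    def forward(self, payload: bytes) -> bool:
        # Every packet pays for a hash and a lookup, looping or not.
        digest = hashlib.sha256(payload).digest()
        if digest in self.seen:
            return False  # looks like a duplicate, so drop it
        self.seen.add(digest)
        return True


router = TrackingRouter()

# 1000 distinct packets pass through once each.
results = [router.forward(f"packet {i}".encode()) for i in range(1000)]

# One of them comes back around, as it would in a loop.
looped = router.forward(b"packet 7")
```

After that run, the router is holding 1000 digests just to catch the one packet that came back. A real version would also need some way to expire old entries and to tell a looped packet from a legitimate retransmission with the same bytes, which is even more machinery.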

Or, you know, you could do something really simple. Have the sender put a number in the packet, and every time someone passes it on, they just decrement that number. If that makes it hit zero, then they drop it and hopefully emit an error back to the sender. Packets will still get caught up in a loop, but at least that loop is finite and they will eventually "die".
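Here's about how little it takes, as a toy sketch. The Packet class, the forward function and the two-entry routing table are all invented for this example, and the "network" is just a dictionary of next hops, so treat it as an illustration of the idea rather than how any real router is written.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dest: str
    ttl: int  # hop budget chosen by the sender


def forward(packet, routes, start, log):
    """Follow next hops until the packet is delivered or its counter runs out."""
    node = start
    while node != packet.dest:
        packet.ttl -= 1  # each hop that passes it on decrements the number
        if packet.ttl <= 0:
            # A real router would also try to send an error back to the sender.
            log.append(f"{node}: dropped, ttl expired")
            return False
        log.append(f"{node}: forwarded with ttl={packet.ttl}")
        node = routes[node]
    log.append(f"{node}: delivered")
    return True


# A and B each think the other is the way to X: a routing loop.
routes = {"A": "B", "B": "A"}
log = []
delivered = forward(Packet(dest="X", ttl=4), routes, "A", log)
```

With a starting value of 4, the packet bounces A, B, A and then dies at B on the fourth hop. No router had to remember anything about it.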

This concept is simple enough to be used in a children's game. Get a bunch of kids together, and hand one a legal pad. Tell them to rip off the top sheet every time they hand it on to the next kid. If ripping it off reveals the cardboard backing underneath, then the game is over. Will programmers find a way to screw up something that simple? Oh, sure, but at least it'll be easier to spot.

Would you rather troubleshoot a decrementing counter?

"Oh, the number didn't go from 2 to 1 here." "Oh, we didn't catch when it went from 1 to 0 here."

... or would you rather hunt for bugs in a global stateful packet tracking system?

"Two different packets hashed to the same value, which was supposed to be impossible, but on this build of the software we only sample X bytes instead of the full Y bytes, and so more packets appeared to be the same."

I know which one I'd rather work on.