Software, technology, sysadmin war stories, and more. Feed
Thursday, February 23, 2012

Freezers as an analogy for hash table behavior

While I was out of town on vacation recently, there was a cluster of advisories about malicious HTTP GET and POST requests. In short, crafting the right sort of request and directing it at certain web frameworks would cause them to suck down tons of CPU time and sometimes even blow up. For those who saw this, it was the whole thing where you could send them a sequence of values which would all hash to the same bucket and thus create a giant ad hoc linked list.

Somehow, this came up in conversation with friends who have the privileged position of not having to know about all of this web stuff. They wanted to know what it meant and why things could blow up that way. I couldn't just launch into the whole hash table thing, since they had no context for that. Instead, I came up with an analogy involving freezers.

If you've been to one of those warehouse clubs like Sam's or Costco (or PACE, back in the day - remember them?), chances are good that you've seen their huge freezers. They're tens of feet long, and there are usually several in any given store. They hold a lot of stuff.

I asked them to imagine life as a simple brainless robot. You'd be given a task: go find the location of the 10 oz bags of frozen peas. As a simple robot, you could only move back and forth inside the freezer and scan whatever was right in front of you. If it wasn't what you wanted, you'd just move down some more and try again. You'd keep doing this until you found those peas, and then you'd report back with your location.

Imagine what happens if you just have one big freezer. That robot is going to have to check a lot of products before it finds the one it's looking for. If it always starts from one end and works towards the other, it's going to get a real workout, and it's going to be seriously slow.

So, some bright spark decided to set up multiple smaller freezers, each with its own robot. Now, instead of traversing this massive expanse, it has a much smaller space to consider. There are also few products to check, since they are spread out more-or-less evenly across all of the freezers.

How do they split up the products? Oh, well, that's easy. Someone made a calculator which just did some spooky numerology stuff based on the letters in the product name, and somehow reduced it to a freezer number, like 1 through 10. If you're looking for a box of freeze-dried honey badgers, that might be freezer 7 while the frozen peas are in freezer 4.

For a while, this works well, but one day, something changes. The store starts carrying a new line of products which has some really bizarre names. It looks like whoever named the products was into the same kind of freaky numerology as the person who designed the calculator. You guessed this because all of the products gave the same result: freezer 10.

The calculator has always been right in the past, so you follow its instructions and load all of the new stuff into freezer 10. Soon, the poor robot in that freezer is overloaded with the work of comparing all of those products, while the other 9 robots just sit there doing nothing.

What happened? Well, someone found out how things really work in your internal systems and decided to play a prank on you with all of the funny names. They knew you'd follow the calculator's advice even if it looked like the wrong thing to do, and so you overloaded freezer 10.

This is basically what happened with the vulnerable implementations. Their hash calculations could be predicted by outsiders, so those same people could feed it a stream of names which would always resolve down to the same storage spot. Just like in my analogy, the vulnerable systems always followed the advice of the calculator and wound up creating a huge mess as a result.

Analogies: they're what's for dinner time conversation.