Writing

Software, technology, sysadmin war stories, and more.
Friday, August 26, 2011

The Linux box with a penchant for network neighbo(u)rs

I've seen some wild and wacky stuff while working on broken systems. That's the thing about tech support: you have a heavy selection bias since you never hear about things which are working perfectly. From your perspective, everything seems to be broken all of the time. Given enough customers, there's at least something broken somewhere at any time, so this is not much of a stretch.

One oddity I saw was a machine which would become unresponsive from time to time. It would lose the ability to talk to some hosts while still reaching others. There was no rhyme or reason to it. Somehow, it wound up becoming my problem, so I logged in and started to investigate.

My usual initial sanity checks proved to be useful. I would run "w", then "df", and then "dmesg" and look for craziness. The first two showed nothing unusual, but then dmesg said "Neighbour table overflow". I knew this referred to the ARP table, and as confirmation, "arp -an" showed a ton of entries. It was trying to hold one entry for everything it talked to on the Internet! But why...?
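For anyone who wants to script that check, here is a minimal sketch. It assumes a Linux box where the ARP cache is readable at /proc/net/arp (one header line, then one row per entry) and where the hard limit lives in /proc/sys/net/ipv4/neigh/default/gc_thresh3. The function names are made up for illustration.

```python
from pathlib import Path

# Assumed locations on a typical Linux system.
ARP_TABLE = "/proc/net/arp"
GC_THRESH3 = "/proc/sys/net/ipv4/neigh/default/gc_thresh3"


def count_arp_entries(proc_net_arp_text):
    """Count rows in text laid out like /proc/net/arp (header + one row per entry)."""
    lines = proc_net_arp_text.strip().splitlines()
    return max(len(lines) - 1, 0)


def table_is_full(entries, hard_limit):
    """True once the entry count has reached the hard limit."""
    return entries >= hard_limit


def report():
    """Read the live table and limit; only meaningful on a Linux host."""
    entries = count_arp_entries(Path(ARP_TABLE).read_text())
    limit = int(Path(GC_THRESH3).read_text())
    state = "FULL" if table_is_full(entries, limit) else "ok"
    return f"{entries} ARP entries, hard limit {limit}: {state}"
```

Calling report() on the affected machine would give roughly the same answer as eyeballing "arp -an", just with the limit printed next to it.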

This machine had a messed-up network configuration, all right. It thought the whole world was right there on its local segment. That made it ARP directly for every remote address it wanted to reach, instead of just for its gateway. What was really bizarre is that something was answering these requests. It was receiving the MAC address of some worthy gateway and thus those packets were making it out. But how?
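To make the "whole world is local" idea concrete, here is a simplified model of how a host picks the address to ARP for. It is only a sketch: the function name is invented, the addresses are documentation ranges, and a real kernel consults its full routing table rather than a single interface prefix.

```python
import ipaddress


def arp_target(dest, interface, gateway):
    """Return the IP a host would ARP for when sending a packet to dest.

    interface is the host's own address with prefix, e.g. "192.0.2.10/24".
    Simplified model: one interface, one gateway, no other routes.
    """
    iface = ipaddress.ip_interface(interface)
    dest_ip = ipaddress.ip_address(dest)
    if dest_ip in iface.network:
        # Destination looks on-link, so ARP for the destination itself.
        return dest_ip
    # Otherwise hand the packet to the gateway and ARP for that instead.
    return ipaddress.ip_address(gateway)


# Sane config: a remote host is reached through the gateway.
print(arp_target("198.51.100.7", "192.0.2.10/24", "192.0.2.1"))
# Broken config: with a /0 prefix, every destination looks local.
print(arp_target("198.51.100.7", "192.0.2.10/0", "192.0.2.1"))
```

In the second case the host ends up wanting one ARP entry per remote address, which is the behavior described above.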

A bit of poking with tcpdump showed that there were a couple of oddly-configured Windows machines on that same segment. For whatever reason, when they saw the wild ARPs, they decided to reply. I forget whether they were handing over their own addresses or that of the router (their shared default gateway), but either way, it worked.

Of course, this had the side effect of filling the ARP cache, leading Linux to scream and holler, and hosts which fell out or otherwise could not be added would wind up being unreachable. Add that to the unpredictable rate of hosts entering and leaving, and you had a roving network blackhole which made no sense.

It's been many years, so I forget exactly which kind of badness had been configured on this machine. In trying to reproduce it today, it seems that a netmask of 0.0.0.0 would do it in theory, but a Linux box running Slackware 13.37 (with the 2.6.37.6 kernel) won't auto-add the ridiculous network route when you ifconfig with that netmask. I can't dial it back beyond a netmask of 248.0.0.0 -- anything with fewer bits set doesn't yield a net route at all.
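For a sense of scale, a netmask can be turned into a prefix length and an address count with Python's standard ipaddress module. This is just arithmetic on the mask, not a reproduction of what ifconfig does, and the helper name is made up.

```python
import ipaddress


def describe_netmask(netmask):
    """Return (prefix_length, number_of_addresses) for a dotted-quad netmask."""
    net = ipaddress.ip_network(f"0.0.0.0/{netmask}")
    return net.prefixlen, net.num_addresses


print(describe_netmask("255.255.255.0"))  # a typical LAN
print(describe_netmask("248.0.0.0"))      # the widest mask that still got a net route
```

A mask of 248.0.0.0 is a /5, which already claims 2**27 addresses as local neighbors.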

I did manage to find one way to make it happen, though: treat your Ethernet device like a point-to-point link, and do this: "route add default eth0". That will give you a route to 0.0.0.0/0.0.0.0 with a gateway of 0.0.0.0, and you will start blasting ARPs out for the whole world. This is probably what was set up on that customer's machine, but I can't imagine how it got there. Our kickstart wasn't that broken.
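A route like that can be spotted mechanically. The sketch below assumes the usual /proc/net/route layout (a header line, then whitespace-separated columns with Destination, Gateway, Flags and Mask as hex values) and the standard RTF_UP and RTF_GATEWAY flag bits. The function name is hypothetical.

```python
RTF_UP = 0x0001       # route is usable
RTF_GATEWAY = 0x0002  # route goes via a gateway


def onlink_default_routes(proc_net_route_text):
    """Return interfaces that have an up default route with no gateway."""
    found = []
    # Skip the header line, then inspect each route row.
    for line in proc_net_route_text.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) < 8:
            continue
        iface = fields[0]
        dest = int(fields[1], 16)
        gateway = int(fields[2], 16)
        flags = int(fields[3], 16)
        mask = int(fields[7], 16)
        is_default = dest == 0 and mask == 0
        has_gateway = bool(flags & RTF_GATEWAY) or gateway != 0
        if is_default and flags & RTF_UP and not has_gateway:
            found.append(iface)
    return found
```

Feeding it the contents of /proc/net/route on a healthy machine should return an empty list; anything else means some interface is treating the entire Internet as on-link.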

If I had to guess, "monkeys with root" would be it. It usually is.