Writing

Software, technology, sysadmin war stories, and more.

Friday, May 18, 2012

Abusing sockets with SO_REUSEADDR set in the old days

There once was a time when I ran a pair of IRC servers for myself and a couple of friends. Around that same time, I also had a low-end machine which was used for friends to have a shell account. Most of them just used those accounts to chat, but they also ran a few small web pages and things like that. It was the late '90s, and that's just how it all worked.

So, one day, I was hanging out in a channel on the server with a friend, and a second person popped up just as the first signed off. This seemed bizarre. The newcomer said something like "oh, I just missed (person #1), huh?", and then quit. As soon as he disappeared, the first person came back on. The timing got my attention.

I jumped out to that box and ran the same IRC client program and my friend's connection died again. He had a heavily customized client and seemed adamant that it couldn't be the problem, so we suspected either ircd (the server) or something crazy in the OS. Looking at ircd, we had plenty of connections allowed in that particular "connection class", and besides, the server wasn't throwing errors. Their connections were just falling over dead: "connection reset by peer".

I tried to make another connection from that same client system to the same server using different software, and nothing bad happened. It was only when I ran that same IRC binary that things went crazy. This was happening across different user accounts (and there was no suid action involved), so it made no sense.

Coincidentally, I had been compiling a new build of the Linux kernel for another reason on one of my machines, and so I decided to give this thing another shot after a quick reboot. Afterward, not surprisingly, it happened again. So, my friend jumped out to a different server, and then I started up that same IRC client and connected to that same server, and once again, his connection died.

Now things were getting really crazy. While the first chat server was running Linux, the second was BSD/OS! That basically excluded the server OS as a culprit. Something else had to be going on. It was time to see if ircd was somehow culpable, so I stood up netcat listening on some random port and connected a client to it. So far, so good.

Then I started another client, connecting to the same port... and the first client's connection died. Uh oh. It's not the OS and it's not even the server program? What now?

Right about here is when I noticed that this guy's IRC client always bound to port 2567 or 2568 for its outgoing connections. I started digging around in the source and discovered that while it might set sin_addr, it never did anything with sin_port, so that field just kept whatever garbage happened to be in the struct. That struct got passed to bind(), and that gave some crazy effects.
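For illustration, here is a minimal Python sketch of the bind-before-connect pattern described above. Python cannot reproduce an uninitialized struct, so the hypothetical helper `connect_from_port` takes the local port as an explicit argument; the point is only that whatever value reaches bind() becomes the source port the server sees. The function names and the loopback addresses are assumptions for the demo, not anything taken from the client's source.

```python
import socket

def connect_from_port(remote, local_port, local_ip="127.0.0.1"):
    """Open a TCP connection whose source port is chosen by the caller.

    Passing local_port=0 asks the kernel to pick a free ephemeral port,
    which is what a client normally wants.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((local_ip, local_port))  # explicit bind before connect
        sock.connect(remote)
    except OSError:
        sock.close()
        raise
    return sock

def demo(local_port=0):
    """Connect to a throwaway loopback listener.

    Returns (port the client thinks it uses, port the server sees).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = connect_from_port(listener.getsockname(), local_port)
    accepted, peer = listener.accept()
    ports = (client.getsockname()[1], peer[1])
    for s in (accepted, client, listener):
        s.close()
    return ports
```

Calling `demo()` with the default of 0 lets the kernel choose the port. Passing a fixed number instead mimics a client that keeps landing on the same local port every time.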

I told him about my findings, but he refused to admit that it might be his broken client. It was a thing of pride to run that client instead of "plain old ircII", and to him, it could do no wrong. Instead, he asked me what was keeping him from attacking random other connections from other users. I didn't know, so I started up netcat using the precise quad (ip, port, ip, port) of his connection, and his connection died! So now we knew this wasn't something specific to the IRC client, and nothing in it was being deliberately malicious. Other tools could trigger it, too.

I brought up a connection with telnet (the program, not the service) and tried the same trick with netcat. It failed. nc complained about not being able to bind to the port. So then I turned it around, and connected netcat to ircd, then started a second, and the first died.

Somehow, while doing this, I noticed that some programs were setting SO_REUSEADDR, while others were not. So, I grabbed the source for his IRC client, cleared that out, and tried it again. Now I had a connection which couldn't be killed merely by duplicating it.
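The difference the flag makes at bind() time can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a reconstruction of what nc or the IRC client did: the helper name `second_bind_succeeds` is made up, and it only checks whether a second socket may bind to an address and port that another socket already holds.

```python
import errno
import socket

def second_bind_succeeds(reuse):
    """Bind two TCP sockets to the same loopback address and port.

    Returns True if the second bind() is accepted, False if the kernel
    refuses it with EADDRINUSE. When reuse is true, SO_REUSEADDR is set
    on both sockets before binding.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse:
            for sock in (first, second):
                sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        first.bind(("127.0.0.1", 0))  # let the kernel pick a free port
        address = first.getsockname()
        try:
            second.bind(address)      # same address, same port
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise
        return True
    finally:
        first.close()
        second.close()
```

Without the flag, the second bind should be refused, which matches nc's complaint about not being able to bind to the port. With the flag set on both sockets, the outcome depends on the operating system and kernel version, so treat any result from `second_bind_succeeds(True)` as specific to the machine it ran on.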

He still refused to accept it and act on it, so I did one final thing. I wrote a small proof of concept which would run netstat to find active connections and would parse out the ip:port:ip:port quads. Then it would deliberately bind to the appropriate address and port and then try to connect to the other address and port. If the original socket had been set up with SO_REUSEADDR, it would kill it nicely.
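The parsing half of such a tool might look like the Python sketch below. It is not the original proof of concept, which isn't shown here; it assumes the column layout printed by Linux `netstat -tn` (protocol, two queue counters, local address, foreign address, state), and the names `Quad` and `parse_quads` are invented for the example.

```python
from typing import List, NamedTuple

class Quad(NamedTuple):
    """One TCP connection: (local ip, local port, remote ip, remote port)."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int

def split_endpoint(endpoint):
    """Split 'ip:port' on the last colon and return (ip, port as int)."""
    host, _, port = endpoint.rpartition(":")
    return host, int(port)

def parse_quads(netstat_output) -> List[Quad]:
    """Extract quads for established TCP connections from netstat text.

    Assumes the layout of `netstat -tn` on Linux; headers, listeners,
    and connections in other states are skipped.
    """
    quads = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # header or unrelated line
        if fields[5] != "ESTABLISHED":
            continue  # only live connections are interesting
        local_ip, local_port = split_endpoint(fields[3])
        remote_ip, remote_port = split_endpoint(fields[4])
        quads.append(Quad(local_ip, local_port, remote_ip, remote_port))
    return quads
```

Each `Quad` holds exactly the four values a second socket would need to bind to and connect to in order to duplicate the connection.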

Ultimately, it turned out that the base ircII code had been setting this for a long time. It was just some quirk of this particular forked client which made it always land on one of two local ports and then exposed the problem.

SO_REUSEADDR has its uses, but it just doesn't make sense when you're acting as a client and really don't care about which port you get locally. If you're a server and want to restart quickly even if old connections are hanging around, then it might be useful. Otherwise, you're asking for trouble.
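As a rough Python sketch of that split: the listener sets SO_REUSEADDR so it can rebind its well-known port right after a restart, while the client sets nothing and never calls bind(), leaving the local port to the kernel. The helper names `make_listener` and `make_client` are made up for illustration.

```python
import socket

def make_listener(port, host="127.0.0.1"):
    """Server side: set SO_REUSEADDR before bind() so a restarted server
    can reclaim its port while old connections are still winding down."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(5)
    return sock

def make_client(remote):
    """Client side: no SO_REUSEADDR and no bind(); connect() makes the
    kernel assign an unused ephemeral port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(remote)
    return sock
```

A client written this way never asks for a specific local port, so it has no reason to touch the option at all.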

One note: this was many years ago, and it's entirely possible that this trick no longer works. Trying it right now gives me an EADDRNOTAVAIL on connect(). That seems new to me.