Software, technology, sysadmin war stories, and more.
Saturday, November 26, 2011

My old Apache log analyzer and firewall twiddler

I get tired of seeing the same old garbage hitting my web server. There are just so many machines which have been compromised in some way and spend their days scanning for vulnerabilities. Every so often, a new trick is found and then the list of things tested gets a little longer.

It's not such a big deal any more, but back in the days when my web server ran on a dinky little 128 kbps pipe, this stuff hurt. I decided to do something about it. This is how I handled these worthless hits.

Apache has a little trick where you can tell it to "log to a program" via the CustomLog directive. I used that to invoke something quick called "aplog" that read stdin and looked for certain keywords. At first, it just looked for "FormMail". That was enough to catch the idiots of the day. On a successful match, it just threw the IP at a little helper program which had enough privileges to invoke iptables. That host would stay blocked until I cleared out the list.

With time, my list of things to match grew. It picked up case-insensitive "formmail" attempts, plus NIMDA/Code Red default.ida and proxy-exploiting CONNECT methods. All of them got the same treatment: a short trip to /dev/null for all incoming packets. It worked well.

Not too long after that, the whole GNOME DAV idiocy landed. It looked like this:

x.x.x.x - - [09/Feb/2003:05:09:17 -0600] "PROPFIND /foo/ HTTP/1.0" 404 702 "-" "gnome-vfs/2.0.2"

I have no idea why they were all doing that. It was obnoxious. I was not their WebDAV server and never would be. Whatever kind of garbage was generating those requests needed to be punished, so I fed it to my log checker.
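For anyone who wants to try something similar, here is a rough Python sketch of this kind of keyword matcher. The original aplog source is not shown in this post, so the pattern list, the reason strings, and the names `check_line` and `scan` are all made up for illustration. It assumes the log line starts with the client address, as in the common and combined log formats.

```python
import re

# Hypothetical pattern list modeled on the matches described above.
PATTERNS = [
    ("FormMail probe", re.compile(r"formmail", re.IGNORECASE)),
    ("default.ida probe", re.compile(r"/default\.ida", re.IGNORECASE)),
    ("CONNECT proxy attempt", re.compile(r'"CONNECT ')),
    ("PROPFIND probe", re.compile(r'"PROPFIND ')),
]


def check_line(line):
    """Return (ip, reason) if the log line matches a pattern, else None."""
    # Assumption: the first whitespace-delimited field is the client address.
    ip = line.split(" ", 1)[0]
    for reason, pattern in PATTERNS:
        if pattern.search(line):
            return ip, reason
    return None


def scan(lines):
    """Yield (ip, reason) for every matching line in an iterable of log lines."""
    for line in lines:
        hit = check_line(line)
        if hit:
            yield hit

# When run as a piped logger, Apache supplies the lines on stdin:
#   import sys
#   for ip, reason in scan(sys.stdin):
#       ...hand the IP to whatever does the blocking...
```

Keeping the matching separate from the blocking means the part that reads the log needs no special privileges, which is the same split described above with the little helper program.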

Sometimes this worked better than expected. Because I shut down the flow of packets that early, the connections tended to fail. Sometimes, the scanning programs at the other end would rotate to another open proxy. This meant I'd get to identify and block a whole bunch of evil hosts instead of just one. Win!

Finally, there was the twit who would show up from a new IP address in a /16 every day or two. While he was online, he would send a HEAD for one of my URLs every minute. Really. We're talking hundreds of worthless requests. No amount of 4xx, 5xx or other non-2xx responses would get his attention. I wanted him to go away, but didn't want the block to trip on normal traffic.

I wound up making something which would keep track of certain requests by source, and it would look for bad behavior. Basically, anything I could do by hand is something that could then be rolled up into a small program.

Jul 6 18:12:17 web aplog[26587]: Duplicate hits detected from x.x.x.x: count=5
Jul 6 18:12:17 web aplog[26587]: Blocking host [x.x.x.x]: matched (Excessive repeated HEAD hits to one URL)
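The per-source tracking can be sketched roughly like this in Python. The threshold of 5 comes from the count=5 in the log excerpt above; the 10-minute window, the class name `RepeatTracker`, and the decision to pass the clock in as `now` are assumptions for the sake of a small testable example, not a description of what aplog really did.

```python
from collections import defaultdict, deque


class RepeatTracker:
    """Flag a source that keeps sending HEAD for the same URL."""

    def __init__(self, threshold=5, window=600.0):
        self.threshold = threshold      # hits needed before we complain
        self.window = window            # seconds of history to keep (assumed)
        self.hits = defaultdict(deque)  # (ip, url) -> timestamps of recent hits

    def observe(self, ip, method, url, now):
        """Record one request; return True when this source should be blocked."""
        if method != "HEAD":
            return False
        times = self.hits[(ip, url)]
        times.append(now)
        # Forget hits that have aged out of the window.
        while times and now - times[0] > self.window:
            times.popleft()
        if len(times) >= self.threshold:
            del self.hits[(ip, url)]  # start fresh after reporting
            return True
        return False
```

The window is what keeps this from tripping on normal traffic: an occasional HEAD from a link checker ages out long before it reaches the threshold.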

Pretty soon, I had a whole bunch of blocking going on. This was nice, but the blocks really did need to expire. Back in those days, the only way to have an expiring iptables rule was to write your own thing to manage them. That brought about the birth of something I called "fwd". It was in charge of receiving requests from aplog and translating them into both an iptables rule and some internal metadata on lifetime. Once in a while, it would run through the list and would clear out expired entries.
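A minimal Python sketch of that bookkeeping might look like the following. It is not the real fwd: the class name `BlockTable`, the use of a heap, and the exact rule layout (a plain DROP rule on the INPUT chain) are assumptions. It only builds the iptables argument lists and leaves running them to the caller.

```python
import heapq


class BlockTable:
    """Track blocks with lifetimes and say which iptables commands to run."""

    def __init__(self):
        self.expiry = {}  # (ip, port) -> time at which the block ends
        self.queue = []   # min-heap of (expires_at, ip, port)

    @staticmethod
    def rule_args(action, ip, port):
        # argv for a plain DROP rule; "-I" inserts it, "-D" deletes it.
        return ["iptables", action, "INPUT", "-s", ip,
                "-p", "tcp", "--dport", str(port), "-j", "DROP"]

    def block(self, ip, port, seconds, now):
        """Register a block; return the command to run, or None if already blocked."""
        key = (ip, port)
        already = key in self.expiry
        expires_at = now + seconds
        self.expiry[key] = expires_at
        heapq.heappush(self.queue, (expires_at, ip, port))
        return None if already else self.rule_args("-I", ip, port)

    def sweep(self, now):
        """Return delete commands for every block that has expired by `now`."""
        commands = []
        while self.queue and self.queue[0][0] <= now:
            expires_at, ip, port = heapq.heappop(self.queue)
            # Skip stale heap entries left behind by a renewed block.
            if self.expiry.get((ip, port)) == expires_at:
                del self.expiry[(ip, port)]
                commands.append(self.rule_args("-D", ip, port))
        return commands
```

Calling `sweep` from a timer every so often matches the "once in a while" cleanup pass. The delete uses the same rule specification as the insert, so the kernel rule and the metadata go away together.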

Of course, it took some tuning to get this right.

Jul 21 07:04:29 web fwd[5512]: BLOCK: [x.x.x.x] [tcp-80] (10800 seconds)
Jul 21 10:04:33 web fwd[5512]: Expiring block: [x.x.x.x] [tcp-80]
Jul 21 10:09:51 web aplog[7600]: Duplicate hits detected from x.x.x.x: count=5

Here, my twit comes back as soon as the block is lifted. I wound up extending my code to allow for longer blocks and for geometric growth of penalties.
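Geometric growth is simple enough to show in a few lines of Python. The 10800-second base is the block length from the log above; the doubling factor, the 30-day cap, and the names here are assumptions, since the actual numbers are not given.

```python
BASE_SECONDS = 10800      # the 3-hour block seen in the log excerpt
GROWTH = 2                # assumed multiplier per repeat offense
MAX_SECONDS = 30 * 86400  # assumed cap so a block cannot grow forever


def penalty_seconds(prior_blocks, base=BASE_SECONDS, growth=GROWTH, cap=MAX_SECONDS):
    """Block length for a host that has already been blocked `prior_blocks` times."""
    return min(base * growth ** prior_blocks, cap)


class OffenderLog:
    """Remember how many times each host has been blocked."""

    def __init__(self):
        self.blocks = {}  # ip -> number of blocks issued so far

    def next_penalty(self, ip):
        n = self.blocks.get(ip, 0)
        self.blocks[ip] = n + 1
        return penalty_seconds(n)
```

With these assumed numbers, a host that comes straight back gets 3 hours, then 6, then 12, and after a handful of rounds it is gone for a month at a time.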

In the years since I did my original work on this, iptables has been extended. It's now possible to add rules which will expire all by themselves. That removes a whole bunch of stuff that someone would otherwise have to hack up as I did. Check out the "recent" module if this interests you.
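As a hedged illustration of how the "recent" module can stand in for a homegrown expiry daemon, this Python sketch builds one rule that drops port 80 traffic from any address seen in a named list within the last N seconds, plus the text a log watcher would write to that list's /proc file to add an address. The list name "aplog" and both function names are hypothetical. The /proc location depends on the kernel: /proc/net/xt_recent/ on newer ones and /proc/net/ipt_recent/ on older ones, so check the iptables man page on your system before relying on it.

```python
def recent_drop_rule(list_name, port, seconds):
    """argv for one rule that drops sources present in the list for `seconds`."""
    return ["iptables", "-A", "INPUT", "-p", "tcp", "--dport", str(port),
            "-m", "recent", "--name", list_name, "--rcheck",
            "--seconds", str(seconds), "-j", "DROP"]


def recent_add_entry(ip):
    """Text to write to the list's /proc file to add an address to the list."""
    return f"+{ip}\n"
```

The rule is installed once. After that, blocking a host is just a write to the list, and the block lapses on its own when the entry is older than the `--seconds` value.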

Final note: you really do want to use something like ipt_recent, which is hash-based, instead of just cramming in new rules. The if-then-else traversal nature of ordinary rules made things seriously slow for me back in the day. Every rule added a non-trivial amount of overhead, to the point where I only had DSL-class speeds over local 100 Mbps Ethernet. Ouch!
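To put a rough shape on that cost, here is a toy Python comparison. It is only an analogy for the kernel's behavior, not a model of it: a list walk stands in for a chain of one-rule-per-host entries, and a set stands in for a hash-based lookup. The 5000 addresses are made up.

```python
def linear_match(rules, src):
    """Walk rules in order; return (matched, comparisons made)."""
    comparisons = 0
    for rule_ip in rules:
        comparisons += 1
        if rule_ip == src:
            return True, comparisons
    return False, comparisons


# 5000 made-up blocked addresses, one "rule" each.
rules = [f"10.0.{i // 256}.{i % 256}" for i in range(5000)]
blocked = set(rules)  # hash-based membership test

# An innocent packet pays for every rule in the list version:
#   linear_match(rules, "192.0.2.1") walks all 5000 entries,
# while `"192.0.2.1" in blocked` is a single hash lookup.
```

The painful part is that the legitimate traffic pays the most, since a packet from a host that is not blocked has to fall through every single rule before it is allowed in.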