
Software, technology, sysadmin war stories, and more.

Tuesday, December 17, 2024

Feed readers which don't take "no" for an answer

I don't think people really appreciate what kind of mayhem some of their software gets up to. I got a bit of feedback the other night from someone who's been confounded by the site becoming unreachable. Based on running traceroutes, this person thinks that maybe it's carrier A or carrier B, or maybe even my own colocation host.

I would have responded to this person directly, but they didn't leave any contact info, so all I can do is write a post and hope it reaches them and others in the same situation.

It's not any of the carriers and it's not Hurricane Electric. It's my end, and it's not an accident. Hosts that get auto-filtered are usually running some kind of feed reader that flies in the face of best practices: it annoys the web server, receives 429s, and then ignores those and keeps on going.
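For reader authors wondering what "not ignoring a 429" looks like in practice, here's a minimal Python sketch. It assumes your reader keeps the status code and the raw Retry-After header from each fetch. The function name, its parameters, and the one-hour fallback are all made up for illustration. Retry-After itself is standard HTTP: it's either a number of seconds or an HTTP-date.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def next_poll_delay(status, retry_after, normal_interval,
                    fallback=3600, now=None):
    """Seconds to wait before fetching this feed again (hypothetical helper).

    status: HTTP status code of the last response.
    retry_after: raw Retry-After header value, or None if absent.
    normal_interval: the reader's usual polling interval, in seconds.
    fallback: assumed back-off when a 429/503 has no usable Retry-After.
    """
    if status not in (429, 503):
        return normal_interval

    if retry_after is None:
        # The server said "slow down" but not for how long:
        # back off generously instead of coming right back.
        return max(normal_interval, fallback)

    value = retry_after.strip()
    if value.isdigit():
        # Form 1: delay in seconds, e.g. "Retry-After: 3600".
        wait = int(value)
    else:
        # Form 2: an HTTP-date, e.g. "Wed, 18 Dec 2024 08:00:00 GMT".
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            return max(normal_interval, fallback)
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        wait = max(0, int((when - now).total_seconds()))

    # Never poll sooner than the server asked, nor sooner than usual.
    return max(normal_interval, wait)
```

The important part is the last line: whatever the server says, the next fetch happens no sooner than it asked.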

The web server does its own thing. I'm not even in the loop. I can be asleep and otherwise entirely offline and it'll just chug along without me.

A typical timeline goes like this:

Somewhere around here, the web server decided that it wasn't being listened to, and so it decided it was going to stop listening, too.

Some time after this, it will "forgive" and then things will work again, but of course, if there's still a bad feed reader running out there, it will eventually start this process all over again.
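Here's a toy model of that "stop listening, then forgive" cycle, written in Python. It is not the actual logic running on this server. The minimum interval, strike count, block length, and every name in it are hypothetical, picked just to show the shape of the thing.

```python
import time


class PenaltyBox:
    """Toy auto-filter: 429 for polling too often, then silence, then forgiveness.

    Hypothetical policy, not any real server's rules:
      * a request sooner than min_interval after the previous one gets a 429
        and counts as a strike;
      * at max_strikes the client is blocked (no answer at all) for block_seconds;
      * once the block expires, the client's record is wiped clean.
    """

    def __init__(self, min_interval, max_strikes, block_seconds, clock=time.monotonic):
        self.min_interval = min_interval
        self.max_strikes = max_strikes
        self.block_seconds = block_seconds
        self.clock = clock
        self.last_seen = {}      # client -> time of its previous request
        self.strikes = {}        # client -> consecutive too-early requests
        self.blocked_until = {}  # client -> time its block expires

    def handle(self, client):
        """Return 200, 429, or None (None means: drop it, don't answer)."""
        now = self.clock()

        until = self.blocked_until.get(client)
        if until is not None:
            if now < until:
                return None  # the server has stopped listening
            # Block has expired: forgive and forget.
            del self.blocked_until[client]
            self.strikes.pop(client, None)
            self.last_seen.pop(client, None)

        last = self.last_seen.get(client)
        self.last_seen[client] = now

        if last is None or now - last >= self.min_interval:
            self.strikes[client] = 0  # a well-behaved request clears strikes
            return 200

        strikes = self.strikes.get(client, 0) + 1
        self.strikes[client] = strikes
        if strikes >= self.max_strikes:
            self.blocked_until[client] = now + self.block_seconds
            return None
        return 429
```

With a one-hour minimum interval, three strikes, and a one-day block, a reader polling every 20 minutes gets a 200, then two 429s, then silence until the day is up. After that it's forgiven, and if it still hasn't changed its ways, the whole thing starts over.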

A 20-minute retry rate with unconditional requests is wasteful. That's three requests per hour, so 72 requests per day. That'd be about 36 MB of traffic per day that's completely useless, because it's the same feed contents over and over and over.
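Here's that arithmetic as a quick Python sketch, along with the usual fix. The 500 KB per fetch isn't a number from the post; it's just what 36 MB divided by 72 requests works out to. The second function builds the standard conditional-request headers: send back the ETag and Last-Modified values saved from the previous response, and an unchanged feed can be answered with a small 304 Not Modified instead of the whole body again. Both function names are made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def daily_cost(interval_minutes, bytes_per_fetch):
    """Requests per day and bytes per day for a reader that always
    downloads the full feed (hypothetical helper)."""
    requests = MINUTES_PER_DAY // interval_minutes
    return requests, requests * bytes_per_fetch


def conditional_headers(etag=None, last_modified=None):
    """Validators to send on the next fetch of the same feed URL.

    etag / last_modified are the ETag and Last-Modified response headers
    from the previous 200, copied back verbatim, never invented.
    """
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


# 20-minute polling, assuming ~500 KB per full fetch:
requests, total = daily_cost(20, 500_000)
print(requests, total)
```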

Multiply that by a bunch of people because it's a popular feed, and that should explain why I've been tilting at this windmill for a while now.

If you're running a feed reader and want to know what its behavior looks like, the "feed reader score" project thing I set up earlier this year is still running, and is just humming along, logging data as always.

You just point your reader at a special personalized URL, and you will receive a feed with zero nutritional content, but many of your reader's behaviors (*) will be analyzed and made available on a report page.

It's easy... and I'm not even charging for it. (Maybe I should?)

...

(*) I say _many_ of the behaviors since a bunch of these things have proven that my approach of just handing people a bunch of uniquely-keyed paths on the same host is not nearly enough. Some of these feed readers just go and make up their own paths, and that's garbage, but it also means my dumb little CGI program at /one/particular/path doesn't see those requests. It also means that when they drill / or /favicon.ico or whatever, it doesn't see that either. I can't possibly predict all of their clownery, and I need a much bigger hammer.

There's clearly a Second System waiting to be written here.

As usual, the requirements become known after you start doing the thing.