Feeds, updates, 200s, 304s, and now 429s
In the past, I've written a few complaints about poorly-behaved feed fetchers. It's been a little over a year, and the situation is about the same. There are still a few people out there who think it's cool to poll every minute, or every 2 minutes, or whatever. It's not cool. It's useless. I don't update this thing anywhere near that often, so what's the point of wasting those resources?
There have been some bright spots. At least one person switched on If-Modified-Since headers and even put a little comment in their User-Agent header to let me know about it. That was above and beyond, so thank you to whoever that is.
But, there are still plenty of misbehaving feed readers out there, so it's time to talk about carrots and sticks.
The carrot basically is: if you have a well-behaved feed reader, you will continue to be able to discover a new post on my feed in a reasonable amount of time. This is most people. Most people do it right. Thank you for that.
The stick is: if you do not, you will not. It will take considerably longer to notice something's different out here.
What constitutes a well-behaved feed reader? My primary concern is about not having to serve the full feed to someone who has no reason to pull it again. This means making conditional requests - your client tells my server the last version of things it saw, and my server goes "okay, nothing's different" or (once in a while, after an update) "oh cool, here's the latest".
How do you do this? Ideally, you just run your feed reader and it figures it out. But, trust me, from looking at the feature requests and code bases for far too many of these things this past week, it seems like that's not very common.
This is how the tech part of it works, lest anyone claim it's too hard to implement. My server sends out a number of headers when you fetch the feed. Two of them are potentially applicable here. Right now, they look something like (but not exactly like) this:
Last-Modified: Fri, 06 Jan 2023 00:00:00 GMT
ETag: "xxxxx-yyyyyyyyyyyy"
Well-behaved HTTP clients can store those values when they do a fetch, and then return either or both of them in their subsequent requests. The first one turns into If-Modified-Since, and the other one turns into If-None-Match. Note that the second one actually requires the double quotes around it or it won't work. (Yeah, I know. Not my doing.)
If-Modified-Since: Fri, 06 Jan 2023 00:00:00 GMT
... and/or...
If-None-Match: "xxxxx-yyyyyyyyyyyy"
Now, your HTTP client software should take this as some kind of argument to some well-defined setting and you should probably not be setting headers directly, but we're still smashing rocks together for a protocol that's 30+ years old. But I digress.
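For a client that does have to set the headers by hand, here is a minimal sketch in Python. The function name conditional_headers is made up for illustration; the only real rule it encodes is the one above: hand back exactly what the server sent, quotes and all.

```python
from typing import Dict, Optional


def conditional_headers(last_modified: Optional[str],
                        etag: Optional[str]) -> Dict[str, str]:
    """Build request headers from the values saved after the previous fetch.

    Hypothetical helper: both arguments are the raw header values the server
    sent last time, or None if nothing has been fetched yet.
    """
    headers: Dict[str, str] = {}
    if last_modified:
        # Last-Modified comes back verbatim as If-Modified-Since.
        headers["If-Modified-Since"] = last_modified
    if etag:
        # ETag comes back verbatim as If-None-Match, double quotes included.
        headers["If-None-Match"] = etag
    return headers
```

On the very first fetch both values are None, the dictionary comes back empty, and the request goes out unconditional. That's the "prime the pump" case.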
(Side note: this means your feed reader has to maintain some state per feed. You can't just statelessly fetch a URL until the end of time. That's incredibly boneheaded.)
Just take what you got before and hand it back as shown above. If nothing's changed, you'll get a 304 HTTP code back, and that means "nothing new". It's a short, simple transaction, and uses very little in the way of resources.
If the feed has been updated, say, because I wrote a new post, or did an update or typo fix or whatever to an existing one, then you'll automatically get that returned as a 200 along with a new set of headers. It's your feed reader's responsibility to remember one or both of those fields and then use them later on.
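To make the whole cycle concrete, here is a rough sketch of one poll using only Python's standard library. FeedState, update_state and poll are names invented for this example, and a real reader would persist the state somewhere instead of keeping it in memory. One quirk worth knowing: urllib reports a 304 by raising HTTPError, so the "nothing new" case shows up in the except branch.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class FeedState:
    """What the reader remembers about one feed between polls (hypothetical)."""
    last_modified: Optional[str] = None
    etag: Optional[str] = None
    body: Optional[bytes] = None


def update_state(state: FeedState, status: int,
                 headers: Mapping[str, str], body: bytes) -> bool:
    """Record the result of a fetch. Returns True if the feed changed."""
    if status == 304:
        # Nothing new: keep the stored values and the cached body as they are.
        return False
    if status == 200:
        # New content: remember the fresh values for the next request.
        state.last_modified = headers.get("Last-Modified")
        state.etag = headers.get("ETag")
        state.body = body
        return True
    raise ValueError(f"unexpected status {status}")


def poll(url: str, state: FeedState) -> bool:
    """Do one conditional fetch of url, updating state in place."""
    request = urllib.request.Request(url)
    if state.last_modified:
        request.add_header("If-Modified-Since", state.last_modified)
    if state.etag:
        request.add_header("If-None-Match", state.etag)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return update_state(state, response.status,
                                response.headers, response.read())
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return update_state(state, 304, err.headers, b"")
        raise  # anything else is left for the caller to handle
```

Calling poll repeatedly with the same FeedState object gives one full fetch followed by cheap 304s until something actually changes.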
From my point of view, a request with a proper If-Modified-Since ("IMS") or If-None-Match ("INM") header is considered a conditional request. I look relatively kindly upon those. Those tend to come from people who want to do the right thing.
A request with neither "IMS" nor "INM" headers is unconditional, and I'm not such a fan of those. I understand that everyone's going to fetch something "fresh" now and then. That's a given. You have to prime the pump somehow. I don't care about that.
But when someone requests the full feed and makes no attempts to conserve, and they do it over and over again, like every 2 seconds? That's when I sit down and start coding. And code I did. That's why I'm writing this post. Poorly-behaved feed readers will no longer get timely updates.
I should note that one particular feed reader sends an If-Modified-Since of "Wed, 01 Jan 1800 00:00:00 GMT" and that's utter bullshit. You made that up and you know it. Nobody ever served you a page with that value. See, this is actually known pathological behavior. Sending that does not count as conditional.
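One way a server could draw that line is sketched below. To be clear, this is not the code running here, just an illustration of the idea that a validator only counts if the server actually handed it out at some point. The function name and the two "served" sets are assumptions made for the example.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Set


def counts_as_conditional(headers: Mapping[str, str],
                          served_etags: Set[str],
                          served_dates: Set[str]) -> bool:
    """Return True only if the request echoes a value this server really sent.

    served_etags and served_dates are hypothetical records of the ETag and
    Last-Modified values the server has handed out for this feed.
    """
    etag = headers.get("If-None-Match")
    if etag is not None and etag.strip() in served_etags:
        return True

    claimed_raw = headers.get("If-Modified-Since")
    if claimed_raw is None:
        return False
    try:
        claimed = parsedate_to_datetime(claimed_raw)
    except (TypeError, ValueError):
        # Unparseable date: treat the request as unconditional.
        return False
    # A made-up date such as one in the year 1800 matches nothing we served.
    return any(claimed == parsedate_to_datetime(d) for d in served_dates)
```

With "Fri, 06 Jan 2023 00:00:00 GMT" in served_dates, a request echoing that date passes, while the 1800 date parses fine but matches nothing and gets treated like any other unconditional fetch.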
Bad clients get a 429. That means slow your roll.
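On the client side, "slow your roll" could look something like the sketch below. The interval constants and the function name are invented for illustration. The Retry-After handling is also an assumption: the HTTP specs allow a server to send that header with a 429, but nothing here promises that this one does, and only the plain-seconds form is handled.

```python
from typing import Optional

BASE_DELAY = 3600   # hypothetical normal polling interval, in seconds
MAX_DELAY = 86400   # hypothetical ceiling for self-imposed backoff


def next_delay(status: int, current_delay: int,
               retry_after: Optional[str] = None) -> int:
    """Pick how long to wait before the next poll, given the last status."""
    if status == 429:
        # Told to slow down: double the wait, up to the ceiling.
        delay = min(current_delay * 2, MAX_DELAY)
        if retry_after is not None:
            value = retry_after.strip()
            if value.isascii() and value.isdigit():
                # If the server named a wait in seconds, never go below it.
                delay = max(delay, int(value))
        return delay
    if status in (200, 304):
        # A normal answer: return to the regular interval.
        return BASE_DELAY
    # Anything else (for example a 5xx): leave the interval alone.
    return current_delay
```

The point of the design is that a 429 should make the client quieter, not trigger an immediate retry.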
Bonus note for pedants: yes, it's still possible to be abusive with perfectly-formed conditional requests. Please don't try to find out where that point is. Just remember, I don't post that often. You don't need to poll that often.