Writing

Software, technology, sysadmin war stories, and more.

Saturday, August 17, 2024

A common bug in a bunch of feed readers

Yeah, it's another thing about feed readers. I don't blame you if you want to skip this one.

A reader (that is, a person!) reached out earlier and asked me to look at a bug report for a feed reader. It seems they passed along some of the details from one of my earlier posts, and the report was closed with no action taken.

The program in question has cache problems, and it's something that's surprisingly common. A bunch of different programs do this, and it's interesting to wonder how they all came to this point.

Well, at least for some of them (the PHP-based ones), I think I know now: they're probably using the same library underneath, and it's been hacked to do some kind of "hash match" thing based on the body of the feed - like an md5sum or somesuch.

It goes something like this, apparently: fetch the feed for the first time. Then store the hash, the Last-Modified time, and the ETag.

Next time, send If-Modified-Since and If-None-Match using the stored values. But, then, if the hash of the body matches, return immediately and don't update anything else... *even if* the Last-Modified and/or ETag values changed! So, next time, they send the old values again, and it never gets any better.
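To make that flow concrete, here is a minimal sketch of what it seems to amount to. This is not the actual library's code: the function names, the dict-based cache, and the choice of SHA-256 as the body hash are all assumptions for illustration.

```python
import hashlib


def buggy_update(cache, status, headers, body):
    """Hypothetical model of the flawed flow: bail out on a hash match
    before the new validators are saved."""
    if status == 304:
        return False  # server confirmed nothing changed
    digest = hashlib.sha256(body).hexdigest()  # stand-in for the md5-ish hash
    if digest == cache.get("hash"):
        # BUG: early return. The Last-Modified / ETag values that arrived
        # with this 200 response are dropped on the floor.
        return False
    cache["hash"] = digest
    cache["last_modified"] = headers.get("Last-Modified")
    cache["etag"] = headers.get("ETag")
    return True


def conditional_headers(cache):
    """Build request headers from whatever validators are stored."""
    out = {}
    if cache.get("last_modified"):
        out["If-Modified-Since"] = cache["last_modified"]
    if cache.get("etag"):
        out["If-None-Match"] = cache["etag"]
    return out
```

Feed it two 200 responses with the same body but different validators, and `conditional_headers` keeps producing the values from the first response.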

Therein lies the shared bug: they're not designed around the notion of "always return the values you got last time from the web server". If they had been, they would not throw them away just because the hash matched.
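A rough sketch of the "always return what you got last time" approach might look like this. The names, the dict-based cache, and the SHA-256 hash are assumptions for illustration, not any particular reader's code.

```python
import hashlib


def update_cache(cache, status, headers, body):
    """Hypothetical cache update: record the validators from every 200
    response, and report separately whether the body is actually new."""
    if status == 304:
        return False  # stored validators are still current
    # Always remember what the server sent this time, body change or not.
    cache["last_modified"] = headers.get("Last-Modified")
    cache["etag"] = headers.get("ETag")
    digest = hashlib.sha256(body).hexdigest()
    changed = digest != cache.get("hash")
    cache["hash"] = digest
    return changed
```

The hash check can still be used to skip reprocessing an unchanged body. It just can't be allowed to decide whether the validators get saved.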

So, when might the hash match when the LM or ETag values have changed?

Easy! When someone does this, or the equivalent:

webserver$ touch feed.xml

That will bump both of those fields and will leave the body unchanged.
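This is easy enough to check locally. The sketch below assumes a web server that derives both validators from the file's modification time, which is common for static files but is not something every server does. The function name and the temporary file are made up for the demonstration.

```python
import hashlib
import os
import tempfile


def _sha256_of(path):
    """Hash the file contents, standing in for a reader's body hash."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def touch_effect():
    """Return (mtime_changed, hash_changed) after bumping a file's mtime."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "feed.xml")
        with open(path, "wb") as f:
            f.write(b"<feed/>")
        before_mtime = os.stat(path).st_mtime
        before_hash = _sha256_of(path)
        # Same idea as `touch`, but with an explicit later timestamp so the
        # result does not depend on clock resolution.
        os.utime(path, (before_mtime + 60, before_mtime + 60))
        after_mtime = os.stat(path).st_mtime
        after_hash = _sha256_of(path)
    return after_mtime != before_mtime, after_hash != before_hash
```

The modification time moves while the content hash stays put, which is exactly the combination the hash-match shortcut mishandles.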

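Because they don't handle that case, they keep sending the stale values, which no longer match anything on the server. Every poll then comes back as a full 200 response, same as an unconditional request, until the body finally changes. That could be days or weeks later depending on how often something actually changes in the feed. Not good.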