Writing

Software, technology, sysadmin war stories, and more.

Tuesday, June 11, 2024

Some early results for feed reader behavior monitoring

I've had a few people ask me for results from the feed reader score project. Now that we've had a good week or more of data collection, enough time has passed that I can start giving some details.

There's one big thing to keep in mind here: I am assessing individual feed reader installations, including whatever config values the user might have set globally or on the test feed in particular. Those config values can be the difference between "amazing" and "get it away from me".

That means a single good entry doesn't necessarily mean that every install of that program will behave perfectly. It also means that a single bad entry doesn't mean that all of them will be terrible.

I've broken them down into a few groups.

Group A: No real complaints. They do their jobs quietly and don't make messes. Anomalies, if any, don't seem systemic and are probably just the result of the user clicking the "poll now" button (or equivalent). This is expected.

Group B: They tend to do spammy unconditional requests at startup, and usually at a needlessly fast rate, too - like less than a second apart. This is what most entries in group B have, and if that's their only problem, then fixing that would move most of them into group A. (There can be other small anomalies which put something here.)

Group X: Unusable data. This can be because there hasn't been enough data collected yet, like if someone just started it up, or if they shut it off before it ran for several days. It can also happen when someone points multiple feed reader instances (same version or not) at their unique tagged feed, or if they load it with a browser, curl, or similar.

Groups C, D, and F: Everything else (and I'm not identifying who's who, or what groups they might be in).

A few minutes ago, I went through all of the tests one by one and came up with my own assessment based on the available data. Ordering within a group is not meaningful.

Group A: instances of:

Group B: instances of:

Anything not shown here is not being tested or is in another group, or I screwed something up and missed it. Contact me if you think I skipped your entry.

I should mention that more than a couple of systemic bugs have been found across multiple reader programs:

Bug: It's entirely possible for a feed's Last-Modified value (seconds) to remain the same while the ETag (length + microseconds on stock Apache) changes. More than a few feed readers assume if they get the same value for Last-Modified, then they don't have to update the cached ETag value. This causes them to effectively make unconditional requests until the feed changes again. Watch out for shortcut evaluations in your caching code!
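To make that concrete, here is a minimal sketch of a cache update that treats the two validators independently. The Validators class and update_validators function are hypothetical names made up for illustration, not taken from any particular reader, and the sketch assumes the response headers arrive as a plain mapping with canonical header names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Validators:
    """Cached validators for one feed URL (hypothetical structure)."""
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def update_validators(cached: Validators, status: int,
                      headers: Mapping[str, str]) -> Validators:
    """Return the validators to store after a response.

    Each header is handled on its own. An unchanged Last-Modified never
    short-circuits the ETag update.
    """
    if status == 200:
        # Fresh body: store exactly what this response carried.
        return Validators(
            etag=headers.get("ETag"),
            last_modified=headers.get("Last-Modified"),
        )
    if status == 304:
        # Not modified: keep the old values unless the server resent one.
        return Validators(
            etag=headers.get("ETag", cached.etag),
            last_modified=headers.get("Last-Modified", cached.last_modified),
        )
    # Errors and anything else: leave the cache alone.
    return cached
```

With this shape, a response that repeats the old Last-Modified but carries a new ETag still replaces the cached ETag, so the next request can be conditional again.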

Bug: If-Modified-Since is only really valid if you were served it as a Last-Modified value previously. Readers are inventing values, or are sourcing them from the wrong layer of the stack. Don't do this.

Only use the last Last-Modified value for If-Modified-Since, and only use the last ETag value for If-None-Match.
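As a rough sketch of that rule, the request side can be as simple as echoing back whatever was stored from the previous response, byte for byte. The function name conditional_headers and its parameters are assumptions for illustration only.

```python
from typing import Dict, Optional


def conditional_headers(stored_etag: Optional[str],
                        stored_last_modified: Optional[str]) -> Dict[str, str]:
    """Build conditional request headers from previously served values only.

    The stored strings are sent back verbatim. Nothing is derived from the
    local clock, from dates inside the feed body, or from any other layer.
    """
    headers: Dict[str, str] = {}
    if stored_etag is not None:
        headers["If-None-Match"] = stored_etag
    if stored_last_modified is not None:
        headers["If-Modified-Since"] = stored_last_modified
    return headers
```

If the server never sent a given validator, the matching request header is simply left out rather than invented.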

Bug: Timing is too tight, and they aren't accounting for how long it takes to perform a poll. I'll probably do a separate post about this since it comes up in other things in the world, too.
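One possible reading of this, sketched below, is that a reader should measure its interval from when the previous poll finished and leave some slack, so that requests never land slightly early from the server's point of view. The function name, the 60 second margin, and the choice to anchor on the finish time are all assumptions for illustration, not a description of any specific reader.

```python
def seconds_until_next_poll(last_poll_finished: float, now: float,
                            interval: float, margin: float = 60.0) -> float:
    """Return how long to wait before starting the next poll.

    The wait is measured from when the previous poll *finished*, plus a
    safety margin, so a slow poll does not shrink the gap between requests.
    All times are in seconds on the same (ideally monotonic) clock.
    """
    due = last_poll_finished + interval + margin
    return max(0.0, due - now)
```

For example, with a one hour interval and a poll that finished at t=105, a check at t=110 yields a wait of 3655 seconds, and a check made after the due time yields 0.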

Bug: Launching multiple identical requests at feed init time, and usually in a volley that triggers rate-limiting. There's something wrong with the network I/O design when this happens. Calls across the network are not "free" and should be executed sparingly. Don't discard the values only to fetch them again a moment later.
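One way to avoid the volley, sketched here under stated assumptions, is to route every caller through a small wrapper that remembers the last result per URL and refuses to hit the network again within a minimum interval. CoalescingFetcher, its fetch callable, and the one hour default are hypothetical; the sketch is single-threaded and a real reader would also need locking or an async equivalent.

```python
import time
from typing import Callable, Dict, Tuple


class CoalescingFetcher:
    """Share one network fetch among callers that ask for the same URL."""

    def __init__(self, fetch: Callable[[str], bytes],
                 min_interval: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch                # does the real network call
        self._min_interval = min_interval  # seconds between real fetches
        self._clock = clock
        self._cache: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> bytes:
        now = self._clock()
        hit = self._cache.get(url)
        if hit is not None and now - hit[0] < self._min_interval:
            # Reuse the value we already paid for instead of refetching.
            return hit[1]
        body = self._fetch(url)
        self._cache[url] = (self._clock(), body)
        return body
```

At startup, the parts of the program that want the feed (title lookup, icon discovery, the actual item parse, and so on) can all call get() and only the first one results in a request.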