Attack of the broken feed reader
I look at my web server logs from time to time. Sometimes I notice patterns. In particular, feed reader software which is polling for the Atom feed (atom.xml) can occasionally do some very strange things. Of course, when it does, there's usually no contact info, so I can't exactly reach out to the owner to ask what's up. All I can do is shame them here.
With that in mind, here's one ridiculous example, with the IP address rubbed out to protect the guilty. (Perhaps they can identify themselves from the git hash of the ttrss build they have.)
First, it pulls the atom.xml itself. Note that this file is 443 KB and thus contains the complete contents of all of the posts going back quite a ways. I have it limited to the 100 most recent posts, which currently goes back to August 2013.
d.e.r.p - - [12/Apr/2018:20:48:25 -0700] "GET /w/atom.xml HTTP/1.1" 200 443606 "https://rachelbythebay.com/w/atom.xml" "Tiny Tiny RSS/1.10.21bb3c0 (http://tt-rss.org/)"
No big deal there. People pull this file all the time. This is intended.
So then it pulls the actual post I just put online tonight.
d.e.r.p - - [12/Apr/2018:20:48:25 -0700] "GET /w/2018/04/12/date/ HTTP/1.1" 200 5973 "-" "Tiny Tiny RSS"
Okay, this is a little stranger, since again, the post's entire contents are right there in the feed. That's the whole point of the feed.
But wait, there's more.
d.e.r.p - - [12/Apr/2018:20:48:25 -0700] "GET /w/2018/04/11/exec/ HTTP/1.1" 200 7009 "-" "Tiny Tiny RSS"
It pulls yesterday's post? Why? You already pulled it... yesterday! And a couple dozen times already today! Note that it's obviously not sending an If-Modified-Since header, since I'm serving out the entire thing (all 7009 bytes of it), instead of a 304 ("use your cache").
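For anyone wondering what "use your cache" looks like from the client side, here's a minimal sketch of a conditional GET using only the Python standard library. This is not how Tiny Tiny RSS works internally (I have no idea what it does); the function names are made up for illustration, and it assumes the client kept the ETag and/or Last-Modified values from its previous fetch.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build the request headers that let a server answer 304 Not Modified."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None):
    """Hypothetical helper: returns (body, etag, last_modified).

    body is None when the server says the cached copy is still good.
    """
    req = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return (
                resp.read(),
                resp.headers.get("ETag"),
                resp.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError; for us it just means
        # "nothing new, keep using what you already have".
        if err.code == 304:
            return None, etag, last_modified
        raise
```

A reader would stash the returned validators next to the cached body and hand them back on the next poll. When nothing has changed, the response is a 304 with no body instead of the full 7009 bytes.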
This thing then proceeds to pull every other post mentioned in the atom.xml over the next couple of seconds. I'll skip down about 98 lines and give you the last one.
d.e.r.p - - [12/Apr/2018:20:48:42 -0700] "GET /w/2013/08/04/map/ HTTP/1.1" 200 3737 "-" "Tiny Tiny RSS"
101 HTTP requests in 17 seconds, or almost 6 QPS from a single client... for content they already have hundreds of times over. This isn't breaking anything since it's all static content and it's coming from RAM on my end, but come on, it's pointless.
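If you want to check that arithmetic against your own logs, something like the following rough sketch would do it. It assumes access-log lines with the same bracketed timestamp layout as the ones quoted above, and the request_rate name is just something I picked for the example.

```python
import re
from datetime import datetime

# Matches the bracketed timestamp, e.g. [12/Apr/2018:20:48:25 -0700]
STAMP = re.compile(r"\[([^\]]+)\]")


def request_rate(lines):
    """Requests per second between the first and last timestamped line."""
    stamps = []
    for line in lines:
        match = STAMP.search(line)
        if match:
            stamps.append(
                datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
            )
    if not stamps:
        return 0.0
    span = (max(stamps) - min(stamps)).total_seconds()
    # Everything landed in the same second: avoid dividing by zero.
    return len(stamps) / span if span > 0 else float(len(stamps))
```

Feed it the 101 lines from this client, which run from 20:48:25 to 20:48:42, and it comes out to 101 / 17, or about 5.94 requests per second.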
This makes no sense. Knock it off.