Feedback: feeds, URLs, uids, LoveSacs and HTML oddities
More feedback, more responses!
An anonymous reader writes in...
PS: Your feed, https://rachelbythebay.com/w/atom.xml, randomly starts redirecting from HTTP to HTTPS to HTTP. My feed reader logs redirects and I see it happens every few days or so.
That sounds pretty broken. As for the redirects actually happening here, some are visible and some aren't so obvious. First, anything coming to www.rachelbythebay.com gets redirected to strip off the "www.". On the http side, this is just a normal Apache vhost which matches the ServerName and does a "Redirect permanent".
On https, however, it's a bit more annoying, and I only recently fixed it to act sanely. For a while, ALL https sites were being served with the same document root, even though the cert would only match for [www.]rachelbythebay.com. Now, if you hit the wrong vhost on port 443 here, you'll get some sort of error depending on exactly how far off the beaten path you are.
Finally, there's an internal redirect which you should never see. The atom.xml URL maps directly onto a file of the same name when you hit it with http, but sneakily remaps onto a different file with a slightly different name when you hit it with https. The only difference between the two files is that generated URLs to other posts, IMG resources, and the like are written as https://rachelbythebay.com/blahblah instead of http://rachelbythebay.com/blahblah.
Why not just do "foo.jpg" in the feed? Some dumb feed readers turn that into "/foo.jpg", neglecting the fact that it should be relative to the post. Why not do "//rachelbythebay.com/w/year/month/day/word/foo.jpg"? Some OTHER dumb feed readers will actually literally "GET //rachelbythebay.com/w/year/..." as if it were a path on the current host, totally missing the fact that a URL starting with "//" means "same protocol, then this host and this path".
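To make the difference concrete, here's a small sketch using Python's urllib.parse.urljoin, which resolves references the way RFC 3986 describes. The post URL is just the year/month/day/word placeholder from above, not a real post.

```python
from urllib.parse import urljoin

# Placeholder post URL using the year/month/day/word shape from above (not a real post).
POST = "https://rachelbythebay.com/w/year/month/day/word/"

# A bare relative reference resolves against the post's own directory.
print(urljoin(POST, "foo.jpg"))
# https://rachelbythebay.com/w/year/month/day/word/foo.jpg

# A "//" (network-path) reference keeps the base's scheme and takes host and path from the reference.
REF = "//rachelbythebay.com/w/year/month/day/word/foo.jpg"
print(urljoin(POST, REF))
# https://rachelbythebay.com/w/year/month/day/word/foo.jpg
print(urljoin("http://rachelbythebay.com/w/atom.xml", REF))
# http://rachelbythebay.com/w/year/month/day/word/foo.jpg

# What the first kind of broken reader effectively does: rewrite "foo.jpg" as "/foo.jpg",
# which lands at the site root instead of next to the post.
print(urljoin(POST, "/foo.jpg"))
# https://rachelbythebay.com/foo.jpg
```

Note how the same "//" reference comes out as http or https depending only on the base it's resolved against, which is the property those readers miss.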
So, that's why the feed tends to spell out URLs in absolute terms, and by extension, why it preserves the protocol and thus has two slightly different flavors. curl both and diff 'em if you don't believe me.
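The generator itself isn't shown here, so this is only a hypothetical sketch of the idea: render the same entries twice, once per scheme, so the two outputs differ only in the protocol. The function name, the HOST constant, and the bare <link> elements are all illustrative.

```python
HOST = "rachelbythebay.com"  # illustrative constant

def render_links(scheme: str, paths: list[str]) -> str:
    """Emit one Atom-style <link> per site-local path, spelled out as an absolute URL."""
    lines = []
    for path in paths:
        url = f"{scheme}://{HOST}/{path.lstrip('/')}"
        lines.append(f'<link href="{url}"/>')
    return "\n".join(lines)

# Placeholder paths in the year/month/day/word shape used above.
paths = ["w/year/month/day/word/", "w/year/month/day/word/foo.jpg"]
http_flavor = render_links("http", paths)
https_flavor = render_links("https", paths)

# The two flavors differ only in the scheme, which is what diffing the real feeds should show.
assert http_flavor.replace("http://", "https://") == https_flavor
```

Spelling out the scheme costs a second output file, but it sidesteps both kinds of reader bug.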
Here's a little trick you might not have seen yet:
diff -u <(curl -s http://rachelbythebay.com/w/atom.xml) <(curl -s https://rachelbythebay.com/w/atom.xml) | less
Anyway, I never redirect from http to https. I figure, if you really want people to be able to see what you're doing here, then by all means, hit me over http. Maybe you're in some oppressive regime where you can't even DO https. That's why it's still there: maximum compatibility. Everyone else is advised that https is better where possible.
That said, I still need to get the whole Let's Encrypt thing rigged up. Then *all* of my hostnames will be happy!
Looking at the logs, there seems to be a single lonely tt-rss installation which keeps getting a 301 because it's fetching the feed via the "www." version of the https URL. It looks like it's set for a roughly 16-minute polling interval, and while it does follow the redirect, it never seems to update its database with the new URL, so the 301s flow endlessly. It's not hurting anything, but, eh.
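For comparison, here's a minimal sketch of how a reader could handle this, assuming a hypothetical Subscription record that stores the feed URL: a permanent redirect (301 or 308) rewrites the stored URL, while a temporary one (302, 303, 307) is followed for this fetch only. This is not tt-rss's actual code.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin

PERMANENT = {301, 308}        # "use the new URL from now on"
TEMPORARY = {302, 303, 307}   # "fetch it over there this time"

@dataclass
class Subscription:
    url: str  # hypothetical stored feed URL

def next_url(sub: Subscription, status: int, location: Optional[str]) -> str:
    """Return the URL to fetch next, persisting it only for permanent redirects."""
    if location is None or status not in PERMANENT | TEMPORARY:
        return sub.url
    target = urljoin(sub.url, location)  # Location may be relative
    if status in PERMANENT:
        sub.url = target  # the step that would stop the endless 301s
    return target
```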
Have you considered adding support for delta feed updates? You'd only return new feeds since the If-Modified-Since datestamp - saving yourself and your readers on bandwidth. It's supported by most of the popular feed readers now.
I actually hadn't heard of it. Having now heard of it, I'm not sure I want to introduce anything "dynamic" to the serving path for this stuff. Right now, it's all a bunch of flat files on disk, and serving it is super fast. You can point HN, reddit, Twitter, and more at the site and it just yawns a little.
The old box had something ridiculous like a 10 Mbps half-duplex (HDX) uplink, so it would lag a little under significant network load. The current server is much better in that regard.
Doing adaptive responses to generate the atom.xml on the fly, based on what each client has already seen, would break the simple little world I have now. Generation would have to happen server-side instead of on my laptop, and so on. I'm not ready to leave that soft cozy place just yet.
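To give a sense of what "dynamic" would mean here, this is a rough sketch of the per-request filtering a delta feed implies: parse the client's If-Modified-Since header and keep only entries updated after it. The Entry type and its fields are hypothetical, and a real server would still have to build the XML around the result and answer with the right status and headers.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import List, NamedTuple, Optional

class Entry(NamedTuple):
    path: str          # hypothetical post path
    updated: datetime  # timezone-aware timestamp

def entries_since(entries: List[Entry], if_modified_since: Optional[str]) -> List[Entry]:
    """Keep only entries newer than the client's If-Modified-Since, if any."""
    if not if_modified_since:
        return list(entries)
    try:
        cutoff = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return list(entries)  # unparseable header: send everything
    if cutoff.tzinfo is None:
        cutoff = cutoff.replace(tzinfo=timezone.utc)
    return [e for e in entries if e.updated > cutoff]
```

Every request now runs code and builds a fresh response, which is exactly what the flat-files setup avoids.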
Also, the file itself isn't particularly big (470 KB) so I'm not stressing about bandwidth. We could shrink it down to about 171 KB if I gzipped it first and provided yet another endpoint. How about that?
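For the record, 171 KB out of 470 KB is about 36% of the original size. Checking that kind of number for any file takes a few lines of Python; this sketch compresses in memory with the standard gzip module and assumes nothing about how the server would actually publish the result.

```python
import gzip
from pathlib import Path

def gzip_sizes(data: bytes, level: int = 9) -> tuple[int, int]:
    """Return (raw size, gzipped size) in bytes for the given data."""
    return len(data), len(gzip.compress(data, compresslevel=level))

def report(path: str) -> str:
    """Summarize how much gzip would shrink the file at `path`."""
    raw, packed = gzip_sizes(Path(path).read_bytes())
    return f"{path}: {raw} -> {packed} bytes ({packed / raw:.0%} of original)"
```

Run it as `print(report("atom.xml"))` against a local copy of the feed. The exact byte count will vary a little with the compression level.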
This next one is pretty neat. Remember how I mentioned systemd in a random throwaway line about dynamic uids for unprivileged services?
Regarding your last post about choosing unprivileged uids for services, I thought you might like to know that this was recently addressed in systemd: http://0pointer.net/blog/dynamic-users-with-systemd.html
Thanks to Scott and Lionel for that one. When I mentioned systemd while writing that post, I honestly hadn't done any research as to whether it might support it, or really, if anyone had recently added it. I guess I actually should have.
I understand from some friends that they have since turned this on to divorce "dbuser" from "dbmon" in so many words, and it seems to Just Work. Nice!
From the "haven't worked there" department, Dan writes:
I live in Detroit and saw 100% of all this, from LoveSacs to shuttles, at Quicken Loans and Shore Mortgage. I was weirded out reading it thinking you were referring to Detroit but I guess it happens everywhere.
Yep. Never worked there, but it's funny how the world works, right?
Andrew writes in with a possible bug report:
One of your index entries is broken: Jan 31 2012, both of the links go to the hiring page instead of one going to the radio traffic stats post.
... and later ...
It looks like you have an off-by-one error in your code. When you have more than one entry in a single day, the links go to the URL of the previous story, so the second points to the first again and there is always one missing. Seems weird that you've not noticed it before so perhaps it is browser specific: chrome on an ipad?
The weird thing is... it doesn't work that way. The index is generated from a static file, and for the day in question, it looks like this:
$ grep ^2012/01/31 list
2012/01/31/numbers Radio traffic statistics
2012/01/31/hiring More headcount? How about improving the existing headcount?
For each line in that file, it goes off and emits the "a href" blahblah stuff for the top-level index. There's no state from line to line.
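That per-line independence is easy to see in a minimal sketch of the loop. It assumes the list format shown above (a path, whitespace, then the title); the function name is made up, and the real generator isn't shown here.

```python
import html

def index_items(list_text: str) -> list[str]:
    """Turn 'path title' lines into <li><a ...> items, one per line, with no carried state."""
    items = []
    for line in list_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        path, title = line.split(None, 1)  # path, then everything after the first whitespace
        items.append(f'<li><a href="{html.escape(path)}/">{html.escape(title)}</a>')
    return items

LIST = """2012/01/31/numbers Radio traffic statistics
2012/01/31/hiring More headcount? How about improving the existing headcount?"""

print("\n".join(index_items(LIST)))
```

Since each item is built only from its own line, there's nowhere for an off-by-one between neighboring entries to come from.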
Indeed, if you look at the raw version of the index page, it's fairly boring HTML, almost straight out of 1994, modulo the divs for the pretty colors...
$ grep -2 2012/01/31 index.html
<li><a href="2012/01/31/numbers/">Radio traffic statistics</a>
<li><a href="2012/01/31/hiring/">More headcount? How about improving the existing headcount?</a>
Additionally, the W3C validator thinks the index page is just fine. I did take this opportunity to clean up the DOCTYPE, but it was validating cleanly before that, too.
So... I don't know? Maybe your browser hates list items that aren't closed?
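For what it's worth, leaving out </li> is allowed: in HTML, an li element's end tag may be omitted when another li follows or the parent list ends. As a quick sanity check that the markup carries two distinct links, here's a sketch that runs the two index lines from above (wrapped in a <ul> added for the example) through Python's html.parser. That parser is not a browser's HTML5 tree builder, so this says nothing about how Chrome on an iPad renders the page.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Record the href of every <a> start tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

SNIPPET = """<ul>
<li><a href="2012/01/31/numbers/">Radio traffic statistics</a>
<li><a href="2012/01/31/hiring/">More headcount? How about improving the existing headcount?</a>
</ul>"""

p = LinkCollector()
p.feed(SNIPPET)
p.close()
print(p.hrefs)  # ['2012/01/31/numbers/', '2012/01/31/hiring/']
```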
That's it for now!