Software, technology, sysadmin war stories, and more.
Tuesday, August 28, 2012

It's hard to be all things to all browsers

Are you a reader of my feed who's been seeing broken images for the past month or so? Did they just start working again in the past day or so? This might explain it.

Earlier this month, I switched post generation around so that it would generate IMG SRC tags (and links to other posts) with "//rachelbythebay.com/..." paths. By leaving off the "http" or "https", the idea was that a browser would inherit the current protocol and would go from there. This replaced the previous scheme where it would always hard-code "http://rachelbythebay.com/..." stuff.
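To make the "inherit the current protocol" idea concrete, here is a minimal sketch using Python's standard urljoin, which follows the usual URL resolution rules. The base URLs and the resolve helper are just for illustration, and the image path is borrowed from the log line further down; this is not how my generator works.

```python
from urllib.parse import urljoin

# A scheme-relative reference: no "http:" or "https:" in front.
ref = "//rachelbythebay.com/w/2012/08/25/fish/base.jpg"


def resolve(page_url: str, reference: str) -> str:
    """Resolve a reference against the page it appears on (hypothetical helper)."""
    return urljoin(page_url, reference)


# The same reference picks up whichever scheme the containing page used.
http_result = resolve("http://rachelbythebay.com/w/", ref)
https_result = resolve("https://rachelbythebay.com/w/", ref)

print(http_result)
print(https_result)
```

A client that follows those rules ends up with an http URL on the http site and an https URL on the https site, which is exactly what I was counting on.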

My goal was to eliminate the mixed-content security warnings for people visiting via https, and while it mostly helped that situation, it turned around and opened another problem for other visitors. I started getting a noticeable number of 404s from browsers which apparently have no idea how to handle a "//" URL. I can tell they don't get it because the hits look like this:

x.x.x.x - - [26/Aug/2012:08:02:07 -0700] "GET //rachelbythebay.com/w/2012/08/25/fish/base.jpg HTTP/1.1" 404 327 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.57.2 (KHTML, like Gecko)" "rachelbythebay.com" "-"
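If you want to spot these in your own logs, something like the following rough sketch would do it. It assumes a log format where the request line is the first quoted field, as in the sample above; the regex and the function name are my own inventions for this example.

```python
import re

# Matches the quoted request line, e.g. "GET /path HTTP/1.1".
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+"')


def is_misparsed_scheme_relative(log_line: str) -> bool:
    """True if the client sent a "//host/..." reference as a literal path."""
    m = REQUEST_RE.search(log_line)
    return bool(m) and m.group("target").startswith("//")


bad_hit = (
    'x.x.x.x - - [26/Aug/2012:08:02:07 -0700] '
    '"GET //rachelbythebay.com/w/2012/08/25/fish/base.jpg HTTP/1.1" 404 327 '
    '"-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.57.2 '
    '(KHTML, like Gecko)" "rachelbythebay.com" "-"'
)
# Hypothetical ordinary hit, for contrast.
good_hit = 'x.x.x.x - - [26/Aug/2012:08:02:08 -0700] "GET /w/atom.xml HTTP/1.1" 200 1234'

print(is_misparsed_scheme_relative(bad_hit))
print(is_misparsed_scheme_relative(good_hit))
```

The giveaway is that the hostname shows up inside the request path instead of being used to pick the server.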

Most of them seem to be WebKit-based, like Safari or Chrome, and judging by the access patterns I could see, most seem to be coming in through a feed reader. Obviously, this is not what I wanted, either.

At this point, someone's probably thinking "why not just IMG SRC base.jpg and let it inherit the entire path, including the hostname and directory?". That's actually a really good idea, assuming you're only talking about the actual pages on my web server. For those, you can do a relative URL and it will Just Work, as it always has.

The problem is when that same IMG SRC="base.jpg" relative tag gets into the atom.xml feed. If that happens, then I start seeing 404s for GETs to "/base.jpg". Yep, instead of "/w/2012/08/25/fish/base.jpg", they start hitting the top level of my web server. It seems that handling base URLs in feeds is a contentious issue, and trying to fix it by adding things to my feed would introduce even more problems depending on whether feed readers recognize it or not.
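Here's a small sketch of why the same relative reference lands in two different places. The "reader" base URL is an assumption on my part: I can't see what base those feed readers actually use, only that the requests end up at the top level. The post path is the one from the log line above.

```python
from urllib.parse import urljoin

relative = "base.jpg"

# On the post's own page, the base is the page URL, so this Just Works.
page_base = "http://rachelbythebay.com/w/2012/08/25/fish/"

# Assumption: a feed reader that treats the site root as the base.
reader_base = "http://rachelbythebay.com/"

from_page = urljoin(page_base, relative)
from_reader = urljoin(reader_base, relative)

print(from_page)
print(from_reader)
```

Same tag, different base, different result: one finds the image and the other asks for a file that was never there.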

So, I've split things up, and I no longer send identical HTML to the web pages and my Atom feed. atom.xml now gets fully-qualified "http://" URLs, and that should stop those "//" issues.

However, if I stopped there, then anyone who wants to pull the feed via https and has a browser which doesn't choke on "//" paths would now be getting mixed content. So, I've done a little trickery. I'm now also generating a second feed dump with https:// URLs, and I'm redirecting traffic to it internally. There's actually a second file, and hits to the https version of my site pull that one instead. It's magic!
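For the curious, the general shape of the trick looks something like this sketch. It is not my actual generator: the qualify and feed_file_for helpers and the "atom-https.xml" filename are all made up for illustration, and it assumes the post HTML uses double-quoted src and href attributes.

```python
import re

HOST = "rachelbythebay.com"

# Matches src="//host/ or href="//host/ (double-quoted attributes only).
SCHEME_RELATIVE_RE = re.compile(
    r'((?:src|href)=")//' + re.escape(HOST) + "/", re.IGNORECASE
)


def qualify(html: str, scheme: str) -> str:
    """Turn "//host/..." references into fully-qualified URLs for one scheme."""
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    return SCHEME_RELATIVE_RE.sub(
        lambda m: f"{m.group(1)}{scheme}://{HOST}/", html
    )


def feed_file_for(request_scheme: str) -> str:
    """Pick which pre-generated feed file to serve (hypothetical filenames)."""
    return "atom-https.xml" if request_scheme == "https" else "atom.xml"


post_html = '<img src="//rachelbythebay.com/w/2012/08/25/fish/base.jpg">'
http_feed_html = qualify(post_html, "http")
https_feed_html = qualify(post_html, "https")

print(http_feed_html)
print(https_feed_html)
print(feed_file_for("https"))
```

The point is that both feed files come from the same source, and the server quietly picks one based on how the request arrived, so the public URL stays /w/atom.xml either way.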

This means there are really three different versions of things now...

http://rachelbythebay.com/w/... - normal posts

http://rachelbythebay.com/w/atom.xml - feed with http URLs

https://rachelbythebay.com/w/atom.xml - feed with https URLs

... just to make a consistent experience for everyone.

It's a giant, smelly, steaming mess, but at least it's all on my end. Nobody should need to change their bookmarks, feed readers, or any other settings for that matter. In theory, everything should just get better.

Having said that, I imagine there are still more corner cases waiting. There is no shortage of them when it comes to this kind of stuff.

September 15, 2012: This post has an update.