Writing

Software, technology, sysadmin war stories, and more.

Tuesday, May 1, 2012

No shortage of date formats in RSS

RSS is a wasteland with many similar formats masquerading as one. Important data is encoded as human-readable strings, and even then, there's no guarantee as to how it will be formatted or whether it will even exist.

My biggest complaint is about dates. There are no fewer than seven formats that must be understood in order to have any hope of assigning a useful value to the age of a post. Look at this mess.

"Wed, 24 Aug 2011 07:43:19 +0000"

This one is not too bad, but it has a friend:

"Thu, 06 Oct 2011 18:13:05 GMT"

The only good news here is that strptime() will handle both with the same format string via %Z. Of course, that may just be a glibc-ism, which means no end to the fun when trying to use that same format string on another machine. In particular, the Mac OS version apparently only likes "time zone abbreviations of the local time zone, or the value 'GMT'". Lovely.
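For anyone working in Python rather than C, one way around the %Z portability question is to skip strptime() for these two strings entirely. The sketch below is only an illustration, not what this feed reader actually does. It leans on the standard library's email.utils.parsedate_to_datetime, which is meant for RFC 2822-style dates and accepts both a numeric offset and "GMT". The names samples and parse_rfc822 are made up for the example.

```python
from email.utils import parsedate_to_datetime

# The two strings quoted above, verbatim.
samples = [
    "Wed, 24 Aug 2011 07:43:19 +0000",
    "Thu, 06 Oct 2011 18:13:05 GMT",
]

def parse_rfc822(text):
    """Parse an RFC 2822-style date string into a timezone-aware datetime."""
    return parsedate_to_datetime(text)

parsed = [parse_rfc822(s) for s in samples]
for original, dt in zip(samples, parsed):
    print(repr(original), "->", dt.isoformat())
```

Both strings come back as aware datetimes with a zero UTC offset, so nothing depends on how the local C library feels about %Z.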

Next up is this one:

"15 Sep 2011 22:28:22 EST"

Here, we've dropped the day of week (which is superfluous anyway) and now rely on a completely ambiguous term for the time zone. Is that Eastern Standard Time (for the US/Canada), or as the Mac OS man page for strptime points out, Eastern Australia Summer Time? This ambiguity is why they punt on handling just any abbreviation. I can only wonder what glibc does with it.
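One way to live with that ambiguity is to stop asking the library to guess and write the guess down yourself. The following Python sketch assumes a hand-maintained table of abbreviations (ZONE_OFFSETS, a hypothetical name) and that "EST" means the North American zone. Both are policy choices on my part here, not facts about the feed.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy table: the one reading of each abbreviation we accept.
ZONE_OFFSETS = {
    "GMT": timedelta(0),
    "EST": timedelta(hours=-5),  # assumption: US/Canada Eastern, not Australia
    "PDT": timedelta(hours=-7),
}

def parse_with_zone_table(text, fmt="%d %b %Y %H:%M:%S"):
    """Split off the trailing abbreviation and look it up explicitly.

    Note: %b matches English month names only under a C/English locale.
    """
    stamp, _, abbrev = text.strip().rpartition(" ")
    if abbrev not in ZONE_OFFSETS:
        # Refuse to guess at abbreviations the table does not list.
        raise ValueError(f"unknown time zone abbreviation: {abbrev!r}")
    naive = datetime.strptime(stamp, fmt)
    return naive.replace(tzinfo=timezone(ZONE_OFFSETS[abbrev]))

est_example = parse_with_zone_table("15 Sep 2011 22:28:22 EST")
print(est_example.isoformat())
```

Anything not in the table raises instead of silently landing in the wrong hemisphere.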

Number three shows you why I have been wrapping these examples in quotes.

" Mon, 01 Nov 2011 10:54:00 -0700"

Why yes, that is an extra space in front. Thank you, zdnet!

Fourth, we see something close to ISO 8601 extended format slipping in, minus the "T" separator and any time zone designator.

"2011-11-07 00:00:00"

I sure hope that's UTC.
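"Hoping it's UTC" can at least be made explicit in code. This is a minimal Python sketch, assuming the zone-less timestamp really is UTC, which the feed never promises. The function name parse_naive_as_utc is invented for the example.

```python
from datetime import datetime, timezone

def parse_naive_as_utc(text):
    """Parse 'YYYY-MM-DD HH:MM:SS' and label it UTC (an assumption)."""
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    # The source string carries no zone at all; UTC is our guess.
    return naive.replace(tzinfo=timezone.utc)

iso_example = parse_naive_as_utc("2011-11-07 00:00:00")
print(iso_example.isoformat())
```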

More recently, I found this particular bit of human readable text:

"April 11, 2012 11:50:02 PDT"

I guess that's intended for those people who love to read raw RSS feeds. It also suffers from an ambiguous time zone name.

The next day, I found this:

"Apr/12/2012"

Maybe this programmer was a Guns N' Roses fan, since they sure do love those slashes. They obviously do not care one bit about time.

The latest one I encountered was this, in a bunch of old posts at the very end of a feed:

" > Mon, 01 Nov 2011 10:54:00 -0700"

Space, greater-than, space. My hate for that is > you know.

Oh, the things I do to try to return stupid text fields to reasonable figures for my feed reader.
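For the curious, here is roughly the shape such a cleanup can take, sketched in Python. It is not the code behind this feed reader. The helper names (clean, parse_feed_date, FALLBACK_FORMATS) are hypothetical, and treating zone-less dates as UTC is an assumption. The idea is to strip the junk prefix, try a tolerant RFC 2822 parser first, then fall back to a short list of known oddball formats.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Tried in order when the RFC 2822 parser gives up.
FALLBACK_FORMATS = [
    "%Y-%m-%d %H:%M:%S",  # "2011-11-07 00:00:00"
    "%b/%d/%Y",           # "Apr/12/2012" (English month names, C locale)
]

def clean(text):
    """Drop leading whitespace and stray '>' characters, plus trailing space."""
    return re.sub(r"^[\s>]+", "", text).rstrip()

def parse_feed_date(text):
    """Return an aware datetime, or None if nothing matches."""
    text = clean(text)
    try:
        dt = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError on unparseable input, newer ones ValueError.
        dt = None
    if dt is None:
        for fmt in FALLBACK_FORMATS:
            try:
                dt = datetime.strptime(text, fmt)
                break
            except ValueError:
                continue
    if dt is None:
        return None
    if dt.tzinfo is None:
        # Assumption: a date with no zone is treated as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt

for raw in (" > Mon, 01 Nov 2011 10:54:00 -0700", "Apr/12/2012", "not a date"):
    print(repr(raw), "->", parse_feed_date(raw))
```

Returning None for the hopeless cases leaves it to the caller to decide what date, if any, such a post deserves.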

Atom is so much better about this... when the publisher actually adds a date, that is. A surprising number of posts have none at all. Those special children get a little gift from me. I set their post dates to "2009-02-13 15:31:30 -0800".

There's a reason for that date and time. Can you guess it?