Software, technology, sysadmin war stories, and more.
Thursday, March 14, 2013

How to fetch and read a "protofeed" file

With all of these users chasing down replacement feed readers, I started pondering a question for training purposes. Just how hard is it for a random programmer to retrieve a feed and print usable data from it? Sorry, that's a trick question. It all depends on how easy it is to decode, and that can vary greatly.

When I tried doing this for myself back in 2011, the first thing I tried to handle was Atom feeds. I figured they were just XML, and while I'm no fan of that, I could just hand it off to libxml2 and figure out how to speak its language of "accessor" functions. There must be some way to get the feed-level info, then some kind of iterator into the individual posts, and so on.

The first thing I learned was that the so-called "null namespace" is not your friend. An Atom feed declares a default namespace on its root element, while an unprefixed name in an XPath expression only matches elements that have no namespace at all. Until you overcome that, libxml2 will act like there's nothing in your document. You can try and try to get "//feed/title", but nothing will come back. So, you manually register an "atom" prefix for that namespace, at which point all of your references start looking like "//atom:feed/atom:title". I wish I were joking, but I'm dead serious. It took me a while to figure that out.
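To make the namespace trap concrete, here is a minimal sketch. It uses Python's standard xml.etree.ElementTree rather than libxml2, so the calls differ from what I actually wrote, but the behavior is the same: the unprefixed lookup finds nothing, and the prefixed one works. The sample feed, the function names, and the "atom" prefix are all made up for illustration; only the namespace URI is the real Atom one.

```python
import xml.etree.ElementTree as ET

# The real Atom namespace URI. The "atom" prefix below is an arbitrary label.
ATOM_NS = "http://www.w3.org/2005/Atom"
NS = {"atom": ATOM_NS}

# A tiny made-up feed, just enough to show the problem.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>First post</title>
    <link href="http://example.com/1"/>
  </entry>
  <entry>
    <title>Second post</title>
    <link href="http://example.com/2"/>
  </entry>
</feed>
"""


def unprefixed_title(xml_text):
    """Look up the feed title without a namespace; this finds nothing."""
    root = ET.fromstring(xml_text)
    return root.find("title")


def feed_title(xml_text):
    """Look up the feed title through the registered 'atom' prefix."""
    root = ET.fromstring(xml_text)
    return root.find("atom:title", NS).text


def entries(xml_text):
    """Return (title, link) pairs for every entry in the feed."""
    root = ET.fromstring(xml_text)
    result = []
    for entry in root.findall("atom:entry", NS):
        title = entry.find("atom:title", NS).text
        link = entry.find("atom:link", NS).get("href")
        result.append((title, link))
    return result


if __name__ == "__main__":
    print(unprefixed_title(SAMPLE_FEED))
    print(feed_title(SAMPLE_FEED))
    for title, link in entries(SAMPLE_FEED):
        print(title, link)
```

Every single lookup needs the prefix, including the ones nested inside an entry. Forget it once and you get an empty result instead of an error.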

Anyway, once I got through all of this, I finally had something that could extract usable data from a feed and make it available to other things. There's a lot of stuff you wind up having to learn the hard way.

I guess all of this was on my mind a couple of weeks ago when I got really bored and came up with the completely nutty idea of protofeed, in which I serialize my posts as a binary protocol buffer just because I can, and it's far simpler than wrangling XML. Then I put that binary blob online in case anyone wanted to play around with it.

Then yesterday happened, and the true fate of Google Reader finally made it to the outside world. And so, I decided to do a series of recordings in which I demonstrate writing a really horrible little program which will fetch a 'protofeed' file and then parse it. It even pulls up one of my posts in a browser within the terminal just to show how it works.
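For a rough idea of what "fetch and then parse" involves, here is a sketch in Python. It is not the program from the recordings, and it does not use the protobuf library or the actual protofeed message definition, neither of which is shown in this post. Instead it walks the generic protocol buffer wire format and lists the raw top-level fields it finds. The function names are hypothetical, and the URL is left as a parameter since I'm not going to guess at it here.

```python
import urllib.request


def fetch_bytes(url):
    """Download a URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear means this was the last byte
            return result, pos
        shift += 7


def decode_fields(buf):
    """Return a list of (field_number, wire_type, value) for one message.

    Without the schema there is no way to know what a field means, so
    length-delimited values stay as bytes (they may be strings, raw bytes,
    or nested messages) and fixed-width values stay as raw bytes too.
    """
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        fields.append((number, wire_type, value))
    return fields


if __name__ == "__main__":
    import sys
    # Usage: pass the location of a protofeed file as the only argument.
    for number, wire_type, value in decode_fields(fetch_bytes(sys.argv[1])):
        print(number, wire_type, value)
```

With the real message definition and generated classes, all of this collapses into a single parse call and you get named fields back, which is the whole point of the exercise.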

If you've been following along with these introductory "lesson" sessions, then please join me for another. I'm also happy to announce the new index page for all of this, which should make finding new entries much easier for all interested parties.

Here it is: rachelbythebay/edu.

This latest batch of recordings also includes a short sequence on how to get protobuf linking working in 'bb' (via .build.conf), so if that's something you want to do with one of your own projects, you definitely should check it out.