Software, technology, sysadmin war stories, and more.
Saturday, June 1, 2013

An independent feed reader backend

On Thursday, I asked for some input from my readers regarding future projects. Then I rigged up a bunch of links for the various possibilities so I could gauge levels of interest. It's still early in the process, but already there are some results coming in.

It seems that a fair number of people are interested in having me release fred (my feed reader) as open source. I imagine most of the interest comes from the ability to break free from the rest of the world and do it yourself. If you have your own fred installation, you're not at the mercy of Google or any other provider. As long as your feeds continue to exist, you can keep fetching and reading them.

I also received some feedback regarding using it as a "universal backend" of sorts. It seems that Google Reader has an API that may not have ever been officially documented but is in use by a great many things. I guess everyone just reverse-engineered it and now they're all freaking out since they won't have a backend in about a month's time.

So, it occurs to me that perhaps some forward-thinking individuals would like to break free to run their own backends while still using their favorite frontends. That is, maybe they don't want to use the fred UI (the web site), but perhaps they would benefit from having the rest of it (the fetcher, parser, and so on).

It just so happens that fred has been constructed in such a way that using it as a generic backend shouldn't be too difficult. Here's how it works.

There are five endpoints -- different URLs. They are getpost, sawpost, userfeeds, addfeed, and dropfeed. There isn't too much to them, but together, they make the whole thing work. Everything happens via HTTP POST arguments on the input side, and JSON on the output side.

getpost returns an array of posts. A post is just the fred_id (a unique identifier within the entire backend), the title, url, mtime, id (unique for a post within a feed), and the actual content. This comes from the database, and it uses a cookie to tell users apart. It only returns unread posts.

A post might look like this:

{
  "fred_id": 117395,
  "title": "Stupid network tricks",
  "url": "http://rachelbythebay.com/w/2013/05/31/mux/",
  "mtime": "Friday, May 31, 2013 16:59:38 PDT",
  "id": "tag:rachelbythebay.com,2013-05-31:mux",
  "content": ( ... you get the idea ... )
}

I'll probably wind up changing mtime to be a number, but otherwise it should just work.
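To make that concrete, here's a minimal sketch of what a client might do with a getpost response. The field names come from the example above; the sample content body is my own shortened stand-in, and how you'd actually issue the request (URL, cookie) is up to your installation:

```python
import json

# Sample getpost response: an array of post objects, shaped like the
# example above (the content field is shortened here).
sample_response = '''
[
  {
    "fred_id": 117395,
    "title": "Stupid network tricks",
    "url": "http://rachelbythebay.com/w/2013/05/31/mux/",
    "mtime": "Friday, May 31, 2013 16:59:38 PDT",
    "id": "tag:rachelbythebay.com,2013-05-31:mux",
    "content": "<p>...</p>"
  }
]
'''

def unread_posts(raw):
    """Decode a getpost response into (fred_id, title) pairs."""
    posts = json.loads(raw)
    # fred_id is unique backend-wide; the "id" field is only unique
    # within its feed.
    return [(p["fred_id"], p["title"]) for p in posts]
```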

sawpost is quite simple. Pass it a 'fred_id' and it'll use that and your cookie to mark the post as 'seen' by you. It won't show up when you call 'getpost' after that.
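Since everything goes over HTTP POST, marking a post seen is just one form-encoded argument plus your cookie. A sketch, where the base URL and cookie value are placeholders for whatever your own installation uses:

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "https://example.com/fred"   # placeholder: wherever fred lives

def sawpost_request(fred_id, cookie):
    """Build the POST that marks one post as seen for this user."""
    data = urlencode({"fred_id": fred_id}).encode()
    # The presence of a data body makes urllib send this as a POST.
    return Request(BASE + "/sawpost", data=data,
                   headers={"Cookie": cookie})
```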

userfeeds uses your cookie to figure out who you are and then returns an array of feeds. A feed is just the feed_id and the feed_url. It's just enough to make a list.

addfeed takes a feed_url and potentially creates a new feed. Then it adds an entry in the database which says your account (identified by your cookie) is subscribed to this feed.

dropfeed takes a feed_id and removes your account's association with it.
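The three feed-management endpoints all follow the same pattern: POST a small set of arguments with your cookie attached. A hedged sketch of a helper that builds those requests (again, the base URL and cookie are placeholders, and the argument names are the ones described above):

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "https://example.com/fred"   # placeholder: your own installation

def feed_request(endpoint, cookie, **args):
    """Build a POST to userfeeds, addfeed, or dropfeed."""
    data = urlencode(args).encode()
    return Request(f"{BASE}/{endpoint}", data=data,
                   headers={"Cookie": cookie})

# userfeeds needs no arguments beyond the cookie;
# addfeed takes a feed_url, and dropfeed takes a feed_id.
list_req = feed_request("userfeeds", "fred=abc123")
sub_req = feed_request("addfeed", "fred=abc123",
                       feed_url="http://rachelbythebay.com/w/atom.xml")
```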

That's the whole web interface. The actual fred web site is just a big bunch of JavaScript which calls those different endpoints. There's nothing which says you have to use it, and indeed, anything which spoke the right 'language' to those five URLs would work just as well.

So, about the backend. There's a program called fetcher which should run via cron once in a while. It keeps track of poll attempts, so it won't try to hit any URL more than once every couple of hours. If it runs into trouble, it'll say something on stdout so it'll come back via the cron e-mail interface. Feeds which disappear or otherwise break will show up as 404s, or 500s, or similar.
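The "no more than once every couple of hours" throttle boils down to a timestamp comparison against the last poll attempt. A sketch of that check; the exact interval is my assumption, not something fred promises:

```python
from datetime import datetime, timedelta

MIN_INTERVAL = timedelta(hours=2)   # assumption: actual interval may differ

def should_poll(last_attempt, now):
    """True if this feed hasn't been hit within the minimum interval."""
    if last_attempt is None:          # never polled before
        return True
    return now - last_attempt >= MIN_INTERVAL
```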

fetcher itself has fairly simple logic. First, it builds a list of feeds which haven't been polled lately. Then it just flips through them, using libcurl to grab the URL. If it hasn't changed since the last poll, it moves on. Otherwise, it runs it through libxml to parse it as either Atom or RSS, and gets a sensible list of posts.
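One way to implement the "hasn't changed since the last poll" check is to hash the fetched body and compare it against the hash stored from the previous poll. That's an assumption about the mechanism on my part (conditional GET with If-Modified-Since headers would be another route), but it sketches the idea:

```python
import hashlib

def unchanged(body, last_hash):
    """Compare fetched feed bytes against the hash from the last poll.

    body is the raw bytes libcurl (or anything else) handed back;
    last_hash is the hex digest recorded when the feed last changed.
    """
    return hashlib.sha256(body).hexdigest() == last_hash
```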

Each post is then checked to see if it's new, old, or just an update for an existing post in that feed. If it's old, it's ignored. Otherwise, it gets added to the database. At this point, the post becomes available to clients via 'getpost'.
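The new/old/update decision can be sketched by matching on the feed-scoped id and then comparing timestamps. Using mtime as the "has this changed?" signal is my assumption; fred may well compare content instead:

```python
def classify(incoming, existing_by_id):
    """Decide whether a parsed post is new, an update, or old.

    incoming is one parsed post (dict with "id" and "mtime");
    existing_by_id maps feed-scoped post ids to what's already stored.
    """
    stored = existing_by_id.get(incoming["id"])
    if stored is None:
        return "new"                      # never seen this id: insert it
    if stored["mtime"] != incoming["mtime"]:
        return "update"                   # same id, changed: refresh it
    return "old"                          # already have it: ignore
```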

Finally, it writes a log entry regarding this feed's poll and goes on to the next one. These log entries are how it knows not to hit any one feed too often, and they're also quite handy for troubleshooting flaky feed behavior later on.

That's it. It basically needs a Unixy environment, a web server, and a database, plus connectivity to the outside world to fetch the feeds. I run it on my Linux box but it should run in most places where the necessary libraries (libcurl, libxml2 and jansson) exist.

Okay world, that's fred. The question is: do you want it?

Talk to me!