Software, technology, sysadmin war stories, and more.
Wednesday, August 29, 2012

Future identities and repeating the DNS serial number mistake

After writing about the proliferation of venues for "status" postings last week, I started thinking about the problem some more. My thoughts brought me back to one of my older ideas which involved disconnecting identity from URLs. The idea back then was that in a world with SOPA, it could be hard to follow people if their domain names can be yanked by evil outside forces. To get around that, there would need to be a way to further indirect things so that an author who gets a new URL can provably say "hey, this is me again".

What I've realized is that you don't need a SOPA domain-yanking type event for this sort of thing to be useful. In a world where services come and go as fickle corporations kill them or render them useless or joyless, people are going to move around. There are also the matters of personal style and taste. People will move around for their own reasons.

Let's just make up a scenario. I have lunch with some Valley bigwig tomorrow, and it turns out they want to buy my "operation" for $bignum. If that happened, rachelbythebay.com would no longer be mine. I'd still want to write somewhere, and people would still want to read my latest posts. How can everyone make the connection? Right now, it's a manual process.

This got me thinking about things which might be done to "solve" this. One possibility is that people will start making posts which say "hey, I'm over here now", and that will be one of the first things they add on a new service. That might be a good start, since it might be indexed eventually, and then a web search would turn it up.

Then I decided to throw a monkey wrench in the works and push this idea out along the time axis to see what happened. Now imagine it's 15 years later. Many sites have come and gone in terms of utility. Some of them are long past their prime but are still serving up ancient, abandoned content somehow. Others have been turned down, but their content still lives on because someone else mirrored it (see also: GeoCities).

At any rate, you can now find a whole bunch of these "identity" documents from anyone who's been around for a while. All of them say "hey, this is me now", so which one is right?

For this kind of problem, a date-based serial number sounds promising, doesn't it? You could just make it equal to "2012082900" and you'd have plenty of room to increment things as they changed. It's the same trick we've been doing with DNS zone serial numbers for years.
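To make that concrete, here's a minimal sketch of the YYYYMMDDnn scheme. The names `date_serial` and `newest` are made up for illustration, and the two-digit revision suffix is just the usual DNS convention carried over.

```python
from datetime import date

def date_serial(day: date, revision: int = 0) -> int:
    """Build a YYYYMMDDnn serial: the date followed by a two-digit revision."""
    if not 0 <= revision <= 99:
        raise ValueError("revision must fit in two digits")
    return int(day.strftime("%Y%m%d")) * 100 + revision

def newest(announcements: dict) -> str:
    """Given {serial: announcement}, trust whichever has the highest serial."""
    return announcements[max(announcements)]

first = date_serial(date(2012, 8, 29))       # 2012082900
second = date_serial(date(2012, 8, 29), 1)   # 2012082901
```

Note that `newest` believes whatever number it is given, which is exactly where the trouble starts.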

This is when an old problem will come back and start biting people. Someone will eventually screw up their identity announcement's serial number. They will mess up the date somehow, and will publish one that claims to be from the year 2015. Even if they realize their mistake relatively quickly and set it straight, it will always be out there subtly screwing things up for years to come.

Let's say you set it to 2015110500. You now have to inch along in that numeric space and just use 2015110501, 02, 03, and so on, until the real November 2015 rolls around! Only then can you go back to using the number as a pseudo date as it was intended.
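Here's a rough sketch of what "inching along" looks like. The function `next_serial` is hypothetical; it assumes the only rule is that each new serial must be higher than the last one published.

```python
from datetime import date

def next_serial(current: int, today: date) -> int:
    """Return the next usable serial: today's date if possible, else current + 1."""
    todays_base = int(today.strftime("%Y%m%d")) * 100
    # A serial can never go backwards, so a bogus future value pins us
    # to "current + 1" until the calendar catches up with it.
    return max(todays_base, current + 1)

stuck = next_serial(2015110500, date(2012, 8, 30))      # 2015110501
recovered = next_serial(2015110501, date(2015, 11, 6))  # 2015110600
```

One typo in 2012 means the "date" in the serial is meaningless for more than three years.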

This is about the point where someone says "just use plain old numbers and always increment them", and I thank them for their idea but point out a flaw: how do you know what your highest number is? In particular, let's say you're the next hot web site and want to set up one of these identity pages for your users when they sign up. Asking them for a number is going to be error-prone. Trying to search the web for it right then and there is also not a good idea.

That's why in such a world, people would undoubtedly initialize it using the date, since "it'll never go backwards that way". That's true of the date, but as detailed above, it only takes one error to make life miserable for users, and that misery can last for years.

So if you can't use date strings safely, then what can you do? I think I have the answer. The solution is to have bits of well-known data which did not exist before a certain date. If that knowledge is present in an identity posting, then you know it can't be any older than that. It's the same idea as putting a recent newspaper in a photograph to "prove" that it's current.

For this to work, there would need to be a source of fresh content which never repeated and which was available for public reference effectively forever. This could be just about anything, as long as it can be verified without too much trouble when necessary.

Personally, I'd love to see this work as a web site which publishes a short humorous phrase every couple of days. This list could be mirrored as necessary to ensure continuity. Once this existed, it would be possible to map a key phrase used in an identity announcement back to a date. By doing it this way, you eliminate the entire class of errors and resulting sorrow which would come from using a bare date field.
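As a sketch of how the lookup might work, assume the phrase list is available as a simple mapping from phrase to publication date. Everything here is invented for illustration: the phrases, the `PHRASE_LEDGER` name, and the `not_older_than` helper.

```python
from datetime import date
from typing import Optional

# Hypothetical mirror of the public list: phrase -> day it first appeared.
PHRASE_LEDGER = {
    "llama wins regional chess title": date(2012, 8, 27),
    "mayor declares tuesday optional": date(2012, 8, 29),
    "local toaster achieves sentience": date(2015, 11, 5),
}

def not_older_than(announcement: str, ledger: dict) -> Optional[date]:
    """Return the latest publication date among phrases quoted in the text."""
    text = announcement.lower()
    dates = [published for phrase, published in ledger.items() if phrase in text]
    # The newest phrase present is a lower bound on when the post was written.
    return max(dates) if dates else None

post = "Hey, I'm over at example.net now. Mayor declares Tuesday optional."
bound = not_older_than(post, PHRASE_LEDGER)
```

The nice property is that nobody can fat-finger their way into 2015 from 2012, because the 2015 phrase doesn't exist yet. The phrase only proves "not older than", though, so a stale phrase makes an announcement look old rather than new, which is the harmless direction to fail in.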

If this "quirky headline as date prover" thing sounds familiar, it's because it also shows up in my video bidding idea from a couple of months ago.

Of course, having written this, someone is going to make something else which relies on a serial number which can never go backwards, and will subsequently re-create the DNS problem. Perhaps they will also get to discover the fun of purposely pushing the number up to the point where it wraps around (MAXINT or similar, and good luck if it's 64 bits), at which point they can come back up from 0 and put things straight.

Yes, this seriously happens with DNS. It's even documented in RFC 2182. Check it out and then just imagine having to explain that to ordinary users.
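For the curious, here's a sketch of why the wraparound trick works, using the 32-bit serial comparison rules from RFC 1982 as I understand them. The function names are mine, and the two-step recovery at the bottom is my reading of the procedure rather than a quote from either RFC.

```python
MOD = 1 << 32    # serials are 32-bit unsigned values
HALF = 1 << 31   # comparisons only make sense within half the number space

def serial_add(s: int, n: int) -> int:
    """Add n to a serial, wrapping at 2**32. n may be at most 2**31 - 1."""
    if not 0 <= n <= HALF - 1:
        raise ValueError("increment out of range")
    return (s + n) % MOD

def serial_lt(a: int, b: int) -> bool:
    """True if serial a counts as older than serial b."""
    return (a < b and b - a < HALF) or (a > b and a - b > HALF)

bad = 2015110500      # the fat-fingered serial already published
wanted = 2012083000   # what the serial should have been

# Step 1: jump as far forward as allowed, and wait for everyone to pick it up.
halfway = serial_add(bad, HALF - 1)

# Step 2: from there, the number we wanted counts as "newer" again.
fixed = serial_lt(bad, halfway) and serial_lt(halfway, wanted)
```

It works, but "add 2147483647 and wait" is not something you want to walk an ordinary user through.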


Right, so, notice I haven't invoked the PGP monster in this post yet. In a perfect world, everyone would have key pairs, everything would be signed, and it would be a trivial matter to just say "oh, these posts at this new site are signed by the same person as before from that old site" and have things Just Work.

Unfortunately, I have seen enough of the world to know that could never happen aside from a small cluster of hardcore crypto nerds. PGP for all? Signing of everything? Web browsers which know about it and use it to make intelligent decisions about locating content? It's a pipe dream.

I think it's more likely that we will first get language analysis programs which yield "fingerprints" from writing samples. For one thing, such a scheme doesn't need cooperation from anyone else. Someone who wants to analyze text can do it entirely on their own. Then, if they turn up someone who claims to be the same author, they can point their tool at the new content, see how well it matches, and arrive at a confidence figure from that.
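Just to show the shape of the idea, here's a toy version: count character trigrams in each sample and compare the counts with cosine similarity. This is a stand-in I picked because it fits in a few lines, not a claim about how a real authorship tool would work, and the names `fingerprint` and `similarity` are made up.

```python
import math
from collections import Counter

def fingerprint(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after normalizing case and spacing."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))

def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two fingerprints, from 0.0 to 1.0."""
    dot = sum(a[gram] * b[gram] for gram in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

old_posts = fingerprint("Text known to be from the original author goes here.")
new_posts = fingerprint("Text from the site claiming to be the same person.")
confidence = similarity(old_posts, new_posts)
```

A real tool would need far more text and much better features than this, but the workflow is the same: build a profile from known writing, score the claimant, and decide how much you believe the number.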

Of course, even that will only be a transient win. Eventually, there will be "bad guys" who find a way to profit from gaming this system. They will get their hands on this tech and will use it to fine-tune their fake posts. It's easy enough, since all they have to do is try their best to clone the original author's style. Then they feed their fake post to the analysis program to see how closely it matches. If the score isn't good enough, they can twiddle things until it's high enough to convince most people and call it done.

Just think of it. We could have a "linguistic similarity analysis arms race" at some point. How futuristic will that be?