Software, technology, sysadmin war stories, and more.
Friday, April 27, 2018

User IDs probably shouldn't be passed around as ints

Let's say you have a web site which has logins of some sort. That's not too much of a stretch, right? I imagine many of the people reading this have crossed paths with such a thing in their careers. Have you ever really thought about how those logins are stored internally in terms of data structures? I'm not talking about sites you merely use here. This is all about the stuff you may have built or may be maintaining.

What is a user account? I'm guessing it's probably just a number for most people, and that number matches some kind of column in a database, in which it is the primary key. That number probably gets handed around from function to function and system to system to describe who's doing what. 1234 logged in. 5678 logged out. 31415 sent a message to 14142.

What creates those numbers? If you're doing the "primary key" thing, I'm guessing it's just an auto-incrementing value in some table in your database. Okay, so they're integers? Where did they start? Zero? So they're not just (positive) integers, but they're contiguous, and biased heavily towards the smaller numbers?

Interesting, interesting. What else looks like that?

How about indexes for arrays, or counters, like "times through a loop"?

Look at it this way. So you're a new web company, and you start allocating accounts. Who's going to get the first 1000 or so ids? It'll probably be all of your early founders, employees, and first customers. Let's say they are 1, 2, 3, ... on up to 1000.

What, then, do you suppose happens when someone writes a loop like this?

ban_senders_of_messages(messages) {
  for (i = 0; i < messages.size(); ++i)
    ban_account(i);  // bug: bans whoever has uid == i, not the sender
}

How big is messages? If it's 100, you just locked out your first 100 users. This probably includes the top dog at the company, assuming they're still around. I have to further assume they will notice this. Hope you keep your job.

The code probably was supposed to look like this, instead:

ban_senders_of_messages(messages) {
  for (i = 0; i < messages.size(); ++i)
    ban_account(messages[i].sender);  // the sender's user id, as intended
}

Do you see what happened? messages[i].sender is an int which happens to be a user id in this system. ban_account takes ints which happen to be user ids.

The for loop... also has an int called "i" as the loop control variable. Since they're all ints, you can happily cram that index into ban_account or probably any other function in your code base and it'll do something. It probably won't do anything good, but it will certainly make things interesting!

What can you do about this? Well, you can try making user accounts some other type that won't just accept an int like it's nothing. Of course, if you're running in a language which doesn't care too much about types, this might get interesting. You might need something like an entire class just so you can pass the object around and hopefully catch human errors when certain methods aren't found. Ideally, this would fail at compile-time, but the popular languages bouncing around the web today don't all seem to "get" that ideal.
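Here's a rough sketch of the "some other type" idea in C++, where the mistake does fail at compile time. UserId, Message, and banned_accounts are names made up for this illustration, and ban_account is only a stand-in that records who got banned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical wrapper: a user id that can't be made from a bare int by accident.
class UserId {
 public:
  explicit UserId(std::int64_t value) : value_(value) {}
  std::int64_t value() const { return value_; }
  bool operator==(const UserId& other) const { return value_ == other.value_; }

 private:
  std::int64_t value_;
};

struct Message {
  UserId sender;
};

// Stand-in for the real ban logic: it just remembers who was banned.
inline std::vector<UserId>& banned_accounts() {
  static std::vector<UserId> banned;
  return banned;
}

inline void ban_account(UserId id) { banned_accounts().push_back(id); }

inline void ban_senders_of_messages(const std::vector<Message>& messages) {
  for (std::size_t i = 0; i < messages.size(); ++i) {
    // ban_account(i);  // would not compile: an index is not a UserId
    ban_account(messages[i].sender);
  }
}

// The constructor is explicit, so plain integers never convert on their own.
static_assert(!std::is_convertible<std::size_t, UserId>::value,
              "a loop index must not silently become a UserId");
static_assert(!std::is_convertible<int, UserId>::value,
              "an int must not silently become a UserId");
```

Somebody can still write UserId(i) on purpose, but at that point the mistake is at least visible in code review instead of hiding in a function call.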

Another thing to consider is to look for attempts to access low-numbered user ids. Obviously, this only works if no actual humans are "living" there. But hey, if the first 20 uids are open for whatever reason, you could detect mutation attempts to them and kill the process before it gets a chance to commit anything. Then you have to work backwards through the stack trace to find out where the wayward numbers are coming from. It's messy but at least it has a chance of stopping a disaster.
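A minimal sketch of that kind of tripwire might look like the following. It assumes uids 0 through 19 were never handed out, and it throws instead of killing the process so the caller decides what "stop everything" means. The names kHighestCanaryUid, is_canary_uid, and refuse_canary_mutation are invented for the example, and ban_account is a placeholder for any mutating call.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Assumption: uids 0 through 19 were never given to real people.
constexpr std::int64_t kHighestCanaryUid = 19;

inline bool is_canary_uid(std::int64_t uid) {
  return uid >= 0 && uid <= kHighestCanaryUid;
}

// Call this at the top of anything that mutates an account.
// It throws before any write happens; a real service might dump a
// stack trace and abort here instead.
inline void refuse_canary_mutation(std::int64_t uid, const std::string& operation) {
  if (is_canary_uid(uid)) {
    throw std::logic_error("refusing to " + operation + " canary uid " +
                           std::to_string(uid));
  }
}

// Hypothetical mutation, guarded by the check above.
inline bool ban_account(std::int64_t uid) {
  refuse_canary_mutation(uid, "ban");
  // ... the real ban would be committed here ...
  return true;
}
```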

Yet another idea would be to deliberately skip over a decent-sized chunk of the id space when starting out. Maybe start at 10000 instead of 0. Then you automatically have a nice big block of "canary uids" to use in the alerting stuff mentioned in the last paragraph. Unfortunately, you have to think about this one well in advance before the system goes online.
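As a sketch, the skipped block could look like this. In a real system the database's own sequence or auto-increment setting would probably own the starting value; UidAllocator, kFirstRealUid, and is_canary_uid are just illustrative names for an in-memory version of the same idea.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: real accounts start here, so everything below is a canary.
constexpr std::int64_t kFirstRealUid = 10000;

// Toy in-memory allocator; it hands out 10000, 10001, 10002, ...
class UidAllocator {
 public:
  std::int64_t allocate() { return next_++; }

 private:
  std::int64_t next_ = kFirstRealUid;
};

// Anything in [0, kFirstRealUid) can never belong to a real user.
inline bool is_canary_uid(std::int64_t uid) {
  return uid >= 0 && uid < kFirstRealUid;
}
```

A loop index would now have to run past 10000 iterations before it could land on a real account, and everything below that can trip the alarm.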

Still another possibility is to use a large key space, like the 64-bit integers (or perhaps the subset which can be represented exactly by JavaScript, meaning nothing past 2^53 - 1, sigh). Then you can do sparse assignment within that space. Besides the fact this will make it really hard for anything to accidentally hit a real user, it will also hide just how big (or small) your service may be. Knowing how many humans are on a service is signal for potential competitors, after all...
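One way sparse assignment could work, sketched under a few assumptions: ids are picked at random, capped at 2^53 - 1 so JavaScript can hold them exactly, and kept above an arbitrary floor (kMinUid) so small numbers never match anyone. SparseUidAllocator is a made-up name, std::mt19937_64 is not a cryptographic generator, and a real service would lean on a unique key in the database rather than an in-memory set.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <unordered_set>

// Largest integer a JavaScript Number can represent exactly: 2^53 - 1.
constexpr std::int64_t kMaxSafeInteger = (std::int64_t{1} << 53) - 1;
// Assumption: leave the bottom of the range empty so small ints hit nobody.
constexpr std::int64_t kMinUid = std::int64_t{1} << 32;

class SparseUidAllocator {
 public:
  explicit SparseUidAllocator(std::uint64_t seed) : rng_(seed) {}

  std::int64_t allocate() {
    std::uniform_int_distribution<std::int64_t> dist(kMinUid, kMaxSafeInteger);
    for (;;) {
      const std::int64_t candidate = dist(rng_);
      // insert() reports whether the id was new; retry on the rare collision.
      if (used_.insert(candidate).second) {
        return candidate;
      }
    }
  }

 private:
  std::mt19937_64 rng_;
  std::unordered_set<std::int64_t> used_;
};
```

Since the ids carry no ordering, nobody can read the size of the service off the newest account number either.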

If you're an early user of a service and occasionally see really wacky things that other early users report but nobody else does, this might just explain it.