Writing

Software, technology, sysadmin war stories, and more.
Sunday, March 4, 2012

Overly dynamic systems scare me yet again

It's been a number of years since I ran git totally for myself. Back in those days, I used to run a script which would rsync the entire thing to a couple of other hosts. This didn't quite seem consistent with the intended design, but it got the job done. I just needed something which would both track my changes and keep a copy off my single-point-of-failure workstation box.

Later, I moved on to a job which ran its own source control system and thus didn't need to mess with git. The few times I did use it, I was just pulling copies of things from places which didn't feel like providing regular tarball snapshots. Otherwise, it stayed off my radar.

Time passed, the world changed, and a few companies popped up to provide hosted git service. It just happened to be right around the time when I was working on a school project and needed to share code with other people. I wound up doing all of the coding work and they just looked on, but it was nice in theory. I paid for an account with private hosting at GitHub so they could just use free accounts to collaborate.

I don't know if my classmates ever looked at the code, but having that account turned out to be useful for me. It gave me an off-site copy, and it wasn't that much money. Plus, it seemed like a solid service.

When I went out into the world this past spring to start my own little business, I again needed a place to host things. I still had my GitHub account, so I just added more repositories. It provided a great way to sync up the stuff I was working on across my dev and prod boxes.

That was the status quo for a while, and things were pretty good.

Then this morning happened. Apparently, GitHub has been running its service in such a way that incoming user requests could set arbitrary variables to arbitrary values. There's this big stink going on at Hacker News about whether it's the fault of Rails, or the fault of GitHub, or both, or neither, but I don't care. To me, it just felt all wrong for something which is supposed to be a solid infrastructure/backend service.

In my opinion, something that "dynamic" doesn't belong in the critical path for a service which I am going to depend on. I had no idea that things had been running in such a way that you could just cram arbitrary data in there unless everyone had been very careful about how things worked.

Anyone who had to work in web hosting support with a bunch of people who were fiddling with PHP remembers the horror that was register_globals. Allowing user-supplied data to casually mix with the rest of the system scares the hell out of me. Talking about whitelisting or blacklisting things scares me even more.

My own approach is to have a static SQL query in my code which looks like this:

static const char* kQuery = "SELECT x, y FROM z WHERE a = ?";

Later on, I pass in a string which ultimately gets plugged in there via mysql_stmt_bind_param. If that string happens to come from a GET or a POST, it's fetched from a separate container which holds all of that stuff. There is no way for that data to just appear as a top-level "real variable" unless I do it that way on purpose.
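
A rough sketch of that arrangement might look like the following. The names RequestParams, BoundQuery, and BuildBoundQuery are made up for illustration and are not from any real code base, and the actual call to mysql_stmt_bind_param is left out so the example stands alone. The point it tries to show is that request data lives in its own container and only reaches the query when the code asks for one specific key.

```cpp
#include <map>
#include <string>

// Hypothetical container for everything that arrived via GET or POST.
// Nothing in here becomes a variable elsewhere in the program by itself.
class RequestParams {
 public:
  // Stores one decoded key/value pair from the request.
  void Set(const std::string& key, const std::string& value) {
    values_[key] = value;
  }

  // Copies the value for `key` into `*out` and returns true, or returns
  // false (leaving `*out` untouched) if the client did not send that key.
  bool Get(const std::string& key, std::string* out) const {
    std::map<std::string, std::string>::const_iterator it = values_.find(key);
    if (it == values_.end()) return false;
    *out = it->second;
    return true;
  }

 private:
  std::map<std::string, std::string> values_;
};

// The query text is fixed at compile time; only the bound value varies.
static const char* kQuery = "SELECT x, y FROM z WHERE a = ?";

// Hypothetical pairing of the static query with the one value that would
// later be handed to the database library's bind call.
struct BoundQuery {
  const char* sql;
  std::string value;
};

// Pulls exactly one named parameter out of the container. Any other keys
// the client sent stay in the container and are never looked at.
bool BuildBoundQuery(const RequestParams& params, BoundQuery* out) {
  std::string value;
  if (!params.Get("a", &value)) return false;
  out->sql = kQuery;
  out->value = value;
  return true;
}
```

With this shape, a request carrying extra keys (say, a stray "is_admin") changes nothing, because no code ever asks the container for them, and a hostile value for "a" stays a plain string that never gets spliced into the SQL text.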

Is it perfect? Nope. But it's an error I get to make myself instead of having someone else make it for me.

Ultimately, this means switching back to hosting my own data. This event forced me to take a good look at what I was doing, and the outcome was simple enough: parking things out there no longer solves any problems for me, and it apparently introduces scary new ones. That's enough to get me to blow an evening swiveling things around.