Writing

Software, technology, sysadmin war stories, and more.

Wednesday, May 15, 2013

Bolt on a feature and it will be apparent

I want to use an example to show the differences which become evident when a feature is bolted on to a product, as opposed to a product designed from the ground up around that feature. This is a bit of a thought experiment, but I'm sure experienced readers will think of a few cases where they have encountered this same kind of problem in tech.

Imagine a laptop computer which has a touchpad for mouse-type input. This is probably any laptop you can buy these days. You use it instead of having to cart a separate mouse or trackpad around. That's not too special. What's special about this laptop is what else the touchpad can do.

I'm not talking about multi-finger gestures. Assume it does any kind of gesture your heart desires. This is about user recognition. You see, this touchpad can somehow figure out who's touching it at any given moment and will switch to their workspace to let them do things.

If I'm using the machine and then pass it to my friend who needs to look something up, she can just tap the touchpad and it will immediately (this is important!) jump into her session. Her web browser and other stuff will be there just like it was the last time she used it, and it will respond as if she had been using it all this time.

If she hands it back to me and I tap the touchpad, it will immediately jump back to me. This can go back and forth indefinitely.

Now, given this constraint, how would you design this? Would you just run my tasks and her tasks in parallel in a Unixy sort of environment and just switch them in and out as the user presence data changes?

Just how quickly could you do this on (say) OS X? How about Linux? How about Windows? I'm guessing it would take tens of seconds to finish switching to another user. It would have to page in a bunch of read-only binary data which had been evicted from memory. It might also have to bring back user data which had been pushed out to swap at some point.

This is a slow process. It would be evident that you did not really have two smooth and responsive environments at your disposal. You would really just have one environment, and with a big lag, you might be able to bring a second one in to replace it. Then you would have to expect that same amount of lag when switching back.

The system would have to be explicitly designed around "fast flips" for it to be instantly responsive. You'd have to go to lengths to make sure that anything which could be a user-interactive process remains available at all times. This might mean locking pages into memory or even coming up with some completely new marking, allocation, and eviction plan to handle what can and what can't be swapped or paged out.
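To make the "locking pages into memory" idea a bit more concrete, here is a minimal sketch of what it looks like from inside a single process on a POSIX system, calling the standard mlock() function through Python's ctypes. The helper name lock_buffer and the 4096-byte size are made up for illustration. This only pins one small buffer; it is not a design for the hypothetical fast-flip system, which would need cooperation from the kernel's whole paging policy.

```python
import ctypes
import ctypes.util
import os


def lock_buffer(size):
    """Allocate `size` bytes and ask the kernel to keep them in RAM.

    Returns (buffer, locked). `locked` is False when mlock() is refused,
    which commonly happens when the request exceeds RLIMIT_MEMLOCK.
    lock_buffer is a hypothetical helper name, not part of any library.
    """
    # Assumption: a POSIX system. find_library may return None there, in
    # which case CDLL(None) uses the symbols already loaded into this
    # process, and those include libc.
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.mlock.restype = ctypes.c_int

    buf = ctypes.create_string_buffer(size)
    if libc.mlock(buf, size) != 0:
        # The kernel said no; report why and carry on unlocked.
        print("mlock failed:", os.strerror(ctypes.get_errno()))
        return buf, False
    return buf, True


if __name__ == "__main__":
    buf, locked = lock_buffer(4096)
    print("pinned in memory:", locked)
```

Note that the kernel caps how much an ordinary process may lock, which hints at the real problem: pinning everything a user might touch is exactly the "colossal quantities of resources" approach unless the system was built to decide what deserves it.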

If you tried to do this with existing systems, you would experience only pain. It would be obvious that it was something layered on top and it would only work if you threw colossal quantities of resources at it. You'd need the fastest CPUs, tons of memory, and the absolute fastest I/O devices you could get. It would also be stupidly expensive as a result.

And still, since it hadn't been designed to be responsive, it would eventually "rust" and would reach a point where it wasn't keeping up any more. Your users would get annoyed by this. Some might "level up" and buy the latest new ShinyThing with even faster hardware. Others might just give up and write it off as impossible.

So now that I've established how "bolted on" technology can be apparent, where do you see it in everyday life? I see it on my Mac any time it decides to run a backup. This makes it evict all kinds of stuff from memory. I assume there's some horrible VM (virtual memory) implementation involved which makes it punt all kinds of stuff out of physical memory just because something is reading all over the disk.

I have had multiple occasions when all I was doing was running iTunes to play music and Firefox to play my scanner audio, with it being mixed and pushed to my headphones by AirPlay. The machine was otherwise idle, and indeed, it was sitting on my desk not doing anything else.

Time Machine fired up and started doing something. How did I know this? Easy. iTunes started skipping. Then, when I went to poke at the machine, it wouldn't respond for easily 15-20 seconds at a time. This is where you poke the pause button and it does nothing, so you wonder if it didn't "make the connection", so you push it again. Then when it finally wakes up, it pauses and immediately unpauses since it handles both of the button presses at nearly the same time.

The same applies to volume changes and everything else of the sort. When this happens on the system, forget about getting anything done.

The system log also gets polluted with notices about how it's killing this process or that due to memory overhead. This is a box where I had to double the memory when I moved to Lion because it had become so stupidly slow doing the same tasks as before. Adding memory didn't fix anything - it just made it suck less.

It reminds me of the days when the Linux VM implementation wasn't quite there yet. They hadn't really figured out a good balance for things like "swappiness" and OOM killing. Fortunately, that was a long time ago, and my Linux boxes haven't done that in probably over a decade.
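For anyone who hasn't met those knobs: on current Linux kernels, swappiness is exposed at /proc/sys/vm/swappiness, and each process carries an OOM-killer bias in /proc/<pid>/oom_score_adj. Here is a small sketch which just reads both. The names read_int and vm_tunables are made up for this example, and the proc_root parameter exists only so it can be pointed somewhere else. Where the files don't exist (as on a Mac), it returns None instead of failing.

```python
from pathlib import Path


def read_int(path):
    """Return the integer stored in `path`, or None if it can't be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def vm_tunables(proc_root="/proc"):
    """Read swappiness and this process's OOM score adjustment.

    Assumption: a Linux-style procfs mounted at `proc_root`.
    """
    root = Path(proc_root)
    return {
        # How eagerly the kernel swaps out anonymous memory.
        "swappiness": read_int(root / "sys/vm/swappiness"),
        # Bias applied to this process when the OOM killer picks a victim.
        "oom_score_adj": read_int(root / "self/oom_score_adj"),
    }


if __name__ == "__main__":
    print(vm_tunables())
```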

My Mac, however, likes to party like it's 1999.