Software, technology, sysadmin war stories, and more.
Tuesday, September 6, 2011

Sufficiently advanced technology

There are a few attitudes in software which bug me. One of them is "it does that". Another is "it went away". Both signal a problem which isn't going to be analyzed deeply enough to find out what's really going on.

Here's a recent example. There are posts on the discuss-gnuradio mailing list from 2010 about seemingly random deadlocks when calling lock() and unlock(). For context, gnuradio is a suite which lets you virtually "connect" different components to each other, creating a flow graph through the system. You have to lock the flow graph before you can change it, and unlock it when you're done.

Someone reported that they had an application which would frequently lock, modify and then unlock their flow graph, and it would hang. There was the usual back-and-forth about needing a way to reproduce it, and then someone said that they managed to make it go away upon upgrading to "current trunk" -- whatever was in git at that moment.

At that point, there was a response from someone else who seems to be one of the project devs. He basically said he was glad to see it was gone from the trunk and wouldn't go looking for the problem in the old version.

So, to recap, there was some kind of problem, and then it magically went away without being explicitly removed. To me, that's just asking for it to come back, perhaps at a later time when it's even harder to find.

True to form, a year later, there's another post in the archives. Someone reports a hang once a program using this technique runs for long enough. This one is even more disturbing, since this person then replies to their own thread and says "oh, you just need to add another sleep(), down here".

Obviously, the problem is still there. Users are going and doing things to work around it, probably without realizing what a horrible hack adding delays is when there's a race going on somewhere. A delay doesn't remove the race. It just shifts the timing so the bad ordering happens less often. Their program will behave for now, and then it will start hanging again when their system's timing changes for some reason.
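Here's a toy model of why a sleep() only papers over this kind of thing. It is not the gnuradio bug, whose actual cause hasn't been identified in any of those threads. It just assumes a classic lost-wakeup race: one thread checks a flag and, a little later, starts waiting, while another thread sets the flag and notifies. The function and parameter names are made up for illustration.

```python
def hangs(check_at, wait_gap, notify_at, extra_sleep=0.0):
    """Toy model of a lost-wakeup race (hypothetical, not gnuradio code).

    The waiter checks a flag at `check_at` and only starts waiting
    `wait_gap` seconds later. The notifier fires once at
    `notify_at + extra_sleep`. If the notify lands in the gap between
    the check and the wait, nobody hears it and the waiter hangs.
    """
    wait_starts_at = check_at + wait_gap
    notify_lands_at = notify_at + extra_sleep
    return check_at < notify_lands_at < wait_starts_at


# Original timing: the notify lands inside the window, so it hangs.
print(hangs(check_at=1.0, wait_gap=1.0, notify_at=1.5))                   # True

# "Just add another sleep()": the notify now lands after the wait starts.
print(hangs(check_at=1.0, wait_gap=1.0, notify_at=1.5, extra_sleep=1.0))  # False

# Same sleep on a slower or busier box, where the gap is wider: hangs again.
print(hangs(check_at=1.0, wait_gap=2.0, notify_at=1.5, extra_sleep=1.0))  # True
```

The window between the check and the wait is still there in all three cases. The sleep only moved the notify relative to it, and that holds up exactly as long as nothing else about the timing changes.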

Oh, and I found another gem while writing this. A third person reported hangs upon locking and unlocking in order to twiddle some parameter. The response they received was that you don't need to lock and unlock to twiddle that parameter. The actual hang was never mentioned.

I am not looking forward to digging into this.