Writing

Software, technology, sysadmin war stories, and more.
Thursday, May 22, 2014

Buckets of energy and evil preprocessor tricks

I think about projects as requiring energy. There's this virtual bucket, if you will, and it's what holds onto that energy. Imagine it as a liquid and now you can get a better idea of how it might work. Every day, you wake up and there's that much more energy in your bucket. You can go off and "spend" this on tasks like building things, fixing things, teaching people, networking, and yes, writing. You can also invest it in your family, hobbies, games and things of that nature.

Once that bucket gets low, it's harder to do some of those things. It's usually not a big deal because there's always tomorrow, at which point things will have refilled in time to get going on more things. But really, it's a matter of whatever gets first crack at the bucket of energy. Whatever gets there first wins. There's a time-based element to this (early birds and all that), but also a matter of personal priorities.

So what have I been up to? Well, all of the above. There are quite a few more things calling for energy these days, and something had to give. Unfortunately for my readers, it turned out to be what had been daily posts for two years. There just isn't enough to go around to do this sort of thing well since I'm already giving it to other parts of my life. Sorry.

I should note that I'm basically out of rants about the inside of That Place. I got that stuff out of my system quite some time back and have moved on. Writing about it was still the right thing to do, and I stand by all of those old posts, but there's just no reason to go back to that well. It can rest for now.

...

Here's a quick technical story about something I found and fixed not too long ago. There's this project which relies on a bunch of externally-sourced libraries, just like lots of other things you find on Linux boxes these days. One of these libraries had diverged quite a bit from upstream, so one of the developers decided to do a fresh import to pull it in. This went out in the next release.

Not too long after that, certain functions in this program started misbehaving. It happened at a relatively low rate most of the time, but it definitely had not been there before. Debug logging was added in an attempt to figure out what was going on.

Finally, one night while working on something else, I came across one of these things running and running and running instead of shutting down. It seems it was stuck on a particular inbound request and couldn't finish it. I used some profiling stuff and found out it was spinning in strcmp(), burning CPU like it was going out of style. Wait, what?

The strcmp in question turned out to be part of a check that was run on every item in a linked list. For whatever reason, the linked list never reached the end. Instead, it looped back onto itself at some point, and that particular thread would stay stuck in here. Minutes. Hours. Days, even, if the parent process lived that long.
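To make that failure mode concrete, here's a minimal sketch. The struct item layout and the function names find_item and list_has_cycle are made up for illustration, since the real library's list code isn't shown here. The first function is the kind of strcmp walk that never returns when the list loops and the key is missing. The second is a standard tortoise-and-hare check that can tell you the list has looped back on itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node type; the real library's layout is not shown in the post. */
struct item {
    const char *name;
    struct item *next;
};

/* Naive lookup: strcmp on every node until the end of the list.
 * If the list is cyclic and key is absent, this never returns. */
const struct item *find_item(const struct item *head, const char *key) {
    const struct item *p;
    for (p = head; p != NULL; p = p->next) {
        if (strcmp(p->name, key) == 0) {
            return p;
        }
    }
    return NULL;
}

/* Floyd's tortoise-and-hare: returns 1 if the list loops back onto itself,
 * 0 if it terminates normally. */
int list_has_cycle(const struct item *head) {
    const struct item *slow = head;
    const struct item *fast = head;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) {
            return 1;
        }
    }
    return 0;
}
```

Something like list_has_cycle is only a diagnostic. It tells you the list is already corrupt, not how it got that way.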

Then I found another, and another, and another. Some spelunking turned up what the request was supposed to do, and it was associated with the aforementioned external library. So we're getting into an infinite loop in code that's closely associated with something that changed recently. Wonderful. Now what?

Somehow, in looking at all of this, I noticed this code had a bunch of weird C preprocessor gunk going on which set up mutexes and other guards but only if you purposely enabled "thread safety mode". By default, it did no such thing and it was up to you to only have one call outstanding at any given time.

In playing with it some more I found the actual problem: the #ifdefs had changed. The old version was effectively enabling thread safety with "#ifdef FOO", and the new version was effectively gating it on "#ifdef BAR". Since we never defined BAR, we got the unsafe version, and it was only a matter of time before it caught up with us.
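Here's a small sketch of how quietly that kind of gate can flip. FOO and BAR are the placeholder names from above; LOCKING_ENABLED and locking_enabled() are hypothetical names I'm adding just so the result is visible at run time. The tripwire in the comment is one possible guard, not something the project actually had.

```c
/* The build has always defined FOO, which the old import checked. */
#define FOO 1

/* Hypothetical stand-in for the new import's gate: it checks BAR instead. */
#ifdef BAR
#define LOCKING_ENABLED 1
#else
#define LOCKING_ENABLED 0   /* silently taken: nobody defines BAR */
#endif

/* A possible tripwire (an assumption, not part of the original project):
 *   _Static_assert(LOCKING_ENABLED, "thread safety macros compiled out");
 * would turn the silent fallback into a build failure. */

int locking_enabled(void) {
    return LOCKING_ENABLED;
}
```

Built as-is, locking_enabled() returns 0 even though FOO is defined, and nothing in the build complains about it.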

I added a #define to enable BAR, but that didn't do it. The code would not just switch on the safety stuff with that. Oh no. Instead, it actually did a bunch of #defines to further abstract away the actions, like this:

helpers.h:
 
#ifdef BAR
#define make_mutex(x) MUTEX_TYPE x
#define setup_mutex(x) x = setup_a_mutex()
#else
#define make_mutex(x)
#define setup_mutex(x)
#endif

The library's code then used them like this:

#include "helpers.h"
#include "local_portability.h"
 
void do_stuff() {
  make_mutex(mu_);
  setup_mutex(mu_);
  do_potentially_dangerous_stuff();
  /* and so on..., then drop mutex before returning */
}

At first glance, this looks simple enough. In local_portability.h, I'd just override this stuff with some pthread stuff. I'd make sure that MUTEX_TYPE was a pthread_mutex_t, and that would be fine.

#define MUTEX_TYPE pthread_mutex_t

That was all well and good, but then it got ugly. I needed to call pthread_mutex_init() on that thing, but that second #define throws away the argument! pthread_mutex_init wants to get the mutex as an argument. It doesn't return the value to store, like their "setup_a_mutex()" obviously does.

So now what? I resorted to great evil. It turned out they only ever used a single variable name (mu_) with this call, so I just hard-coded it into my next compatibility hack.

#define setup_a_mutex(x) mu_; \
  pthread_mutex_init(&mu_, 0);

See that? I don't use "x" anywhere in the macro. Instead, I cheat and directly reference that thing which was thrown away in the prior macro. See that one "mu_;" hanging out by itself? That's there because I had to do something to make the setup_mutex macro compile. Recall that it takes "x" and turns it into "x = setup_a_mutex();". I have to give it something for the right hand side of that =, so I give it itself. The final code winds up being something like this:

  mu_ = mu_;
  pthread_mutex_init(&mu_, 0);

Yep, I assign mu_ to itself. Fortunately, this is just C code and mu_ is just plain data (no overloaded assignment operators here), so that does exactly nothing.
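If you want to watch the whole thing expand, here's a self-contained sketch that puts the pieces in one file. The two macro sections are stand-ins for helpers.h and local_portability.h as described above, with BAR defined up front. The lock, unlock and destroy calls and the int return value are my additions for illustration, since the real function body isn't shown. Expect some compilers to warn about the self-assignment.

```c
#include <pthread.h>

/* --- stand-in for the library's helpers.h, with BAR switched on --- */
#define BAR 1
#ifdef BAR
#define make_mutex(x) MUTEX_TYPE x
#define setup_mutex(x) x = setup_a_mutex()
#else
#define make_mutex(x)
#define setup_mutex(x)
#endif

/* --- stand-in for local_portability.h: the evil part --- */
#define MUTEX_TYPE pthread_mutex_t
#define setup_a_mutex(x) mu_; \
  pthread_mutex_init(&mu_, 0);

/* Returns 0 on success, -1 if the mutex could not be locked. */
int do_stuff(void) {
    int rc;

    make_mutex(mu_);    /* expands to: pthread_mutex_t mu_; */
    setup_mutex(mu_);   /* expands to: mu_ = mu_; pthread_mutex_init(&mu_, 0);; */

    rc = pthread_mutex_lock(&mu_);
    if (rc != 0) {
        return -1;
    }
    /* do_potentially_dangerous_stuff() would go here */
    pthread_mutex_unlock(&mu_);   /* drop mutex before returning */
    pthread_mutex_destroy(&mu_);
    return 0;
}
```

Running the file through the preprocessor alone (cc -E) is a quick way to see the expanded statements without guessing.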

Why didn't I just fix helpers.h to do everything the right way without these silly intermediate steps? Oh, that's the best part. That is part of the library, and it would get overwritten the next time they do a "pull" from upstream. The developers didn't want to "own" the maintenance of that particular fork, so the only way to inject anything safely was in local_portability.h.

All of this actually works, and it's in production right now. Those requests run properly once again. I documented all of this insanity in the code for the next poor sucker who comes along and tries to make sense of it.

So what am I doing when I'm not writing? Stuff like that.