Investigating gcov crashes after fork() on OS X
I've been working on improving some code with more test coverage. One of these newer libraries calls fork() and execv() to run some external programs. Imagine my surprise when I tried to run it in coverage mode and it crashed with "Abort trap". I did a lot of digging to figure out just what was going on. This is my tale.
My original program had a lot of stuff going on. It had a whole bunch of test cases and other crazy things happening. Any of those bits of my code or the third-party testing framework library could have been responsible. They all had to go. I reduced it down to a single .cc file with one function that would fork() and execv() something. This reproduced the problem nicely, and it meant all of that testing stuff was not to blame.
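A function of the kind described above might look roughly like this. It is a sketch, not the original file: the name run_program and the exit-status handling are assumptions made up for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, exec `path` with `argv` in the child,
 * and return the child's exit status in the parent (-1 on failure). */
int run_program(const char *path, char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;                 /* fork() itself failed */
    }
    if (pid == 0) {
        execv(path, argv);         /* only returns if the exec failed */
        _exit(127);                /* skip atexit handlers in the child */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    /* Report the exit code only if the child exited normally. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with "/bin/sh" and the arguments {"/bin/sh", "-c", "exit 3", NULL} should hand back 3 in the parent.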
After a bunch of runs through valgrind, and gdb, and dtruss, and all of this, I realized that it was just fork() which was blowing up. I could throw away all of that execv() gunk. Great! My reproduction case shrank again. I kept banging on it. Finally, I got it down to this:
$ echo "int main() { return fork(); }" > fork.c
$ gcc --coverage -o fork fork.c
$ ./fork
Abort trap
$ gcc -o fork fork.c
$ ./fork
$
Yeah, now we're talking. One syscall and it all goes down in flames. Now I knew exactly what to blame: the intersection of the libgcov code and fork(). It wasn't anything else. The exact call trace implicated something they added in Snow Leopard for faster shutdowns: there was a "_vproc_transaction_end" right before that call to abort().
I went further and found the source code for libvproc.c online. It lists a bunch of functions which are called by stuff all over the system, including Apple's version of libgcov. It also showed me where things were crashing. I decided to add a call to _vproc_transaction_count() in my code both before and after the fork. It didn't look good.
$ cat fork2.c
#include <stdio.h>
#include <vproc.h>

int main() {
  printf("pre-fork count: %d\n", _vproc_transaction_count());
  fork();
  printf("post-fork count: %d\n", _vproc_transaction_count());
  return 0;
}
$ gcc --coverage -o fork2 fork2.c
$ ./fork2
pre-fork count: 1
post-fork count: 0
post-fork count: 0
Abort trap
The count drops from 1 to 0 in both processes. So not only is the child winding up in some uninitialized state, but the parent is too...? That's messed up. I decided to throw caution to the wind and call vproc_transaction_begin() myself, the way gcov does, just to see what happened.
$ cat fork3.c
#include <stdio.h>
#include <vproc.h>

int main() {
  printf("pre-fork count: %d\n", _vproc_transaction_count());
  fork();
  vproc_transaction_begin(0);
  printf("post-fork count: %d\n", _vproc_transaction_count());
  return 0;
}
$ gcc --coverage -o fork3 fork3.c
$ ./fork3
pre-fork count: 1
post-fork count: 1
post-fork count: 1
$
No crash! This is probably far from ideal, but I'll take it. It's enough to add a quick preprocessor hack in my code to call that when running tests on Apple machines.
I've opened a bug with Apple. It's #9759049, but I don't think other people can see it, so that's probably of little use to anyone but me. For everyone else, enjoy the workaround.
#if defined(__APPLE__)
  if (testing_mode_) {
    vproc_transaction_begin(0);
  }
#endif
September 29, 2011: This post has an update.