Writing

Software, technology, sysadmin war stories, and more.

Monday, April 16, 2018

sigaction: see who killed you (and more)

What happens when your process receives a signal? Assuming you care about such things, you might have done a one-shot "signal(SIGFOO, some_func);" in your code. It's simple enough: when you get a SIGFOO, control jumps to some_func(). It's up to you to deal with the rest while avoiding things that should not be run in a signal handler (read the man page for 'signal-safety' if this is news to you).

At that point, you know some signal happened and can tell which one thanks to the single arg your dumb little handler received, but wouldn't it be better to know more? Don't answer that. It is better to know more. You may not appreciate it at first, but when you've added better instrumentation and had your first crash where it explained something, it'll start making sense. It's one of those things you have to build now to appreciate later.

Instead of using signal(), try using sigaction(). It takes a struct, and in that struct you give it a function pointer and some flags. The most important flag is SA_SIGINFO: set it and you get fancypants callback action on the sa_sigaction pointer instead of a boring old sa_handler like before.
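A minimal sketch of that setup might look like this. The names on_signal, last_signal, and install_handler are made up for illustration; the struct fields and the SA_SIGINFO flag are the real ones from <signal.h>.

```c
#define _POSIX_C_SOURCE 200809L  /* ask the headers for siginfo_t and friends */
#include <signal.h>
#include <string.h>

/* Hypothetical bookkeeping: remember the last signal number we saw. */
static volatile sig_atomic_t last_signal = 0;

/* With SA_SIGINFO set, the handler takes three arguments instead of one. */
static void on_signal(int signo, siginfo_t *info, void *context)
{
    (void)info;     /* the interesting data lives here */
    (void)context;  /* machine context, unused in this sketch */
    last_signal = signo;
}

/* Returns 0 on success, -1 on failure (errno is set by sigaction). */
int install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_signal;  /* note: sa_sigaction, not sa_handler */
    sa.sa_flags = SA_SIGINFO;     /* selects the three-argument handler */
    sigemptyset(&sa.sa_mask);     /* block no extra signals while handling */

    return sigaction(signo, &sa, NULL);
}
```

Calling install_handler(SIGTERM) early in main() would be the typical use. The handler here only records the signal number, which keeps it trivially safe to run in signal context.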

Now you get to write your signal handler, and this is where the fun starts. The second argument is a siginfo_t*, and therein lies the tasty data. This is where you can start learning all kinds of things.

Let's say your program was shut down by something that sent it a signal. Was it your init program? Was it your package manager? Was it a human on the box? How do you know? Easy: print the si_pid and si_uid fields. Later, when groveling through the logs after your system goes down unexpectedly, you can see "uid 0" and know that something privileged knocked it down. Or, you can see "uid 1000" and know that another ordinary user did it. Likewise, if "pid 1234" is what killed it, you can go back into history with something like atop and find out that's some interactive shell. Then you can find out who was logged in at that point and ask them some questions.

Here's what it looks like when I flip to root and poke my program with a SIGTERM:

signal: 15 errno: 0 code: 0
source pid: 9034, uid: 0
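A handler that prints lines in that shape could be sketched as follows. This is not the program that produced the output above, and every helper name in it (put_str, put_num, format_sender, log_sender, install_sender_logger) is hypothetical. It formats numbers by hand and calls write() because printf() is not on the async-signal-safe list. Also keep in mind that si_pid and si_uid only mean something when a process sent the signal with something like kill() or sigqueue(); for signals the kernel generates on its own, look at the other fields instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Append s to buf at *pos, leaving room for the trailing NUL (cap >= 1). */
static void put_str(char *buf, size_t cap, size_t *pos, const char *s)
{
    while (*s != '\0' && *pos + 1 < cap)
        buf[(*pos)++] = *s++;
    buf[*pos] = '\0';
}

/* Append a decimal number without touching the printf family. */
static void put_num(char *buf, size_t cap, size_t *pos, long long value)
{
    char digits[24];
    size_t n = 0;
    unsigned long long u = (value < 0) ? 0ULL - (unsigned long long)value
                                       : (unsigned long long)value;

    do {
        digits[n++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);
    if (value < 0)
        digits[n++] = '-';
    while (n > 0 && *pos + 1 < cap)
        buf[(*pos)++] = digits[--n];
    buf[*pos] = '\0';
}

/* Build the two log lines from a siginfo_t; returns the length written. */
size_t format_sender(const siginfo_t *info, char *buf, size_t cap)
{
    size_t pos = 0;

    put_str(buf, cap, &pos, "signal: ");
    put_num(buf, cap, &pos, info->si_signo);
    put_str(buf, cap, &pos, " errno: ");
    put_num(buf, cap, &pos, info->si_errno);
    put_str(buf, cap, &pos, " code: ");
    put_num(buf, cap, &pos, info->si_code);
    put_str(buf, cap, &pos, "\nsource pid: ");
    put_num(buf, cap, &pos, (long long)info->si_pid);
    put_str(buf, cap, &pos, ", uid: ");
    put_num(buf, cap, &pos, (long long)info->si_uid);
    put_str(buf, cap, &pos, "\n");
    return pos;
}

/* The actual handler: format into a stack buffer, then one write(). */
static void log_sender(int signo, siginfo_t *info, void *context)
{
    char line[128];
    size_t len;
    ssize_t rc;

    (void)signo;
    (void)context;
    len = format_sender(info, line, sizeof(line));
    rc = write(STDERR_FILENO, line, len);
    (void)rc;  /* nothing useful to do if logging fails here */
}

/* Returns 0 on success, -1 on failure (errno is set by sigaction). */
int install_sender_logger(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = log_sender;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

Splitting the formatting out of the handler is a design choice for this sketch: it lets you exercise format_sender() with a hand-built siginfo_t without having to send real signals around.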

The fun doesn't stop there. Ever suffered through a segmentation fault? If you're using C or C++, you certainly have. Someone, somewhere, has screwed up a pointer, used some resource after it was freed, or used something that was never initialized. siginfo_t can shed light on that, too. You can even see which address the program was trying to access when things blew up.

Here's the output from something dumb I wrote which logs a bunch of data from that struct. I purposely generated a SIGSEGV, and this is the result:

signal: 11 errno: 0 code: 1
addr: 0x68656c6c6f
segfault code: Address not mapped to object

That address should look odd to you. Even in a 64-bit Linux world, addresses don't normally look like that. That... is human-readable ASCII. I'll let you decode the hidden message. I created this as a proof-of-concept, but I assure you that occasionally, real programmers wind up making that kind of mistake and put text into a pointer. Being able to see the bad pointer lets you jump right to the culprit.
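Here's a rough sketch of how a SIGSEGV handler could log the faulting address and the reason. It is an illustration, not the program that produced the output above, and the names segv_code_name, format_hex, put, on_segv, and install_segv_logger are all invented for it. The si_addr field and the SEGV_MAPERR and SEGV_ACCERR codes are the real ones. The sketch leans on SA_RESETHAND so that the default action is restored when the handler runs: once the handler returns, the faulting instruction runs again and the process dies the usual way, core dump and all.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Map the SIGSEGV si_code values to readable text. */
const char *segv_code_name(int code)
{
    switch (code) {
    case SEGV_MAPERR: return "Address not mapped to object";
    case SEGV_ACCERR: return "Invalid permissions for mapped object";
    default:          return "Unknown si_code";
    }
}

/* Write value as "0x..." into buf. Returns the length, or 0 if buf is too small. */
size_t format_hex(unsigned long long value, char *buf, size_t cap)
{
    static const char hex[] = "0123456789abcdef";
    char digits[16];
    size_t n = 0, pos = 0;

    do {
        digits[n++] = hex[value & 0xf];
        value >>= 4;
    } while (value != 0 && n < sizeof(digits));

    if (cap < n + 3)  /* "0x" + digits + NUL */
        return 0;
    buf[pos++] = '0';
    buf[pos++] = 'x';
    while (n > 0)
        buf[pos++] = digits[--n];
    buf[pos] = '\0';
    return pos;
}

/* write() a NUL-terminated string to stderr. */
static void put(const char *s)
{
    ssize_t rc = write(STDERR_FILENO, s, strlen(s));
    (void)rc;
}

static void on_segv(int signo, siginfo_t *info, void *context)
{
    char addr[24] = "?";

    (void)signo;
    (void)context;
    format_hex((unsigned long long)(uintptr_t)info->si_addr, addr, sizeof(addr));
    put("addr: ");
    put(addr);
    put("\nsegfault code: ");
    put(segv_code_name(info->si_code));
    put("\n");
    /* Returning re-runs the faulting instruction; SA_RESETHAND has already
       put the default action back, so the process then dies normally. */
}

/* Returns 0 on success, -1 on failure (errno is set by sigaction). */
int install_segv_logger(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

One assumption worth calling out: this handler runs on the regular stack, so it won't help with a crash caused by running out of stack.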

There are other neat things you can do, and many of them are signal-dependent. If you get a SIGFPE, you can check si_code and warn that someone divided by zero. You'd hope it would never happen, but again, given enough time and enough programmers, at a big enough company, you will see all of these things.
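For SIGFPE, the decoding could be sketched like this. The FPE_* constants are the standard si_code values for that signal; the function names (fpe_code_name, on_fpe, install_fpe_logger) are hypothetical. Whether an integer divide by zero actually raises SIGFPE depends on the architecture, so treat this as something that helps where the trap exists rather than a guarantee.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Map the SIGFPE si_code values to readable text. */
const char *fpe_code_name(int code)
{
    switch (code) {
    case FPE_INTDIV: return "Integer divide by zero";
    case FPE_INTOVF: return "Integer overflow";
    case FPE_FLTDIV: return "Floating-point divide by zero";
    case FPE_FLTOVF: return "Floating-point overflow";
    case FPE_FLTUND: return "Floating-point underflow";
    case FPE_FLTRES: return "Floating-point inexact result";
    case FPE_FLTINV: return "Invalid floating-point operation";
    case FPE_FLTSUB: return "Subscript out of range";
    default:         return "Unknown si_code";
    }
}

static void on_fpe(int signo, siginfo_t *info, void *context)
{
    const char *why = fpe_code_name(info->si_code);
    ssize_t rc;

    (void)signo;
    (void)context;
    /* Three small write() calls keep this free of printf-style formatting. */
    rc = write(STDERR_FILENO, "SIGFPE: ", 8);
    rc = write(STDERR_FILENO, why, strlen(why));
    rc = write(STDERR_FILENO, "\n", 1);
    (void)rc;
    /* SA_RESETHAND restored the default action, so returning lets the
       faulting instruction trap again and end the process. */
}

/* Returns 0 on success, -1 on failure (errno is set by sigaction). */
int install_fpe_logger(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_fpe;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGFPE, &sa, NULL);
}
```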

If your environment already has a signal handler, go take a peek and see if it's making use of all of the siginfo_t struct. If you're on any version of Linux made in recent history, you might be missing out on some really neat information that you might just appreciate some day.