Writing

Software, technology, sysadmin war stories, and more.

Saturday, October 13, 2012

Let's dissect ping and glibc to chase down an odd feature

There was a post on Hacker News the other day along the lines of "what makes ping take 192.168.01234 as a valid address?", and none of the answers seemed particularly deep. I left a comment grousing about the lack of mentions of strtol() since that was probably the cause. I will now go into some more detail in the hopes it is useful to someone.

My immediate thought about strtol() was because I knew that it takes a "base" argument, and if you set it to 0, it tries to be helpful. It'll turn "123" into 123, but it'll also turn "0xff" into 255 and "0173" into 123. This is because it honors 0xnn for hex notation and 0nn for octal notation.
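Here's a minimal sketch of that base-0 behavior. The helper name parse_auto is made up for this example; the only real library call is strtol().

```c
#include <stdlib.h>

/* Hypothetical helper (not from ping or glibc): parse a numeral with
 * strtol() in "auto" mode. With a base of 0, a leading "0x" selects
 * hex, a leading "0" selects octal, and anything else is decimal. */
long parse_auto(const char *text)
{
	return strtol(text, NULL, 0);
}
```

With this, parse_auto("123") and parse_auto("0173") both come back as 123, and parse_auto("0xff") comes back as 255.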

At the time of that comment, I had not bothered to actually drill down into ping. But hey, let's do that now and see what the deal is. I'll analyze the ping binary found on my machine. /bin/ping comes from the iputils package on Slackware64 13.37 (yes, that is really the version number), so I'll open that up.

$ cd iputils-s20101006/
$ ls
INSTALL     clockdiff.c   ping6_niquery.h  tftpd.c
Makefile    doc           ping_common.c    tftpsubs.c
Modules     ipg           ping_common.h    tracepath.c
RELNOTES    iputils.spec  rarpd.c          tracepath6.c
SNAPSHOT.h  ping.c        rdisc.c          traceroute6.c
arping.c    ping6.c       tftp.h

ping.c looks promising. Does it have a main? I dig around, and yep, it does. It uses argv, and getopt, and down below that, argv turns up again in a context which looks meaningful:

while (argc > 0) {
	target = *argv;
 
	memset((char *)&whereto, 0, sizeof(whereto));
	whereto.sin_family = AF_INET;
	if (inet_aton(target, &whereto.sin_addr) == 1) {

Okay, so we're taking (what's left of) argv, which should just be the target we gave it, and we're giving it to inet_aton. The man page says it'll take a.b.c.d, a.b.c, a.b, or even just a, so that explains why it accepts the funky dotted triplet instead of a dotted quad.
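To see those shorter forms in action, here's a small sketch around the real inet_aton(). The wrapper name parse_ipv4 is an assumption for illustration, and the _DEFAULT_SOURCE define is only there so glibc exposes inet_aton() under strict compiler modes.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical wrapper: hand the text to inet_aton() and, on success,
 * store the address as a host-order 32-bit value in *out.
 * Returns 1 on success and 0 on failure, same as inet_aton(). */
int parse_ipv4(const char *text, uint32_t *out)
{
	struct in_addr addr;

	if (inet_aton(text, &addr) != 1)
		return 0;
	*out = ntohl(addr.s_addr);
	return 1;
}
```

Fed "10.1.2.3", "10.1.515", "10.66051", or plain "167838211", this should produce the same value every time (0x0a010203), since the last part simply fills whatever bytes are left.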

So now I want to see what makes it do that, and whether it's actually using strtol() like I suspected originally. inet_aton is part of glibc, and so I untar the source of glibc-2.13 and start digging around.

A recursive grep for "inet_aton" finds more than a few instances of that string, but only one of them resembles a function being defined:

./resolv/inet_addr.c-int
./resolv/inet_addr.c:__inet_aton(const char *cp, struct in_addr *addr)
./resolv/inet_addr.c-{

Right, so, into that file, then. Notice this is actually called __inet_aton, but elsewhere in my grep it suggests this is handled via one or more #defines. Also, there are no other functions, so this is probably our code.

In that function, I find something close to what I suspected initially:

#ifdef _LIBC
{
	char *endp;
	unsigned long ul = strtoul (cp, (char **) &endp, 0);
	if (ul == ULONG_MAX && errno == ERANGE)

It was strtoul(), not the strtol() I first suspected, but the rest is right. Note that third arg of 0 which tells the function to allow the hex and octal notations. I was close!

There's another preprocessor block which is used when _LIBC is not set, but we pick up this function via glibc, so I didn't bother to dissect that. Suffice it to say that it does a bunch of manual twiddling of characters to figure out the base, check for valid digits (hex or otherwise), and then create the value. It's effectively a local clone of strtoul() with just the needed coverage for this function.
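For a feel of what that kind of manual twiddling looks like, here's a rough sketch. This is not the glibc code: the function name parse_number and its return convention are invented for this example, and it skips overflow checking entirely.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for a hand-rolled strtoul(..., 0).
 * Picks the base from the prefix, then accumulates digits.
 * Returns the number of characters consumed (0 if no digits were
 * found) and stores the result in *value. No overflow checking. */
size_t parse_number(const char *s, unsigned long *value)
{
	const char *p = s;
	unsigned long base = 10, v = 0;
	size_t digits = 0;

	if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
		base = 16;
		p += 2;
	} else if (p[0] == '0') {
		base = 8;	/* the leading 0 is itself a valid octal digit */
	}

	for (;; p++, digits++) {
		unsigned char c = (unsigned char)*p;
		unsigned long d;

		if (isdigit(c))
			d = (unsigned long)(c - '0');
		else if (base == 16 && isxdigit(c))
			d = (unsigned long)(tolower(c) - 'a') + 10;
		else
			break;
		if (d >= base)
			break;	/* e.g. an 8 or 9 inside an octal numeral */
		v = v * base + d;
	}

	if (digits == 0)
		return 0;
	*value = v;
	return (size_t)(p - s);
}
```

Given "0x29c" or "01234" it consumes five characters and yields 668 either way; given "123." it stops at the dot with 123.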

So what about the variable dots? Well, all of this happens inside an infinite loop with a couple of pointers which are initially set to the beginning of the input char buffer and a four-byte buffer for the resulting address. When it gets a valid value by way of strtoul() (again, assuming _LIBC), it scoots the input pointer along past whatever strtoul consumed and looks for a dot.

Upon finding that dot, it makes sure that you didn't give it a value beyond 255 since that particular trick is only allowed for the last position, and a dot means more stuff is coming up. Per the comments:

/*
 * Internet format:
 *      a.b.c.d
 *      a.b.c   (with c treated as 16 bits)
 *      a.b     (with b treated as 24 bits)
 */

Then it scoots the input pointer past the dot and stores whatever value it found into the address buffer and pushes that pointer along at the same time. This ends the block, and the loop starts over.

This keeps going until there's no dot left after the strtoul() call, and then a few more sanity checks make sure nothing's left over and that all of the data has made it to the address buffer. There should now be four bytes in there, or just enough for an IPv4 address. If all of this worked, it returns 1 and that's that. If anything failed along the way, it jumps down to a label called ret_0 (yes, with a goto!) and returns 0.
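The whole loop can be modeled in a few dozen lines. What follows is a simplified sketch of the flow described above, not the glibc source: the name sketch_aton is made up, it returns early instead of using a goto, it hands back a host-order integer instead of filling a struct in_addr, and it rejects trailing whitespace for brevity.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified model of the parsing loop.
 * Returns 1 and stores a host-order address in *out on success,
 * or returns 0 on any failure. */
int sketch_aton(const char *cp, uint32_t *out)
{
	uint8_t bytes[4] = {0, 0, 0, 0};
	uint8_t *pp = bytes;
	unsigned long val = 0;

	for (;;) {
		char *endp;

		if (!isdigit((unsigned char)*cp))
			return 0;	/* every part must start with a digit */
		errno = 0;
		val = strtoul(cp, &endp, 0);	/* base 0: hex and octal allowed */
		if (val == ULONG_MAX && errno == ERANGE)
			return 0;
		cp = endp;		/* scoot past what strtoul consumed */

		if (*cp != '.')
			break;		/* no dot, so this was the last part */
		if (pp > bytes + 2 || val > 0xff)
			return 0;	/* too many dots, or a non-final part over 255 */
		*pp++ = (uint8_t)val;
		cp++;			/* step over the dot */
	}

	if (*cp != '\0')
		return 0;		/* leftover junk after the last part */

	/* The last part has to fit in however many bytes remain (1 to 4). */
	int remaining = 4 - (int)(pp - bytes);
	uint64_t max = (UINT64_C(1) << (8 * remaining)) - 1;
	if (val > max)
		return 0;

	*out = ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
	       ((uint32_t)bytes[2] << 8) | (uint32_t)val;
	return 1;
}
```

In this model, "192.168.01234" stores 192 and 168 as single bytes, then lets 01234 (668) spill across the remaining two, for 0xc0a8029c. Something like "192.256.1.1" fails at the second part because 256 is followed by a dot.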

So that's enough for me. ping hands it to inet_aton. inet_aton allows for zero to three dots in the input as long as certain conditions are met. inet_aton also uses strtoul() to turn the printable numerals into their actual values, and it uses an arg which says "it's okay to do base conversions based on certain magic input formats".

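Putting this all together allows you to ping something like 192.168.01234 and have it work: 01234 is octal for 668, which fills the last two bytes as 2 and 156, so the target is really 192.168.2.156. It also means you can ping 0xc0.0xa8.0x29c and it'll work just as well, since 0x29c is 668 too.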
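If you'd rather check that claim without sending any packets, a tiny comparison helper will do. The name same_address is an assumption for this sketch; the real work is done by inet_aton(), and the _DEFAULT_SOURCE define is only there so glibc exposes it under strict compiler modes.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>

/* Hypothetical helper: returns 1 if both strings are accepted by
 * inet_aton() and resolve to the same IPv4 address, otherwise 0. */
int same_address(const char *a, const char *b)
{
	struct in_addr x, y;

	if (inet_aton(a, &x) != 1 || inet_aton(b, &y) != 1)
		return 0;
	return x.s_addr == y.s_addr;
}
```

On a glibc system, same_address("192.168.01234", "192.168.2.156") and same_address("0xc0.0xa8.0x29c", "192.168.2.156") should both return 1. Drop the leading zero and "192.168.1234" becomes decimal, which lands on a different address.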

This should be obvious, but I'll say it anyway: this only applies to systems using ping from iputils and glibc. Other systems with their own ways of handling "atoi" work and ping targets might not support it.


October 20, 2013: This post has an update.