Software, technology, sysadmin war stories, and more.
Sunday, April 28, 2013

Usenet, binaries, and the other kind of logs

Quite some time ago, I had a friend who was trying to make something happen with a little program which gathered posts from NNTP servers. It was the sort of thing you'd run to bring in a bunch of separate posts so they could be joined and decoded with tools like uudecode. This was one way of obtaining files back in those days.

What he wanted was for this program to name its output files based on how many "parts" were in a set. That is, if there were 1-9 parts, the files should be named part1, part2, part3, and so on up to part9. However, once it reached 10, all of the names should include the new position, and thus needed to be part01, part02, part03, ... up to part99. If it happened to have 100 to 999 parts, it would need three positions, and so on.

I forget how this wound up on my radar, but it seemed like a reasonable request. Filenames with digits tend to sort oddly. This is where you'll see "part19", then "part2", and then "part21" in a directory list. That leading zero would keep things in line. I decided to take a look.

One possibility which seemed obvious was to just take the biggest number and format it as a string with sprintf() or similar. Then I could just use strlen() to figure out how long it was, and that's how wide the numbers would need to be. I could then build a format string like "%04d" to zero-pad things appropriately when formatting the filenames later on.
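Here is a rough sketch of that string-length approach in C. The names width_by_string() and part_name() are made up for illustration, and this is not the original program's code. Rather than building a separate format string, it leans on the standard "*" field width, which reads the width from an argument. The zero flag is applied to an integer conversion, since that is where printf defines it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: how many decimal digits does n need?
 * Found the roundabout way: format it, then measure the string. */
static int width_by_string(unsigned int n)
{
    char buf[32];   /* plenty of room for any unsigned int */

    snprintf(buf, sizeof buf, "%u", n);
    return (int)strlen(buf);
}

/* Hypothetical helper: write "part" plus the zero-padded index into out.
 * total is the number of parts in the set and decides the padding. */
static void part_name(char *out, size_t outlen,
                      unsigned int index, unsigned int total)
{
    /* "%0*u" takes the field width from the argument list. */
    snprintf(out, outlen, "part%0*u", width_by_string(total), index);
}
```

With 120 parts in a set, part_name() would produce "part007" for index 7. With only 9 parts, index 7 comes out as plain "part7".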

Still, taking a number, making it into a string, and then looking at the length of that string seemed wrong. I thought about it for a while and tried to come up with something which didn't do all of that string manipulation stuff.

While talking about this with my friend, I realized I had encountered something which behaved a bit like this way back in high school math. It tended to yield results which effectively resembled the "width" of a number. That function was log. I couldn't remember what the actual point of log was from that class, but knew it might be helpful.

After a bit of poking around, I determined that the function I actually wanted was log10(). It also needed some help in the sense of truncating the fractional bits and then adding 1, but I had my answer. That number was used to create a format string, and then that format string was used while creating the filename.

I still think of that as rather ugly, but it did work. Thinking about it now, I wonder whether using that log10() function falls into the bucket of "being too clever". Clever code tends to trip up people who do maintenance on it much later, including the original author.

It now occurs to me that this sort of thing could be done with a loop which just kept dividing the number by 10 until nothing was left. The number of passes would tell you how wide it was. That would avoid bringing in a math library just to do something really dumb with a number.
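The division loop might look something like this in C. Again, width_by_division() is a hypothetical name, and this is only one way to arrange the loop. It starts the count at 1 so that zero still reports a width of one digit.

```c
/* Hypothetical helper: decimal digit count of n using only
 * integer division, with no strings and no math library. */
static int width_by_division(unsigned int n)
{
    int width = 1;      /* every number has at least one digit */

    while (n >= 10) {   /* each pass strips off the last digit */
        n /= 10;
        width++;
    }
    return width;
}
```

For 999 the loop runs twice (999, then 99, leaving 9), so the width comes out as 3.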

I guess everyone has their pet ways to handle annoying junk like this.