Writing

Feed Software, technology, sysadmin war stories, and more.

Saturday, March 2, 2013

Patching script and scriptreplay

Continuing from last time, it's time to patch script so it doesn't corrupt the time log. Since they are both related, I'm fixing both of them at the same time.

The first problem was that oldtime was being initialized using time(), not gettimeofday(), and that was slicing off the microsecond precision used elsewhere. That in turn caused the first reading to be too high and vary based on when you started the program.

That fix was simple enough: stop setting oldtime with an initializer, and instead just add a little block just before the main loop of dooutput():

if (tflg) {
	gettimeofday(&tv, NULL);
	oldtime = tv.tv_sec + (double) tv.tv_usec / 1000000;
}

The second problem was that the code was reading the clock at the wrong time. It was getting a time fix before the read and this caused all of the numbers to be offset by one block of data (output from the terminal). The solution to this one was to stop calling gettimeofday() right before read, and instead call it immediately afterward.

With both of these changes in place, I built a fresh version of script and gave it the same test from yesterday:

The results are much better.

$ misc-utils/script -t -c "echo -n 1; sleep 1; echo -n 1; sleep 1; echo -n 1" >/dev/null 2>t

$ cat t

0.002807 1

0.999992 1

1.000513 1

Now it actually makes sense. This means "wait 2 ms, then write one byte", "wait 999 ms, then write one byte", and finally "wait 1000 ms and then write one byte", and that is exactly what the command sequence did.

This brings me to the next point: dealing with "scriptreplay". This tool is included with the util-linux package and interprets this data in order to preserve the timing data. Unfortunately, when this thing was written, someone decided to just work around the off-by-one situation in script instead of fixing both.

That means feeding this new-style output to scriptreplay will give weird results, so it also needs to be fixed. Fortunately, this one is an even easier fix: just remove all references to "oldblk" and make it use "blk" in the call to emit.

With that fixed, scriptreplay looks just like running the original test command. There are no unusual delays or places where it surges forward uncharacteristically. It just works.

This leaves just one final problem: compatibility. Timing output from basically every other version of script on the planet except for mine is going to be flawed, and playing it back in the fixed scriptreplay will give potentially bizarre results. Likewise, timing output from my version of script will look completely crazy when played back on any other version of scriptreplay, including my own yet-to-be-patched "jvt" viewer.

There is no versioning or any other kind of cue in the output from script to make it enable a "compatibility mode" like the way web browsers tend to "step down" to handle old broken pages. I suppose it would be possible to emit something in the header ("Script started on ..."), but that would have to be there forever, and it would run a good chance of breaking anyone who expects that output to be exactly three words and then a date/time blob.

Personally, I'm not going to worry about compatibility, but I mention it here because there is a nonzero chance of this being raised when attempt to send this patch upstream. I haven't actually patched the latest version from git since it has some crazy new library deps (libuser? wtf?) and I'm not ready to go down that rabbit hole just yet.

Anyone who is interested in the patches can get a copy via GitHub gist 5074299. If that doesn't work for whatever reason and you want a copy (say, down the road when the whole site has been erased by yet another Rails vulnerability), just contact me directly and I'll try to find a copy for you.

...

Random trivia note: I went back into the history of util-linux as far as I could. This involved finding my old Slackware 8.1 (!) CD package and digging around in the source directory. While this feature was introduced in util-linux 2.11i, I was unable to get back to that exact version. I did, however, manage to get util-linux 2.11r from that Slackware CD, and it's pretty close.

It turns out that "scriptreplay" is actually not the original code. It was originally contributed as a Perl script called just "replay". Guess what I found in there...

        read(TYPESCRIPT, $block, $blocksize)
                or die "read filure on typescript: $!";
        print $oldblock;
        $oldblock=$block;

Yep, it's the same "postpone it by one block" logic. That's from "replay.pl", dated January 12, 2002. When it was ported to C and renamed scriptreplay some time later, they preserved that same behavior.

Maybe someone really wants this thing to behave this way.


March 4, 2013: This post has an update.