Tuesday, April 5, 2022

That simple script is still someone's bad day

Here's a bash shell scenario for people who like these things. Let's say you have a script like this:

#!/bin/bash
reader | writer

... and "reader" fails and yields no data, and then "writer" runs and writes a nice fat blank to the database system. This then causes all kinds of fun times down the road.

So, this gets used as an opportunity to write one of those awful corporate outage bulletins which are more about hawking the company's wares than getting to the bottom of a problem. And, in it, the key takeaway is "we forgot pipefail".

It's like, you forgot more than that. pipefail isn't going to stop the second thing from running. Seriously. It's still going to run. Just look.

Let me introduce our cast of characters here. First, we have "writer":

#!/bin/bash
echo "writer is reading stdin"
cat > output
echo "writer is done reading stdin"
ls -l output

It's simple and stupid: it says hello, inhales its data, and tells us what it got. It simulates whatever "read stdin until EOF" step you'd have in this scenario.

Next up, here is "reader-that-works":

#!/bin/bash
echo "here's some data."

This one is nice and easy.

Now here's its buddy, the one that's going to fail (and exit 1), "reader-that-fails":

#!/bin/bash
exit 1

No surprises there. Now, we have the script that'll simulate the happy path, called "run-good":

#!/bin/bash
set -o pipefail
./reader-that-works | ./writer

And finally, the script that simulates the unhappy path, called "run-bad":

#!/bin/bash
set -o pipefail
./reader-that-fails | ./writer

Note that both of these runners have pipefail enabled already.
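
As a quick refresher, all pipefail does is change the exit status of a pipeline: instead of reporting only whatever the last command returned, the shell reports a failure if any command in the pipeline failed. A trivial demonstration, independent of the scripts above:

#!/bin/bash
false | true
echo "without pipefail: $?"   # prints 0 -- only the last command counts

set -o pipefail
false | true
echo "with pipefail: $?"      # prints 1 -- the earlier failure shows through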

Let's run the happy path.

/tmp/duh:$ ./run-good
writer is reading stdin
writer is done reading stdin
-rw-r--r-- 1 rkroll rkroll 18 Apr  5 13:51 output
/tmp/duh:$ cat output
here's some data.
/tmp/duh:$ 

Okay, so, now the other one:

/tmp/duh:$ ./run-bad
writer is reading stdin
writer is done reading stdin
-rw-r--r-- 1 rkroll rkroll 0 Apr  5 13:52 output

What's this? The second part of the pipeline still ran? Of course it did. It's *already running* at the point that the reader fails. Its stdin is hooked to the stdout of the other thing, Unix-centipede style. It *has to be there running already*, or the reader couldn't run in all situations! Otherwise the reader would fill up the pipe buffer on its stdout and then block.

This isn't MS-DOS, where foo | bar involves running foo, sending the output to a temp file, then running bar and getting the input from that same temp file. There's parallel execution here, and by the time the first command fails, it's way too late to abort the pipeline: with only two processes involved, the second one has already been started.
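
To be clear, pipefail isn't useless here: it makes run-bad exit nonzero, so whatever wraps it can at least notice the failure after the fact. But if the goal is for the writer to never touch the database when the reader fails, the data has to be staged and checked before the writer ever starts. A minimal sketch along those lines, reusing the same toy scripts and a temp file (details will vary):

#!/bin/bash
set -euo pipefail

# Sketch only: capture the reader's output first, make sure it both succeeded
# and actually produced data, and only then hand it to the writer.
tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT

if ! ./reader-that-fails > "$tmp"; then
    echo "reader failed; not running writer" >&2
    exit 1
fi

if [ ! -s "$tmp" ]; then
    echo "reader produced no data; not running writer" >&2
    exit 1
fi

./writer < "$tmp"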

This is like one of those "interview questions" people ask about "being the shell" and trying to understand how it does what it does. One snag they throw at people in these interviews is explaining why you don't want to have a shell command that greps for something in a file and then writes back to that same file at the same time.

That is, something like this (don't do this):

grep -v noise my_file > my_file

You think "that'll remove 'noise' from the file. And it's like... it will... sorta... by removing *everything* from the file! The shell will happily open my_file for writing as part of setting up the execution environment for the subprocess and will clobber whatever's there.

...

And, you know what, the worst part about this is that none of this knowledge should even apply. The fact we're talking about shell scripts for something critical means that the battle for reliability was lost a long time ago.