Writing

Software, technology, sysadmin war stories, and more.

Friday, March 11, 2016

Literal-minded robots make messes easily

If someone gave you a list of commands to run in sequence, and you hit a problem while running one of them, what would you do? Would you say something about it? Would you stop following the list and ask for help? Or would you just carry on like nothing happened?

Okay, with that answer in mind, now consider what you get if later commands rely on earlier commands succeeding and providing output. Do you still like that answer? Are you sure?

I'll give you a scenario. You're a custom cake robot thingy, and you do what you are told, and someone tells you to do these things:

  1. Pick a spot on the kitchen counter. Call that spot KC.
  2. Grab a cake stand from the cabinet.
  3. Put the cake stand at spot KC.
  4. Grab the cake from the fridge.
  5. Set the cake above spot KC at the height of a cake stand.
  6. Grab the icing from the cabinet.
  7. Ask the person next to you what their name is.
  8. Write their name and "is awesome" on the cake with the icing.
  9. Hand the cake stand to the person.

How many different ways can this go wrong? I'll take a whack at trying to list some of them without going for everything.

There is no cabinet, so you fail to get a cake stand. You place the cake at the spot where the top of the stand would be. It's supposed to be a tall stand, so it's a long way down to the counter. The cake is ruined, and it just goes downhill from there.

There's a cabinet, but it's fresh out of stands. The same thing happens. Splat.

You get a stand, but there's no kitchen counter this time, so you pick a spot in the air where the counter should be. When you try to set the cake stand there, gravity takes over and it hits the floor, exploding into a million tiny pieces. Then the cake follows a minute later and joins it on the floor. The rest is about the same as before.

You get a stand, but there's no cake in the fridge for whatever reason. You wind up spraying icing all over the spot where a cake should be, and so end up with "so and so is awesome" written in icing on an empty cake stand.

There's no icing, but you go through the motions anyway. The cake isn't ruined, but it's also not personalized. The person is sad.

There's some icing, but not enough. It runs out part way through. You write "So and so is aweso". They aren't amused.

You ask them their name, but they're distracted and go "Huh? One sec, I'm on the phone". You give them a cake which proudly proclaims that huh one sec I'm on the phone is awesome.

You ask them their name, but they don't hear you and continue their earlier conversation. You think they're answering you, and take everything they say as their name. You wind up writing a full transcript of their conversation in icing on top of a cake, at least until you run out of cake and start spraying it where the cake would be if it were a mile long.

They say their name is "Mary Jo Smith". You've been taught that you only want their first name, and that first names are the first word in whatever they say to you. You write "Mary is awesome". Mary Jo is displeased.

The person walks away after answering your question while you're still doing the icing work. As a result, you hand the completed cake to thin air, and it falls to the ground where they had been standing a moment before. Splat.

Would you let a robot act like this? I certainly hope not. But, the thing is, if you write bash scripts in a relatively plain way, you are basically creating a brainless robot which will barrel straight on no matter what sort of bad things happen.

#!/bin/bash
# uninstall foobar 1.0 (unsafe!)
 
X=$(get_path_to_foobar)
rm -rf $X/

Guess what happens if get_path_to_foobar is missing. X winds up empty. So, then, you're going to run ahead with 'rm -rf /', because X will expand to nothing. Awesome. Unless you have a version of GNU coreutils recent enough to default to --preserve-root, which refuses such shenanigans (probably due to people writing scripts like this), you'll blow away everything you can access on the box.

Bonus: if you're running on certain flavors of UEFI boxes and the firmware's variables are writable through a particular pseudo filesystem setup (efivarfs, mounted read-write), you might just render the machine unable to boot or even brick the management stuff which gives you out-of-band access for remote reboots and consoles. (You don't have to be running systemd to have this possibility.)
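If you want to see this for yourself without betting the box on it, print the arguments instead of handing them to rm. This is a minimal sketch: show_args is a made-up helper that just echoes each argument in brackets, and X is set to empty by hand to stand in for a missing get_path_to_foobar.

```bash
#!/usr/bin/env bash
# show_args is a hypothetical helper: print each argument in [brackets]
# so you can see exactly what a command would receive, without running it.
show_args() { printf '[%s]\n' "$@"; }

# Stand-in for what you get when get_path_to_foobar doesn't exist.
X=""

# Same expansion as the unsafe script, but fed to show_args instead of rm.
show_args rm -rf $X/
```

That prints [rm], [-rf], and [/] on three lines. The empty X simply vanishes, and the trailing slash is all that's left.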

You dig into the bash manual and find something that claims to kill the script after a command fails, and you stick this up top:

set -e

Now you get something like this:

./uninst.sh: line 3: get_path_to_foobar: command not found

That's it. It stops there.
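Here's a sketch of that behavior you can run safely. It runs a tiny script in a child bash so the demo itself keeps going; no_such_command_xyz is deliberately nonexistent and stands in for the missing get_path_to_foobar.

```bash
#!/usr/bin/env bash
# The inner script runs in a child bash so set -e kills the child, not this demo.
# no_such_command_xyz is deliberately missing, standing in for get_path_to_foobar.
status=0
output=$(bash -c '
  set -e
  X=$(no_such_command_xyz)
  echo "still running, X=[$X]"
' 2>&1) || status=$?

echo "exit status: $status"   # non-zero: set -e stopped at the failed assignment
echo "output: $output"        # just bash's "command not found", no "still running"
```

One catch: this only works because the assignment stands alone. Write "local X=$(...)" inside a function, or "export X=$(...)", and the status you get is that of local or export, which succeeds, so set -e never notices.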

You might think you're done, but then some day someone changes the way get_path_to_foobar works. Now it can return multiple lines of output with "A=...", "B=..." prefixes, and only the "X=..." line matters, so they add a "grep" and "cut" to grab the right part of that one line. Now it looks like this:

#!/bin/bash
# uninstall foobar 1.0 (unsafe!)
set -e
 
X=$(get_path_to_foobar | grep X= | cut -c 3-)
rm -rf $X/

Well, now you have a problem. get_path_to_foobar fails, sure. grep fails too, since it can't find "X=" in empty output. But, hey, what's this? cut doesn't care! It exits successfully. By default, a pipeline's exit status is just the status of its last command, so bash sees cut's success and carries on.
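You can watch this happen with PIPESTATUS, a bash array holding the exit status of each command in the most recent pipeline. In this sketch, false stands in for a get_path_to_foobar that fails and prints nothing.

```bash
#!/usr/bin/env bash
# "false" stands in for a get_path_to_foobar that fails and prints nothing.
false | grep X= | cut -c 3-
codes=("${PIPESTATUS[@]}")        # copy right away; the next command overwrites it

echo "per-command: ${codes[*]}"   # false failed, grep matched nothing, cut was happy
echo "pipeline:    ${codes[2]}"   # without pipefail, the pipeline's status is cut's
```

The per-command list comes out as 1 1 0, but the pipeline as a whole reports 0, and that 0 is the only thing set -e looks at.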

What now? Now you go looking for something that'll trip if anything in the whole pipe | line | of | commands fails, and find this:

set -o pipefail

With that set, it's all or nothing: everything in that pipeline works, or the whole thing is considered a failure. This is a slight improvement.
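Here's the same failing pipeline with pipefail turned on, again with false standing in for get_path_to_foobar. The pipeline now reports the status of the rightmost command that failed, which is grep's 1.

```bash
#!/usr/bin/env bash
set -o pipefail
# "false" again stands in for a get_path_to_foobar that fails and prints nothing.
false | grep X= | cut -c 3-
status=$?
echo "pipeline status: $status"   # the rightmost non-zero status: grep's 1
set +o pipefail                   # back to the default for the rest of this shell
```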

You still have to worry about the result of the pipeline even if it succeeds. What if the line it finds is just "X=", and cut hands you back the empty string? What if the line is just "X=" plus a bunch of spaces, and it hands you back those spaces? Either way, nothing in the pipeline failed, and the unquoted $X/ in the rm line collapses right back down to '/'.
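One way to cover those cases is to check the value itself before using it. This is a minimal sketch: is_usable_path is a hypothetical helper, not something from the original script, and /opt/foobar is just a made-up example path.

```bash
#!/usr/bin/env bash
# is_usable_path is a hypothetical helper: it succeeds only if its argument
# contains at least one character that isn't whitespace.
is_usable_path() {
  local stripped="${1//[[:space:]]/}"   # delete every space, tab, and newline
  [[ -n "$stripped" ]]
}

# The cases from above: empty, nothing but spaces, and a sane-looking path.
for candidate in "" "   " "/opt/foobar"; do
  if is_usable_path "$candidate"; then
    echo "ok:     [$candidate]"
  else
    echo "reject: [$candidate]"
  fi
done
```

In a real script you'd follow the assignment with something like: is_usable_path "$X" || { echo "bad path: [$X]" >&2; exit 1; }. That way a blank value stops everything instead of turning into rm -rf /.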

What if that line got mangled when someone hit ^J (Justify) in nano while editing the file it comes from, joining a bunch of lines together, and so now it looks like this?

X=/real/path/to/thing We support photos now

When that gets into your script, you now have this:

rm -rf /real/path/to/thing We support photos now/

Guess what: you just blew away any files or directories in the current working directory called "We", "support", or "photos", and any directory in there called "now".

What if the user had a real directory called "photos"? All of their cat pictures are now gone, gone, gone, and it's your fault.
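Printing the arguments instead of running rm makes the word splitting easy to see. As before, show_args is a hypothetical helper that just prints each argument in brackets, and X holds the mangled value from above.

```bash
#!/usr/bin/env bash
# show_args is a hypothetical helper: print each argument in [brackets].
show_args() { printf '[%s]\n' "$@"; }

# The mangled value that came out of grep | cut after the bad nano edit.
X='/real/path/to/thing We support photos now'

# Unquoted, word splitting turns one value into five separate paths for rm.
show_args rm -rf $X/
```

That prints seven arguments: rm, -rf, /real/path/to/thing, We, support, photos, and now/. Five targets where you meant one.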

Now you get to learn about quoting your variables ("$X") so the whole value reaches rm as exactly one positional argument and spaces don't trip it up. The fun never ends.
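Pulling the pieces together, here's a sketch of a more careful uninstall. It is not a drop-in replacement: get_path_to_foobar is a hypothetical stub pointing at a throwaway directory made with mktemp, the grep is anchored to ^X= as a small tightening over the original "grep X=", and the checks on X are examples rather than a complete list.

```bash
#!/usr/bin/env bash
# uninstall foobar 1.0 (a more careful sketch; get_path_to_foobar is a hypothetical stub)
set -e
set -o pipefail

# Throwaway playground so the sketch never touches anything real.
scratch=$(mktemp -d)
mkdir -p "$scratch/foobar 1.0/bin"            # note the space in the name
touch "$scratch/foobar 1.0/bin/foobar"

# Stub: emits several KEY=value lines; only the X= line matters.
get_path_to_foobar() {
  printf 'A=alpha\nX=%s\nB=beta\n' "$scratch/foobar 1.0"
}

# With set -e and pipefail, a failure anywhere in this pipeline stops the script.
X=$(get_path_to_foobar | grep '^X=' | cut -c 3-)

# A "successful" pipeline can still hand back junk, so check the value itself:
# reject empty or blank, relative, or bare "/" before going anywhere near rm.
if [[ -z "${X//[[:space:]]/}" || "$X" != /* || "$X" == "/" ]]; then
  echo "refusing to remove [$X]" >&2
  exit 1
fi
if [[ ! -d "$X" ]]; then
  echo "not a directory: [$X]" >&2
  exit 1
fi

# Quoted: the whole path is exactly one argument. "--" ends option parsing.
rm -rf -- "$X"

rmdir "$scratch"    # clean up the playground, which should now be empty
```

Quoting "$X" keeps the path as one argument even with a space in it, and "--" stops rm from treating a path that happens to begin with a dash as an option.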

Just imagine all of the stuff I didn't cover here. It just goes on like this. Considering how complex this stuff is, and how many different ways it can fail, it's a wonder anything gets done.