Writing

Software, technology, sysadmin war stories, and more.

Wednesday, January 11, 2012

Use PXE boot to cut down on needless trips to your co-lo

Here's a situation which seems all too common. There's a lone sysadmin managing a whole bunch of machines. One day, a decree comes down from on high: some of the machines are to be reinstalled with a new operating system. The company is young and thin on cash, so they don't have a KVM to give remote console access.

(Aside: "real" servers would have serial consoles or other out-of-band management, I know. Go with me here. These commodity screwdriver shop boxes don't work that way.)

What happens next is a series of l-o-n-g drives from one side of the bay to the other and back, often more than once a day. This is madness! There is a better way to handle this. Odds are, the existing hardware will do it for you. You just need to be clever about a few things on your network.

Unless you have really crusty old machines, chances are good that you'll have the opportunity to add "network" to the boot sequence in the BIOS. If you're really lucky, it'll already be there, and it'll be sitting before your CD/DVD and local hard disk options.

Once that's set, your machine will jump into the network card's boot ROM at boot time and try to boot from the network. It'll send DHCP requests, and if the reply names a boot server and a file, it'll pull that file via TFTP and boot it.
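If you want to check the TFTP half of that exchange without rebooting anything, a tiny probe will do. This is a minimal sketch, assuming a standard TFTP server on UDP port 69; the function names are mine, and the server address and filename in the usage comment are placeholders. It sends the same kind of read request the boot ROM would send and checks whether the first data block comes back.

```python
import socket
import struct

TFTP_PORT = 69            # well-known TFTP port
OP_RRQ, OP_DATA = 1, 3    # TFTP opcodes: read request, data


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode 1, filename, NUL, mode, NUL."""
    return (
        struct.pack("!H", OP_RRQ)
        + filename.encode("ascii") + b"\0"
        + mode.encode("ascii") + b"\0"
    )


def probe(server: str, filename: str, timeout: float = 3.0) -> bool:
    """Return True if the server answers with the first DATA block.

    This never acknowledges the block, so the server will retry a few
    times and give up. That is fine for a quick "is it serving?" check.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, TFTP_PORT))
        try:
            reply, _ = sock.recvfrom(1024)
        except socket.timeout:
            return False
    if len(reply) < 4:
        return False
    opcode, block = struct.unpack("!HH", reply[:4])
    return opcode == OP_DATA and block == 1


# Hypothetical usage (address and filename are placeholders):
#   probe("192.0.2.10", "pxelinux.0")
```

If the probe fails from a machine on the same network, the boot ROM will not have any better luck, so it is worth sorting that out before you touch the BIOS.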

Normally, you'll just point it at a config which tells it to boot from the local drive, and it'll go on with life. Where it gets interesting is when you need to reinstall it. Break that host out of the default config block in your DHCP server and point it at a target which will then load your favorite OS's installer or just a rescue image with sshd. Either way, once it comes up on the network, jump in and get cracking.

This doesn't require much in the way of coordination to get started. If you already run dhcpd on that network, adding a group block with next-server and filename directives, plus a host block inside it for each machine, will take care of the DHCP side. You'll need to stand up tftpd somewhere, but if you have a bunch of Unix boxes around, that's no big deal.
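To make the dhcpd side concrete, here is a rough sketch that prints such a group block from a list of machines. It assumes ISC dhcpd syntax; the host names, MAC addresses, IP addresses and the pxelinux.0 filename are all made up for illustration, and render_group is just a helper name I picked.

```python
def render_group(next_server, filename, hosts):
    """Render an ISC dhcpd 'group' block that sends hosts to a PXE target.

    hosts is a list of (name, mac, ip) tuples. All values used below are
    illustrative placeholders, not anything from a real network.
    """
    lines = [
        "group {",
        f"    next-server {next_server};",   # TFTP server to fetch from
        f'    filename "{filename}";',       # file the boot ROM should load
    ]
    for name, mac, ip in hosts:
        lines += [
            f"    host {name} {{",
            f"        hardware ethernet {mac};",
            f"        fixed-address {ip};",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical machines slated for a reinstall.
    reinstall = [
        ("web1", "00:16:3e:00:00:01", "192.0.2.21"),
        ("web2", "00:16:3e:00:00:02", "192.0.2.22"),
    ]
    print(render_group("192.0.2.10", "pxelinux.0", reinstall), end="")
```

Paste the output into dhcpd.conf (or include it from there) and restart dhcpd. Moving a machine back to normal life is then a matter of taking its host block out of this group again.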

Probably the hardest part is figuring out the contents of the TFTP root. This will likely involve something like pxelinux and one or more files in pxelinux.cfg to tell it what to do, along with kernel images and possibly some initrds. It can take some fiddling to get this just right.
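A lot of that fiddling is knowing which file under pxelinux.cfg a given machine will actually read. As a sketch, assuming pxelinux's usual lookup (a file named after the MAC address, then the IP address in uppercase hex with one digit dropped at a time, then "default"), this lists the candidates in order. It skips the client UUID lookup that pxelinux tries first when the ROM provides one. The two config bodies are bare-bones examples, and the kernel and initrd names are placeholders.

```python
import ipaddress


def pxelinux_candidates(mac, ip):
    """List the pxelinux.cfg/ file names tried for a client, in order.

    The client UUID lookup, which comes first when available, is left out.
    """
    # "01" is the ARP hardware type for Ethernet; the MAC is lowercase, dashed.
    names = ["01-" + mac.lower().replace(":", "-")]
    # The IPv4 address as 8 uppercase hex digits, then progressively shorter.
    hex_ip = f"{int(ipaddress.IPv4Address(ip)):08X}"
    names += [hex_ip[:n] for n in range(8, 0, -1)]
    names.append("default")
    return names


# Everyday config: hand control straight back to the local disk.
LOCAL_BOOT_CFG = """\
DEFAULT local
LABEL local
  LOCALBOOT 0
"""


def install_cfg(kernel, initrd):
    """Config body that boots an installer or rescue kernel with its initrd.

    The paths are relative to the TFTP root and are hypothetical here.
    """
    return (
        "DEFAULT install\n"
        "LABEL install\n"
        f"  KERNEL {kernel}\n"
        f"  APPEND initrd={initrd}\n"
    )


if __name__ == "__main__":
    for name in pxelinux_candidates("00:16:3E:00:00:01", "192.0.2.21"):
        print("pxelinux.cfg/" + name)
```

One way to use this: keep the local boot body in pxelinux.cfg/default, and drop an install body into the MAC-named file for whichever box needs a reinstall. Delete that file afterwards and the machine goes back to booting from its own disk.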

I used this technique not too long ago to try some things on the shell of my old workstation. That system doesn't have a hard drive or even a CD/DVD any more, and I don't have any to install temporarily. I needed to boot Linux on it and run some tests to see if the CPU was fast enough to handle some big task without falling behind.

I wound up standing up this DHCP/PXE/TFTP situation with the initrd and kernel from a Slackware install image. This let me bring up the machine into a standard install environment as if I had booted off the CD. Later, I just mounted the disc image over loopback on another machine, exported that tree with Samba (!) and then mounted that on my test box.

From there, it was just a matter of standing up a ramfs "filesystem" in memory and installing enough packages to have a reasonable runtime environment: bits of glibc, a shell, and so on. Then I just chrooted into it, started up an sshd and added a user. The rest could happen over the network from a proper workstation.

What I did was a one-off situation, but anyone who regularly reinstalls machines should figure out how to do this automatically. If you do it properly, the hardest part will be figuring out how to kick your machines so they reboot and the PXE ROM stuff has a chance to run.

The alternative is many, many trips over the Dumbarton Bridge. Sure, it's pretty, but I'd rather make those trips on my own terms. Being subservient to your computer systems isn't worth it.