Software, technology, sysadmin war stories, and more.
Monday, February 25, 2013

Fixing a sick Indy in a fishbowl room

I've had to rescue some weird machines over the years. Not all of this happened in the scope of a sysadmin or tech support gig. Sometimes, I got stuck with something because I was the only "Unix person" around at the time, and possibly because they could guilt me into helping.

This was the case with the sick SGI Indy at my school. In this one "fishbowl" classroom (so named because it had a -large- window to the hallway for no apparent reason), they had a bunch of Windows boxes in rows, and a lonely little SGI box sitting on the back shelf. It was plugged in and could be turned on, but it was impossible to log in. Apparently, someone had lost the root password.

I didn't know anything about how you'd bring one of those to single user mode, no documentation was available, and I'm not sure how much help I would have found online. Still, I had a better idea: let me pull out the disk and feed it to one of my Linux boxes at work. I know they can mount the filesystem, even if just read-only, and then I'll read the passwd file and simply crack the root password. Then I'll put the drive back into the real machine, log in with that password, and set things straight.

They actually went along with this, and so, one night, I cracked open the box, extracted the drive, and slipped it into an anti-static bag. The next day I took it to work where there were plenty of Linux boxes with a variety of SCSI connectors. One was bound to fit. I got it connected and mounted it read-only, and that's when I found out what had actually happened. They hadn't lost their root password. Oh no, this was far worse than that.

Someone had scribbled all over /etc/passwd. You couldn't log in as root, or any other user for that matter, since there were no valid entries in the file! It looked like someone had written a shell script in an attempt to mass-add new users from a list, and had overwritten the passwd file with the input data in the process. Then, they must have logged out and/or exited the root shell before putting it back to a working state. Given that Linux did not have a way to write to that filesystem, I was unable to drop in a simple "root:(blah):0:0::/root:/bin/sh" to bring it back to life. I'd need another SGI machine to do that.
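For anyone wondering what "no valid entries" means in practice, here is a rough sketch of the check, assuming the usual seven colon-separated fields of a passwd line. The function names and the junk sample line are made up for illustration; the real login code on that machine surely had its own rules.

```python
def parse_passwd_line(line):
    """Return a dict for a well-formed passwd entry, or None if it does not parse."""
    fields = line.rstrip("\n").split(":")
    # Assumed layout: name:password:uid:gid:gecos:home:shell
    if len(fields) != 7:
        return None
    name, password, uid, gid, gecos, home, shell = fields
    if not name:
        return None
    # uid and gid must be plain decimal numbers
    for number in (uid, gid):
        if not (number.isascii() and number.isdigit()):
            return None
    return {
        "name": name,
        "password": password,
        "uid": int(uid),
        "gid": int(gid),
        "gecos": gecos,
        "home": home,
        "shell": shell,
    }


def has_usable_root(lines):
    """True if at least one line is a parseable entry for root with uid 0."""
    for line in lines:
        entry = parse_passwd_line(line)
        if entry is not None and entry["name"] == "root" and entry["uid"] == 0:
            return True
    return False
```

Feed it the placeholder entry from above and it parses fine. Feed it a list of names from some mass-add input file and nothing comes back, which is roughly the state that disk was in.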

Where was I going to find another "real Unix" box? Certainly not where I worked. All we had were a couple of Linux and BSD/OS machines and a gaggle of Windows boxes. Fortunately, I knew people who could help. My friend the webmaster had previously worked at a place which used SGIs. He still knew people there and was on good terms with them, so he got a friend of his to help us out.

One evening, we went over there with the drive and handed it over. There, in a back room, one of their guys cracked open their machine, mounted the disk, grimaced at the mess, and then cranked up vi and keyed in a quick placeholder entry. Then he just unmounted it, shut down the box, unplugged the disk, and handed it back. It didn't take long. We probably spent more time talking about random tech things than actually playing with the drive.

Back at school, I told them what I had found and what we managed to do to fix it, and they were happy, but apparently they had decided to move on. That machine would no longer be used, and even though it theoretically would boot and allow logins, it wasn't worth the trouble of trying to maintain any more. They were over it.

Had I known they intended to just throw the machine out, I would have done far more interesting things with that drive. For one thing, I intended to find out whether the filesystem had any sort of checksums or other integrity measures for the content. I was going to hex-edit the raw partition, find the passwd file, and slap in just enough bytes to make it parse properly. I might not be able to see it as a real filesystem, but those bytes had to be in there somewhere. Somehow, sector editing always shows up in my life when everything else fails.
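This never happened, so what follows is only a sketch of the idea, not something that was tested on that disk. It assumes you are working on an image file copied from the partition (not the live device), that the bytes you are looking for appear exactly once, and that the replacement is the same length so nothing else on the disk shifts. The function name is made up. Whether the filesystem would have accepted the result is exactly the open question.

```python
import mmap


def patch_in_place(image_path, old, new):
    """Overwrite the single occurrence of `old` with `new` inside a raw image.

    Returns the byte offset that was patched, or -1 if `old` was not found.
    """
    if len(old) != len(new):
        # A same-length swap keeps every other byte at its original offset.
        raise ValueError("replacement must be the same length as the original")
    if not old:
        raise ValueError("nothing to search for")
    with open(image_path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as image:
            offset = image.find(old)
            if offset == -1:
                return -1
            # Refuse to guess if the pattern shows up more than once.
            if image.find(old, offset + 1) != -1:
                raise ValueError("pattern is ambiguous; found more than once")
            image[offset:offset + len(new)] = new
            image.flush()
            return offset
```

The point of the same-length rule is that you are not touching any filesystem metadata, just the file's data bytes. If the filesystem keeps checksums over that data, this is where you would find out.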

I guess I'll never know whether that might have worked.