Writing

Software, technology, sysadmin war stories, and more.
Thursday, April 11, 2013

The other SMS and a variety of WinNT disasters

I would guess that most people who see the term "SMS" think of something involving text messaging and cell phones. However, those of us who had some view of the Windows world at some point probably know about the other one: Systems Management Server, a Microsoft add-on which apparently was supposed to allow remote management and upgrading of a fleet of Windows machines. What actually happened with it at one of my gigs was anything but ordinary.

This is a small collection of stories from when they tried to roll it out, plus some other insanity these guys foisted upon their users.

...

One day I showed up and there was a brand new 21 inch monitor on the desk of one of the "network engineers". This was back in the time of CRTs, so it was absolutely massive. As he explained it to me, "I need a bigger monitor because the remote screen's resolution has to match". Apparently scaling wasn't an option for them. I don't know if this was a limitation of SMS or just a limitation of the humans who were running it.

Something else they told me around then was how most of the systems were running at 800x600 because the school district management software called "Phoenix" would break at anything higher. Of course, at 640x480, it would draw stuff off-screen and you couldn't get to it, so there was that, too.

...

There was this guy who was hired on to get SMS working. It wasn't until about four years later that he got around to rolling it out, and when he did, all kinds of things started happening. Right off the bat, they noticed that something like 40 brand new accounts had been added to the domain with full administrator privileges.

The conversation with him and the other "network engineer" went something like this:

"Uh, why did it do that?" (creating those accounts)

"Duh, I dunno."

"And what are the passwords for those accounts?"

"Duh, I dunno."

"You let 40 new admin accounts on and you don't know the passwords?"

They later found out that it does this on purpose: it's actually part of the design. Also, it sets up those accounts with random passwords.

This came after four years of waiting. You'd think it wouldn't be a surprise after that much time had elapsed.

...

Then there was this one time when the first guy was off in traffic court, so the second guy (the SMS one) was there all alone. Well, the primary domain controller just happened to melt down. This took down basically everything, including most of the web pages, since they were mounted over the network from that machine.

All I could do was watch. I remember chatting with some friends and just sharing the latest updates: 15 minutes... 30 minutes... 45 minutes. What was I counting? Downtime. I think it was down for over two hours.

Much later, the best explanation I got from the first guy was that the system had "lost" TCP/IP, so everything based on it like DHCP and WINS had gone with it.

Personally, I think the second guy broke the box while left unattended.

This is someone who used to click on the (phony) [X] button on popup ads and wondered why his browser would get taken to some completely crazy web site instead. It wouldn't be that much of a stretch.

...

At some point, the second one got this notion that the NT servers really needed some disk defragmentation action. Given that I haven't worried about such things since my DOS days, I just scoffed at the entire notion and went about my own business on my Unix boxes.

Still, he pressed on with this project and wound up installing something called Diskeeper on all of the servers. It was supposed to start up, do its thing, and then reboot the box. So one night, it started up on the primary domain controller (yep, him and that box again), and the box freaked out. Something broke and it didn't come back up.

Just like in the "traffic court" story, this brought down all of the web pages which were hosted there. Of course, it just happened to be the school board meeting night, and they needed to present things which were on that web server...

Yeah. It was another one of those nights.

...

One summer, we were changing over from ISA to PCI network cards. So, one day, I just recompiled my kernel with 3Com PCI support, copied it into place, reran LILO to update things, and waited for the hardware to arrive.

When the box of NICs came in, I grabbed one for my machine and went back to my desk. There, I shut down my machine, popped the lid, pulled the old card, dropped in a new one, and buttoned it back up. Then I powered the machine on and got back to work.

Meanwhile, both of the NT guys were fighting with 3Com driver disks, rebooting multiple times, and generally running around putting out fires to fix everything that broke. I'm guessing they screwed up one or more of their interfaces and may have dropped things which relied on them in a repeat performance of the "traffic court" story.

...

At some point, the boss started using an NT box as a workstation, and it got SMS installed. It generally started hosing over the machine and made it slow to the point of being unusable. His solution was to start logging into his own personal domain since "SMS only comes up when I log into the main one", and "my domain is trusted by the main one".

Okay then. Have fun with that.

...

Not long after, SMS rolled out to the client machines used by actual students and teachers. Imagine a bunch of elementary school kids trying to get things done while this updater proceeds to wrap its tentacles around the machine.

One of the teachers was not happy about this and wrote an amazing mail about it. I was on a mailing list she added as a "cc", so I got a copy of it and kept it for posterity.

(Incidentally, this was a public school district, so this is all public stuff. All of our e-mail on those machines was subject to sunshine laws regarding government records.)

She sent this to the SMS guy:

I was concerned that you hadn't responded to my email from this morning, so I called your number and discovered that you're out of town for the week! I am quite frustrated that you did this while away and without answering any of our questions. I now have second graders panicing as it pops up for them. They are also reporting that their computers are frozen, but I've discovered that you can't do anything else while the newuid program is running. By the time I got this across to the class, they'd clicked so many things so many times that some of the computers are freezing up.

I have so many staff members who spend there day restarting and never getting logged in lately, I don't know how many would have even seen the message. Here in the lab by the time I saw the first message, my parent volunteer had already told at least 10 students to cancel it.

Pretty bad, right? Well, the boss responded, and in so doing managed to mention another situation I had forgotten before I started digging into my records for this post.

I apologize for the problems that have frustrated you this week. I try hard to always have one of the engineers on duty, but this week I had to send both to training to learn how to migrate our networks from NT 4.0 to NT 2000. This is crucial to the implementation of the Technology Bond.

Yes, both of these guys were sent to a week-long training class to learn how to go from one version of Windows to another.

...

So, let's talk about the culture of training they had at this place. Every time the smallest thing changed, they would use it as an excuse to go offsite to some place to get "educated" for a week. I never went to any of these things since I thought it was pointless and annoying, and besides, it was only for Windows stuff anyway.

One week, however, things were different. They had this Bay Networks guy in the office for a whole week training them how to use the firewall features of the Bay router software. They wanted to run actual firewall type filtering rules on our big router, and had this trainer brought in. The NT guys invited me to drop in one day to see what was up, so I did.

At some point, I must have asked him about how it would deal with the actual sorts of problems you'd encounter on the Internet, like script kiddies, and mentioned "winnuke". This so-called firewall guy just said "what is that?". Oh. Wow.

Obviously, I had to demonstrate this to him. I grabbed one of their (Windows) laptops and used it to bring in a Windows version of winnuke (yes, such things existed). Then I used it to lob an evil packet or two over at another one of their laptops, and it promptly bluescreened.

They were floored. They had no idea any of this existed. I was just leaning back, enjoying the fun. This was old hat by then, but it was completely off their collective radar.

I should mention that Microsoft had long since released a patch for this "TCP OOB pointer bug" that winnuke exploited. These guys hadn't updated their machines, and so they were vulnerable.

Some firewall specialist, huh?
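
...

A footnote on the "TCP OOB" part, for anyone who never ran into it: TCP lets a sender flag a byte as "urgent" (out-of-band) data, and the receiver is told about it separately from the normal stream. That is the feature whose handling the old bug tripped over. The snippet below is only a rough sketch of what urgent data looks like between two well-behaved sockets on loopback. It is not winnuke and does not reproduce the bug. The function name `oob_roundtrip` is made up for illustration, and I'm assuming a platform where `select()` reports urgent data as an exceptional condition, which is the usual behavior.

```python
import select
import socket


def oob_roundtrip(payload: bytes = b"!") -> bytes:
    """Send one byte of TCP urgent (out-of-band) data over loopback and read it back.

    Hypothetical helper for illustration only: both ends live in this process.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        client.connect(listener.getsockname())
        server, _ = listener.accept()
        try:
            # MSG_OOB marks this byte as urgent instead of ordinary stream data.
            client.send(payload[:1], socket.MSG_OOB)
            # Pending urgent data shows up in select()'s "exceptional" list.
            _, _, urgent = select.select([], [], [server], 5.0)
            if not urgent:
                raise TimeoutError("no urgent data arrived within 5 seconds")
            # The receiver has to ask for the urgent byte explicitly.
            return server.recv(1, socket.MSG_OOB)
        finally:
            server.close()
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    print(oob_roundtrip(b"!"))
```

A receiver that handles this correctly just gets one byte back. The whole point of the story is that an unpatched machine of that era did not handle it correctly.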