Writing

Software, technology, sysadmin war stories, and more.
Wednesday, April 17, 2013

The long slow road to a terminal server

Have you ever waited on a government agency or other similar construct (like city councils, school districts, etc.) to make something happen? Have you ever noticed how a seemingly simple matter which probably could happen in "a couple of weeks" can turn into a year or more? I can't claim to understand all of it, but I can definitely tell a story about one project in particular.

I made a reference to the time I had to install a terminal server (dialup modem pool) at my school district sysadmin gig in my "crack pipe" followup post on Sunday. This is the rest of that story.

This story starts a number of years before a Lucent tech said I was "smoking crack". It started with a standalone box running BSD/OS which had a Digiboard attached to give it 8 external serial ports. Each of those ports had a 28.8K Sportster attached -- these were the old white ones which looked like glorified office intercoms.

We had the "phone tech" guy add a bunch of analog ports to the PBX and run them over the wall to where that particular box lived. Each port got its own extension, and the whole group had a "lead-in" number which would roll over to the first available extension. This all worked fairly well, but it was a serious mess. Just look at this situation:

[Photo: Unix box and tons of ports]

There's my original Unix box at that job: a "mighty" Pentium 90. Note the explosion of analog ports on that wall along with a bunch of Ethernet drops. Then there's all of that "flat satin" telco cord mess along the right side. Finally, there's a lonely "ZOOM" modem on its side next to the box. That was our Internet connection at the time: 28.8K PPP to the outside world. Really.

The Digiboard and modems are just out of frame in this picture. A couple of years later, I went in there and cleaned up that mess in order to fit a few more boxes in that same spot. Now it looked like this:

[Photo: Multiple boxes, same modems]

I don't have a better shot of this, unfortunately, so I will just describe the situation. There are now three Unix boxes in this picture: the original, its replacement, and another one we added for web serving and proxying duties.

A sharp-eyed viewer might notice the modems have some kind of black marks on them now. This is because I broke out a permanent marker and labeled every single modem, power brick, and RS-232 cable with the tty name. This way, you would know exactly which line you were messing with if you had to disconnect something for some reason. I may have even labeled those flat satin phone cords, for that matter. Those cords were even bundled up with rubber bands to keep them from going crazy.

So, why is the old box there if it was replaced? That's easy. BSD/OS didn't get along with the Digiboard on the new hardware. Any time you started generating a lot of serial traffic, it would freeze the entire machine. I couldn't figure it out, and their tech support was of no use either, so I wound up moving the Digiboard back to the old box, and just left it up for the duration.

This meant I had one box the users would actually dial into, and another one which had their shell accounts and e-mail and all of this. I had to train them to run a special command to telnet (yes, really) into the new machine when they logged in in terminal mode. My users who did PPP or whatever weren't affected by this, fortunately.

The dialups were becoming more and more popular. We didn't want to deal with punching even more holes in the wall and having yet another stack of modems. While both the Unix box and the Digiboard would have supported another 8 easily, the resulting pile of parts would have been a disaster. There was already a decent amount of heat coming off all of those things, and cramming more in there wasn't going to happen.

This ultimately got me looking at terminal servers. In theory, we'd just buy a nice box, put it in the data room where it belonged (instead of out in someone's office), and give it some T1s. Then it would handle users for me and I could finally retire my aging original dialup box. The plan was simple enough: get a card for our PBX which would give us a PRI circuit, then come out of that and drop into a Bay Networks terminal server.

I went out and talked to vendors and had them do their dog and pony shows for me. This got really messy at times since we had strange working relationships with certain companies. USWest was the local telco, and they used to sell us both Lucent and Bay equipment. One day I wound up on a conference call with my boss and two Lucent people, and this one guy would not shut up about how amazing their Portmaster terminal server was. They had just purchased Livingston and now it was the new hotness. They didn't want to talk about Bay terminal servers any more.

I finally had to make him stop talking and just told him "From Lucent, I want the stuff to get PRIs out of our switch. From Bay, I want the terminal server. That's it."

I had made the decision about the terminal server based on its specifications and our needs. For one thing, we had an installed base of users who had modems which spoke the USR-only X2 56K standard. We therefore wanted something which would also speak that on our side. That either meant a 3com device (since they had eaten USR some years before), or the Bay device which also supported it. The Portmaster was right out.

After all of this, I finally had a proposal put together, and then we found out the FCC killed the funding for it. This was one of those "E-Rate" things where they would pay some percentage of the overall price based on how many of our kids were on the free or reduced price lunch program. Apparently they had said terminal servers were "no longer being funded", and so that killed the project.

My box in the corner with the modems shuffled on. My users continued connecting at 28.8K or thereabouts. They were disappointed.

Months passed. I found out there was some kind of "appeals process", and was told to go through all of this again. In the time since I had last worked on this project, new products had come along. The field had changed yet again and so I had to reconsider everything. More vendors came and went, and more of them tried to sell me Portmasters, not realizing that I needed X2 and V.90 support.

Finally, a couple of them clued in and started talking about the 3com Total Control stuff. They had just done something in that product line which made it far simpler than before. Now, a single card would do both the T1 interfacing and the modem stuff. Previously, you had to buy line cards, and modem cards, and all of this.

The new situation was simple enough: one chassis, two line cards, a management card, and the actual terminal server card. We went out for bids, got the responses, chose one (probably the lowest bidder), and placed our order. Yes, we had finally gotten to the point of exchanging money for a product.

A week or two later, this gigantic box arrived. I opened it up at my desk and was rewarded with an insanely heavy device intended to be mounted in a standard 19" rack. I, of course, plugged it in at my desk to make sure it was sane before attempting to rack the monster. It seemed healthy enough, so into the rack it went, both to get it out of the way and to preserve my hearing - those fans are loud!

A couple of weeks went by. We were waiting on parts for the switch. This was the other side of this project. I had the "terminal server" side, and the local phone switch monkey type had all of the PBX side. His job was to deliver me 46 extensions in PRI form into the jacks on the back of my device. My job was to take them and provide awesome dialup service with them. It was a simple division of duties.

The day finally arrived for us to start connecting things. It was a Wednesday. Along with a tech from 3com, we found that one of our cards had failed. It was either the network management card or the terminal server card -- not one of the line cards. Those were just fine, and I could log into them through the serial port on the back. The telco monkey overheard this and took it upon himself to send the Lucent techs home because "we have a bad card". The bad card wouldn't have affected the T1 bringup, but he used any opportunity to screw with me.

3com shipped us a new card overnight, and the next day (Thursday) their tech returned to do the install. The Lucent guys showed up and we tried to get it going, but it wouldn't happen. That's when the resident telco monkey told me "our switch might not be able to do PRI". Well, gee, you fool, what have we been talking about for the past year? Doing PRI from your switch into my terminal server to do dialups. It seems like a halfway competent person would have checked for PRI support before accepting that side of the project, no?

I only thought that. I didn't say it. Sometimes I wish I had, though.

Anyway, he said a Lucent guy would come out the next day to get things going. I figured that would be the day we turn this thing up. Ha. Yeah, sure, right.

Friday morning, I showed up, and got word from one of the "network engineers" that the telco monkey guy had left a voicemail on his extension. He said he wasn't going to be in today. He also added that he didn't leave a message on my voicemail because "he didn't know the extension". Yes, really. The telco guy didn't know the extension of someone in the same department and apparently couldn't figure out how to use the directory feature, either. The one he set up.

At any rate, the Lucent guy was still supposed to show up that morning. 9 AM came and went, and no Lucent guy was seen. Around 9:30, I walked around to the far ends of the building to ask people if they had seen a lost-looking tech in the area. They said no.

About then, I found out that the telco monkey guy was taking the ENTIRE next week off. The "network engineer" guy who had received the voicemail actually had scheduled a meeting with him and a 3com guy for some other reason in the afternoon, and now that had to be rescheduled.

Of course, the boss was on vacation during all of this, so there was no way to tell him what was going on. Both the boss and the telco guy were slated to return on the same day: Monday, two weeks out.

At this point, the 3com installer guy had to leave. He obviously couldn't come out again, and couldn't wait around due to our stupidity. I signed off on his little paper which said he did what he was there to do, and sent him on his way with my thanks. It was up to me to handle the rest.

Several more weeks passed. Then, one day, I got to spend the entire day on the phone with 3com tech support trying to get things working with whatever the local telco people were setting up on their switch. I'd get it to start syncing, and then it would change again. I had told them what I wanted, and they never delivered it. This is the "smoking crack" part of the story I told back on Sunday, for what it's worth.

Eight days later, somehow, they finally managed to get it going on their end. They punted back to channelized T1 instead of the PRI I wanted, but I'd take that over nothing at all. They also only gave me one circuit, but again, that was better than nothing. I got all of my stuff configured and started making test calls across the PBX.

I added a test account by hand and started poking at it. I was getting "50666" connections dialing the four digits into my brand new terminal server. Typical latency across the link in PPP mode was 90-100 ms as measured with ping. Life was pretty good. I set to work on RADIUS and things of that nature and ultimately got both PPP and ordinary interactive "shell" logins working too for my normal set of users straight from my usual passwd/shadow files.

Finally, I sent out a mail to my ever-so-patient users, letting them know the new system was in and that I would accept some people who were willing to try "beta testing" my new setup. They'd have to change some settings temporarily and might have to deal with oddities while I figured everything out, but otherwise it was up to them.

Happily, a bunch of them signed up and I switched them to the new temporary lead-in number for the 24 lines which worked at that point. They loved it. It was fast, and solid, and never had busy signals.

A couple of days into this, they got the second T1 running and delivered it to my stuff, so I set that up too and now had 48 lines available. The users were in heaven.
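If you're wondering why the plan called for 46 lines and I wound up with 48, it comes down to how a T1 gets carved up. Here's a rough sketch of the arithmetic, assuming the usual North American layout: 24 timeslots per T1, with a PRI giving one of them up as a signaling (D) channel, while a channelized T1 with in-band signaling uses all 24 for calls. The function name and constants are just mine for illustration.

```python
# Rough line-count arithmetic for the two ways of delivering a T1.
# Assumption: standard North American T1 with 24 timeslots; a PRI
# reserves one timeslot for signaling (23B+D), while a channelized
# T1 with in-band signaling carries a call on every timeslot.
T1_TIMESLOTS = 24
PRI_SIGNALING_SLOTS = 1


def usable_lines(circuits: int, pri: bool) -> int:
    """Return how many simultaneous calls the given circuits can carry."""
    per_circuit = T1_TIMESLOTS - (PRI_SIGNALING_SLOTS if pri else 0)
    return circuits * per_circuit


planned = usable_lines(2, pri=True)        # the two PRIs I asked for
first_week = usable_lines(1, pri=False)    # the one channelized T1 I got
final = usable_lines(2, pri=False)         # after the second T1 came up

print(planned, first_week, final)
```

So dropping back to channelized T1 actually bought me two extra lines, even if it wasn't what I had asked for.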

It ran in testing mode for a couple of days, and then I had the telco guy switch the lead-in number around. The old number would now go straight to the terminal server, and the old analog extensions were disconnected. I finally got to undo all of that junk and removed it from the corner. The old box was shut down at last.

It took nearly a year to go from "we need this" to "it's done". This is the final result, mounted in the rack and running after far too long:

[Photo: Total Control rack]

...

I have one more story about this telco guy who worked there. It's too short to warrant a separate post so I'm including it here.

There's something called "dialaround" which lets you pick your own long distance carrier regardless of who's been set up as the "1+" default provider (also known as the PIC). Back in the '90s, one way to do this was with five digit codes which started with "10".

"10288" would give you AT&T because "2 8 8" spells ATT on a phone keypad. "10222" was for MCI -- no idea why. "10333" was Sprint -- same thing. There were all sorts of codes. Due to the way these companies tend to split up and merge and split again, a bunch of them had multiple codes. Some of them worked, others didn't, and some were entirely variable.

Obviously, with just three digits (and some restrictions on which three), there wasn't an unlimited amount of space for new participants. So, at some point, the powers that be decided to expand the space. All of the existing codes would now have another "10" inserted. This meant "10288" would now be "10-10-288". "10222" was now "10-10-222". You get the idea.
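The rewrite rule itself is about as simple as it gets. This is only a sketch of the mapping as I described it above, not anything pulled from a real switch; the function names are made up for the example.

```python
# Sketch of the dial-around code expansion: an old five-digit code
# "10XXX" becomes the seven-digit "1010XXX". Names are hypothetical.
def expand_dialaround(old_code: str) -> str:
    """Turn an old five-digit dial-around code into the seven-digit form."""
    digits = old_code.replace("-", "")
    if len(digits) != 5 or not digits.isdigit() or not digits.startswith("10"):
        raise ValueError(f"not a five-digit 10XXX code: {old_code!r}")
    # Keep the three carrier digits and put "10-10" in front of them.
    return "1010" + digits[2:]


def pretty(code: str) -> str:
    """Format a seven-digit code the way the ads did: 10-10-XXX."""
    return f"{code[0:2]}-{code[2:4]}-{code[4:]}"


for old in ("10288", "10222", "10333"):
    print(old, "->", pretty(expand_dialaround(old)))
```

Anything in a dial plan that still had the five-digit form hardcoded would need exactly this kind of edit before the old codes went away.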

This was announced well in advance of any changes. Then there was a "phase-in" period when both the five and the seven digit forms would work. Then, at some later point, the old codes were shut off, and that was that.

I had heard about this. Apparently our phone guy had not. Why do I know this? Easy.

The first day after the "phase-in" period ended, nobody could make long distance calls at the school district. We had this arrangement where dialing long distance would give you a tone, and you had to enter a special code for billing and tracking purposes. Everyone had their own code. Well, on this day, we weren't getting that tone. We were just getting an error.

As near as I can tell, he totally missed what must have been an avalanche of notices and just kept on going the way things always had been. Then, the day it actually broke, he had a huge fire to put out. There were important faxes which needed to be sent to the state, and all of those government numbers were in another area code! Then there was ordinary vendor business with companies who were out of state and didn't have toll-free lines. You get the idea.

I made an offhand remark about "probably needing to just add '10' somewhere", but he didn't comment on it. I don't know if that's truly what happened, but the timing was just too perfect for it to be something else.

Unlike programming and sysadmin environments, there was no way for the rest of us to log into the switch and see what kind of work he was doing. We just had to take it on faith that he was doing the right things at the right time. You can see how well that worked.