Software, technology, sysadmin war stories, and more.
Thursday, June 7, 2012

Router "experts" dazzled by MRTG and snmpget

My school district gig was based on top of a network which had a bunch of fractional T1s. The network had been designed around taking the individual channels and splitting them two ways. Up to half of them would be used for voice and headed to the phone switch at each location. The remaining channels arrived at the V.35 connector and thus were used for data with our routers.

This was fine for a couple of years, but eventually the bandwidth demands became just too great for fractional T1s. I personally think it was horrible things like SMS (the NT thing, not the text messaging thing) deciding to spew like crazy, but that's just me. New data-only T1s were ordered for the schools, along with new routers which could drive both at the same time. The telco had some consulting wing which won the bid, supplied the equipment, and had some "Cisco experts" show up to make it Just Work.

Things were interesting because one of the circuits was still connected via that V.35 connector at a fractional rate, and the other one was plugged straight into the T1 "smart jack" network interface. This yielded such fun things as seeing 1.544 Mbps for one and 1.536 Mbps for the other, depending on who was managing the actual framing. That didn't bother me, but it did confuse some of the people who were supposedly there to set it up for us.
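For anyone wondering where those two numbers come from, here is a quick sketch of the standard T1 arithmetic in Python. The constant names are made up for illustration, and nothing here says which box reported which figure; it only shows that the 8 kbps gap is the framing bit.

```python
# Standard T1 (DS1) frame layout: 24 channels of 8 bits plus 1 framing bit,
# sent 8000 times per second. Names here are illustrative only.
FRAMES_PER_SEC = 8_000
CHANNELS = 24
BITS_PER_CHANNEL = 8
FRAMING_BITS = 1

# What is left for data once framing is excluded: 24 x 64 kbps.
payload_bps = FRAMES_PER_SEC * CHANNELS * BITS_PER_CHANNEL

# What is actually on the wire, framing bit included.
line_bps = FRAMES_PER_SEC * (CHANNELS * BITS_PER_CHANNEL + FRAMING_BITS)

print(payload_bps, line_bps, line_bps - payload_bps)
```

Whichever side handles the framing sees the line rate; the side that only gets the payload sees the lower figure.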

Anyway, once they got these things plugged in, they started testing. We had one location with 10 channels on the old circuit and all 24 on the new one. On paper, you'd figure that might give you about 2.2 Mbps in an ideal situation: (10 + 24) * 64000 bps. They started getting antsy because the best they could get was about 1.3 Mbps, or slightly less than full (theoretical) utilization of the new circuit alone, never mind any help from the old one.
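Spelled out as a rough Python sketch (the function name is hypothetical, and the 1.3 Mbps figure is just the "best they could get" number from above), the on-paper math looks like this:

```python
DS0_BPS = 64_000  # one T1 channel


def bundle_bps(*channel_counts):
    """Ideal aggregate rate for circuits with the given channel counts."""
    return sum(channel_counts) * DS0_BPS


ideal_both = bundle_bps(10, 24)   # old fractional circuit + new full T1
ideal_new_only = bundle_bps(24)   # new circuit by itself
observed = 1_300_000              # roughly the best rate reported during testing

print(f"both circuits, ideal: {ideal_both} bps")
print(f"new circuit only:     {ideal_new_only} bps")
print(f"observed vs both:     {observed / ideal_both:.0%}")
print(f"observed vs new only: {observed / ideal_new_only:.0%}")
```

This ignores protocol overhead entirely, so it is a ceiling rather than something a file transfer would ever hit.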

I heard some hand-waving involving "TCP windows" and "reordering of packets" flying around. I was mostly watching from the sidelines since this wasn't my project. They ended up roping me in to generate traffic since I had Linux boxes on either side of the links in question and could start bulk transfers with no real trouble.

This went on for a while, with me wgetting this or that and relaying the speeds to them. They were never really satisfied, though. About the only way to get both pipes to be saturated was to flood them with pings. I don't think they really expected that.

I tired of this before long and just pointed them at my MRTG graphs to try to get loose from being their file transfer monkey. This charmed the telco router consultant types at first, but then they became disenchanted when they found out "it takes 5 minutes to update".

Finally, I wound up writing something that would let them stop bothering me. All it did was call snmpget over and over to get the interface byte counters while making a note of the time intervals involved. Then it did some math and tried to arrive at a rate for that period. It wasn't perfect, but it would give you plausible numbers which peaked and sagged based on ping flooding and then hitting ^C.
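The idea is simple enough to sketch in a few lines of Python. This is not the original tool, just an illustration of the same approach under some assumptions: Net-SNMP's snmpget is on the PATH, SNMP v2c is in use, the counter is a 32-bit octet counter, and the default OID `IF-MIB::ifInOctets.1` (interface index 1, MIBs loaded) is a placeholder. The function names are made up.

```python
import subprocess
import time

COUNTER32_MOD = 2 ** 32  # assumed: a 32-bit octet counter that wraps to zero


def rate_bps(prev_octets, cur_octets, seconds):
    """Bits per second between two counter samples, tolerating one wrap."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    delta = (cur_octets - prev_octets) % COUNTER32_MOD
    return delta * 8 / seconds


def read_octets(host, community, oid):
    """Run snmpget once; -Oqv asks it to print only the value."""
    out = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid],
        check=True, capture_output=True, text=True,
    ).stdout
    return int(out.strip())


def watch(host, community, oid="IF-MIB::ifInOctets.1", interval=2.0):
    """Print an estimated rate every `interval` seconds until ^C."""
    prev, prev_t = read_octets(host, community, oid), time.monotonic()
    while True:
        time.sleep(interval)
        cur, cur_t = read_octets(host, community, oid), time.monotonic()
        print(f"{rate_bps(prev, cur, cur_t - prev_t) / 1e6:.3f} Mbps")
        prev, prev_t = cur, cur_t
```

Calling something like `watch("router.example", "public")` would poll until interrupted. Timing each sample on the polling host, rather than assuming the sleep was exact, is what keeps the numbers plausible when snmpget itself takes a while to return.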

I had to give them an account on one of my Linux boxes so I could then ytalk them locally (yes, this was a while back) and then drop to a shell from inside my session to run this program. That would let them see the output from it in their chat window without letting them into my account with screen or similar.

What's really crazy is that all of this seemed like magic to them. Sure, ytalk and its multi-way chat stuff plus being able to shell out was kind of wacky the first time I saw it, but this was years after that. These router wranglers had never seen anything like it, and then when I started that tool, they were just blown away.

They actually asked me for a copy of it, and that's when it got a little messy. I had written something supremely dirty which was just a wrapper around the actual snmpget binary, and it had things like our SNMP read-only community string hard-coded in it. That was appropriate because we needed it *right then* and I could justify a bunch of shortcuts. That also meant I couldn't let other people see it.

Later, I went back into that code and gold-plated some parts and set a whole bunch of things straight. Now it took arguments, including the SNMP community string, and it linked against Net-SNMP (or was it still CMU-SNMP then, hmm...) instead of wrapping snmpget and parsing its output. I gave them a copy of it, but I never heard from them again. I suspect they didn't know what to do when handed the C source code to a small utility.

It's kind of funny in a way. They were brought in to solve a potentially well-defined problem: make more bandwidth available to the schools. When it came time to deliver, not only did they not have the baseline ("before") performance as a known quantity, they had no way to get one! How, exactly, can you be sure that you've actually solved the problem if you can't see where you are, and don't even know where you were before?

Okay, no, I take that back. It's not funny. It's quite sad, actually.