Writing

Software, technology, sysadmin war stories, and more.

Wednesday, October 17, 2012

Playing Windows tech to keep a customer happy

I worked at a web hosting company supporting Linux boxes, but I also really bought into the whole notion of "going that extra mile" for customers. This made me tend to stick my nose into situations where I thought I might be able to help. Sometimes, those things came to me, like in yesterday's story about subnetting gone wrong. Other times, I went for it myself.

One night not long after I started working there, I overheard some issue brewing on the Windows side of the house. Someone had a customer on the phone and was trying to tell them that their problem was unsupported because they were running some weird mail server. Somehow I heard another piece: it was lagging when people connected.

To me, a delay when you connect to a mail server is a DNS lookup gone bad. I usually encountered it with sendmail and its DNSBL checks. This is where it takes the connecting IP address, flips the octets around like an in-addr.arpa lookup, and then looks that name up under a bunch of blocklist domains to see if any of them have an entry for it. When those services went down, my mail exchangers would tend to delay incoming connections as they waited on blocking lookups to time out.
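For anyone who hasn't seen one of these lookups, here is a rough sketch in Python of how the query name gets built. The zone dnsbl.example.org and the function names are made up for illustration, and this only handles IPv4. It is not how sendmail or Merak actually implement it, just the general shape of the technique.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the name a DNSBL check would look up for an IPv4 address."""
    # Validate the input; raises ValueError if it is not an IPv4 address.
    addr = ipaddress.IPv4Address(ip)
    # Flip the octets, the same way an in-addr.arpa name is formed.
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    # Append the blocklist zone (hypothetical here) instead of in-addr.arpa.
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the address resolves under the given zone.

    gethostbyname() blocks until the resolver answers or gives up,
    which is where the connection delay comes from when a list is
    down. Note that a failed lookup and "not listed" look the same
    in this simple version.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

So a connection from 192.0.2.99 checked against that made-up zone turns into a lookup for 99.2.0.192.dnsbl.example.org. If the answer takes a long time to come back, the client is left staring at a silent socket.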

I found out which ticket was involved and looked up the server. I connected to it on port 25 and noticed it was doing exactly the same kind of delays that I would see on Unix boxes. Granted, it was running something called Merak (which was the whole reason the techs were adamant about not supporting it), but it sure felt like a DNSBL thing.
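I did this by hand with a plain connection to port 25, but the same check can be sketched in Python by timing how long the server takes to send its greeting. The function name and the 60 second default timeout are my own choices for illustration.

```python
import socket
import time


def time_to_banner(host: str, port: int = 25, timeout: float = 60.0):
    """Connect to a mail server and time how long the greeting takes.

    Returns (seconds_elapsed, banner_text). A healthy server usually
    answers almost immediately; a long pause before the greeting is the
    symptom described above.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Read whatever the server sends first (normally the 220 line).
        banner = sock.recv(1024)
    elapsed = time.monotonic() - start
    return elapsed, banner.decode("ascii", errors="replace").strip()
```

Calling something like time_to_banner("mail.example.com") once with the blocklist option on and once with it off would give you numbers to put in the ticket instead of "it feels slow". That hostname is a placeholder, not the customer's box.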

The customer would not let up. He wanted a solution. He wasn't about to take "no" for an answer. People were talking about getting some kind of "full bird" involved to fire this customer outright since he was just too demanding.

The ticket was just sitting there and things were relatively slow on our side, so I figured "what the hell" and assigned it to myself. With that done, I was free to log in and poke around, so I did. Granted, I had to get rdesktop installed so I could remote into a Windows machine first, but that was no big deal.

Once on the box, I just poked around until I found the Merak control panel and then noticed a big box for "RBLs" or "DNSBLs" or something of the sort. It was turned on and had a couple of lists enabled. I decided to switch it off to see what happened, and then tried connecting. The delay vanished. It was repeatable, too: the delay came back whenever that option was enabled. It didn't seem to be any particular list, though. Any sort of lookup seemed to cause it.

I updated the customer with this information. Apparently whatever version of Merak he was running really did not like whatever version of Windows Server he had on this new box. Most of the other things it did worked fine, but this particular feature was no good. I advised him that he could lose the delay by disabling it, and said I hoped this had "revived his faith in our support" (or something to that effect).

This actually worked. The customer was happy. There were internal mails saying "good save" to me, but I suspect I may have made some enemies on the Windows support side of things by doing this. I had basically strolled right into their world and started taking control of things without their "permission" (not that I ever needed it).

What I never figured out was why they didn't bother to just poke around and try a few things. Just because you don't support something doesn't mean you can't try to localize the problem. Even if you don't get a clean solution, you can at least show the customer that you tried to help them out, and give them some options based on what you found. Saying "it's not supported" right away is a great way to annoy them, particularly if your entire company's marketing spiel is based on selling amazing support.

What else could I do? Sit there and pretend I never heard it? I'd probably still regret that decision today.