Writing

Software, technology, sysadmin war stories, and more.

Friday, August 19, 2011

Bootstrapping DNS from public caches in a pinch

One night, while I was working as a tech support monkey, we got a call from the data center guys across the street. They said a power strip had blown up in a rack and taken out three servers. They had managed to save two of them, but the third was toast: apparently the blast had also taken out its drive, since it would no longer boot. Someone needed to call the customer and let them know it was time to start rebuilding from whatever backups they might have had. I was chosen for the job.

I rang up their primary contact's number and left a message with whoever answered. Then I started looking around to figure out just what that machine had been doing, and which parts of their setup would break with it gone. They had three or four separate machines spread around that data center, each with a different role. Some were web servers, and others seemed to be running their custom app. This one... oh boy. This one was their authoritative DNS server for a couple of domains.

A quick check with whois confirmed it. They had two domains using the now-dead machine as primary, and for some reason, they had ns2.(hostingcompany).com as the secondary. This was an odd configuration, since normally you'd either use two of your own machines or both ns and ns2, but never a mix. Then it occurred to me: they must have been set up back in the days when ns2 could be configured to act as a secondary that pulled zone transfers from a customer machine. Trouble is, those days were long gone, and ns2 had stopped answering for their zones a long time ago. Their own box had quietly been the only working nameserver for those domains, so when it died, the domains went offline with it.

So now I started feeling really bad for them. They probably didn't realize what was going on and thought that ns2 "had their back" all that time. It also meant there was no copy anywhere that we could pull their zone data from. Their zone files were gone, and their domains were dead as a result.

I still hadn't heard back from them, but I wanted to do something more than sit there working other tickets while waiting for a return call. After some pondering, it occurred to me: having ns2 listed as one of their nameservers at their registrar was going to be their salvation. I dropped into the customer DNS tool and created those two domains on their account. This would make the zones show up on both ns and ns2, and that would make them resolve... but resolve to what?

I had to find out what used to be in those zones: host names, IP addresses, MX records, and whatever else might have been in there. Going on the theory that anything sufficiently popular would have shown up on web pages, I started hitting search engines. Sure enough, I got a bunch of host names. Some were the obvious "www", but others pertained to their app and wouldn't have been easy to guess.
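The searching itself is manual, but once you have page text in hand, pulling out candidate host names is easy to sketch. This is a minimal Python sketch that assumes the pages have already been fetched as plain text; `hostnames_for` is a hypothetical helper, and the regex is deliberately loose, so expect to weed out some junk by hand.

```python
import re


def hostnames_for(domain: str, texts: list[str]) -> set[str]:
    """Collect distinct host names under `domain` from scraped page text.

    Hypothetical helper: matches one or more DNS labels followed by the
    domain itself, so the bare apex (e.g. "example.com") is not returned.
    """
    pattern = re.compile(
        # One or more labels like "www." or "app-1.", then the domain.
        r"\b((?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+" + re.escape(domain) + r")\b",
        re.IGNORECASE,
    )
    found: set[str] = set()
    for text in texts:
        for match in pattern.finditer(text):
            # DNS names are case-insensitive; normalize for de-duplication.
            found.add(match.group(1).lower())
    return found
```

Addresses like `ops@mail.example.com` get picked up too, which can hint at what the MX records used to point to.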

So now I knew what I had to publish in those zones, but how could I possibly find the values? Well, again, anything sufficiently popular is going to be well-used and therefore cached. I just had to find a few busy caching nameservers and query them before those cached answers expired. It's been a long time, but I think a combination of 4.2.2.1 and my own ISP's nameservers did the job. I was able to stuff in a few entries based on those results to get things going again.
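For anyone stuck in the same spot today, the cache-querying step can be sketched with nothing but Python's standard library. This is a minimal sketch under stated assumptions: it sends one A-record query over UDP port 53 to one resolver, the function names (`build_query`, `parse_a_records`, `ask_cache`) are hypothetical, and 4.2.2.1 is only the resolver from the story, not a promise that it still answers. Passing `recursion_desired=False` clears the RD bit, which asks the resolver to answer from what it already holds; many resolvers honor that, but the behavior is resolver-dependent.

```python
import random
import socket
import struct


def build_query(name: str, qtype: int = 1, recursion_desired: bool = True) -> bytes:
    """Build a DNS query packet (qtype 1 = A) for a single name."""
    txid = random.randint(0, 0xFFFF)
    flags = 0x0100 if recursion_desired else 0x0000  # 0x0100 is the RD bit
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:
            # Compression pointer: two bytes, and the name ends here.
            return offset + 2
        offset += 1 + length


def parse_a_records(msg: bytes) -> list[str]:
    """Extract IPv4 addresses from the answer section of a DNS reply."""
    _id, _flags, qdcount, ancount, _ns, _ar = struct.unpack("!HHHHHH", msg[:12])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(msg, offset) + 4  # skip QTYPE + QCLASS
    addrs: list[str] = []
    for _ in range(ancount):
        offset = skip_name(msg, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        offset += 10
        if rtype == 1 and rdlength == 4:  # A record
            addrs.append(socket.inet_ntoa(msg[offset:offset + 4]))
        offset += rdlength  # skip CNAMEs and anything else
    return addrs


def ask_cache(name: str, server: str, timeout: float = 3.0,
              recursion_desired: bool = True) -> list[str]:
    """Ask one caching resolver for the A records of `name`."""
    query = build_query(name, recursion_desired=recursion_desired)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        reply, _addr = sock.recvfrom(512)
    if reply[:2] != query[:2]:
        raise ValueError("reply transaction ID does not match query")
    return parse_a_records(reply)
```

Usage would look like `ask_cache("www.example.com", "4.2.2.1")`, repeated against several resolvers and for each host name you dug up. There are no retries, no TCP fallback, and no MX parsing here; in practice `dig @4.2.2.1 www.example.com` does the same job from a shell.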

Some hours later, we finally got a chance to talk. I expected them to flip, because their machine was dead and their DAT drive had stopped actually backing up a long time ago. Instead, they shrugged it off as "it happens", and they were relieved to hear about the DNS stopgap I had rigged. Now they could go about rebuilding the stuff on that box instead of having to stand up BIND and zone files right away to save the rest of their business.

Some techs would say "well, your drive is toast", and would leave you twisting in the wind. Fortunately, those types were not on duty that night so many years ago.