Writing

Software, technology, sysadmin war stories, and more.

Tuesday, October 23, 2012

A night with two seemingly impossible problems

I liked it when people used to come to me with their crazy problems. Sometimes they'd be working on a customer's machine and would be trying to do something, but it just wouldn't work. There'd be some subtle change in how things had been configured, and it would take out the rest of their usual process.

One day, I had a pair of these as "bookends" - one to start my day, and one to end it. As far as I was concerned, that made it an excellent day at work.

The first problem started as I got off the elevator. I saw four guys huddled around a laptop on the 1st shift side of the floor. It was a Saturday afternoon, so things were quiet enough that the other three could abandon their posts to throw in their two cents. I wanted to know what was going on and walked over to have a look.

They had some machine where they couldn't FTP out in any productive way. Oh, you could get logged in, but things like "ls" or "get" froze. To me, this screamed "active FTP" which then implied "separate data socket" and suggested "overly sensitive firewall without FTP connection tracking". I looked at iptables on the box, and sure enough, the customer was running a software firewall. Loading the ip_conntrack_ftp module fixed it. Easy enough.
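Here's why active FTP upsets a firewall like that. In active mode the client sends a PORT command naming an address and port, and the server then opens a brand new connection back to that port for the data. A firewall that only knows about the control connection sees an unsolicited inbound connection and drops it, unless a helper such as ip_conntrack_ftp has read the PORT command and is expecting it. Below is a minimal Python sketch of that encoding as RFC 959 describes it. The function names and the example address are made up for illustration, and this is not how the kernel module itself is written.

```python
from typing import Tuple


def port_argument(ip: str, port: int) -> str:
    """Build the argument of an active-mode FTP PORT command (RFC 959)."""
    octets = ip.split(".")
    if len(octets) != 4 or not 0 < port < 65536:
        raise ValueError("need a dotted-quad IPv4 address and a valid TCP port")
    # The port travels as two decimal bytes: high byte first, then low byte.
    return ",".join(octets + [str(port >> 8), str(port & 0xFF)])


def parse_port_argument(arg: str) -> Tuple[str, int]:
    """Recover (ip, port) from a PORT argument, as a tracking helper must."""
    parts = arg.split(",")
    if len(parts) != 6:
        raise ValueError("expected six comma-separated numbers")
    ip = ".".join(parts[:4])
    port = int(parts[4]) * 256 + int(parts[5])
    return ip, port


if __name__ == "__main__":
    # Hypothetical client address and data port, chosen only for the demo.
    arg = port_argument("192.0.2.10", 50000)
    print("PORT " + arg)
    print(parse_port_argument(arg))
```

The point of the round trip is that the data port only ever appears inside the control connection's payload, so a firewall has to parse that payload to know which inbound connection to allow.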

Many hours later, just before it was time to leave, two different guys were looking at this weird situation for a customer who ran Solaris. Most of the customers there ran Linux (or Windows, but that wasn't our problem), so anything involving FreeBSD or Solaris was pretty rare. They had a pair of machines which were using NFS in various ways, and couldn't figure out why some files weren't visible.

Box #1 had /var exported via NFS. Box #2 mounted that as its /var. They were trying to create files in /var/www and then see them on the other system, but it wasn't happening. Oh, they could create a file, but it would only be visible on that same system. This was rather troubling, as you might imagine.

This one was tougher since I had to just ask them to run commands and then had to make sense of the output, but it eventually fell into place. There was another filesystem involved!

Box #1 also had a filesystem mounted at /var/www. So, if you were logged in there, a write to /var/www/foo would create it in that filesystem. Box #2 couldn't see it, since as far as it was concerned, that was just another directory within the /var export.

I guess they didn't realize that filesystems need mount points, that those mount points are just directories, and that those directories continue to exist "underneath" the mounted filesystem, even if they have data in them. NFS was exporting things on a per-filesystem basis, so it had no idea the second filesystem existed, either.
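One way to spot this kind of hidden boundary is to compare device IDs while walking up a path: when a directory's st_dev differs from its parent's, something is mounted there. The Python sketch below does that. The function name is hypothetical, and it's only an approximation: it won't notice a bind mount of the same filesystem, and it says nothing about what an NFS server actually chooses to export.

```python
import os


def filesystem_boundaries(path):
    """List the ancestors of `path` (including itself) that look like mount points.

    A directory counts as a boundary when its device ID differs from its
    parent's. Results are ordered from the outermost directory inward.
    """
    current = os.path.realpath(path)
    boundaries = []
    while True:
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the root, which has no parent to compare against.
            break
        if os.stat(current).st_dev != os.stat(parent).st_dev:
            boundaries.append(current)
        current = parent
    boundaries.reverse()
    return boundaries


if __name__ == "__main__":
    # Prints whatever mount points sit above the current directory.
    print(filesystem_boundaries(os.getcwd()))
```

On a machine laid out like box #1, a call such as filesystem_boundaries("/var/www/foo") would presumably include /var/www in its result, which is the clue that a write there lands somewhere other than the /var filesystem.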

I don't remember what they wound up doing about it. They just wanted an explanation for what could be going on, and this finding was enough. I suppose they may have also exported /var/www from box #1 in order to mount it on box #2 to make them look identical.

Real people, real problems I could fix, and fast results. That's what I like.