Software, technology, sysadmin war stories, and more.
Friday, October 12, 2012

The dangers of having critical cron jobs on workstations

One of my friends is currently doing a split migration for two related companies which are moving apart. He's the one who's having to figure out the RAID stuff that I mentioned in a post last week.

In my capacity as a consultant and advisor, I get to tell people about ways they can do things, and the implications each one has. There are good parts and bad parts to any technology, and more than a few interesting "gotchas" hiding in the shadows. On one day, we got to talking about providing file sharing services to his users.

He's running a mix of Windows machines and Macs, so he's going to use Samba and Netatalk. This is not too surprising. Alternatives came up, and in that conversation, we started talking about NFS. That's where things got interesting. We eventually reached the topic of what you do when you have rather big environments, multiple locations, multiple file servers, and so on. That's when Kerberos came up.

I mentioned how a common setup might be to have NFS running on big appliances (by NetApp, or whoever), and then use Kerberos for the actual authentication. With "Kerberized" NFS, you need a valid authentication token to actually access protected areas. This could include home directories. Those tokens can (and probably should) expire after some relatively short amount of time.

This brought up a new and interesting way for things to fail. Let's say you set your tokens to a 48-hour expiration. Assuming they auto-renew whenever you log in to your workstation or just unlock it, you should never really notice any problems. It even means you can do naughty things like running important cron jobs as yourself (!) from your personal workstation.

This scheme will actually work... until the first three-day weekend. Then, partway through the weekend, the 48 hours will elapse and your job will start bombing because it can't see your home directory any more. If there are other parts of the company's system which rely on freshly updated data from this job, they will break in new and interesting ways. All of this will happen on a day when nobody wants to work because it's some sort of holiday.
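To make the timing concrete, here is a minimal sketch of the arithmetic. Everything specific in it is an assumption made up for illustration: the helper name failed_runs, the Friday 17:00 last unlock, the 09:00 return, and a job that runs daily at noon. It simply lists the scheduled runs that land after the ticket has expired but before the next login renews it.

```python
from datetime import datetime, timedelta


def failed_runs(last_renewal, next_renewal, ticket_lifetime, job_hour):
    """List daily job runs that happen after the ticket expires
    and before the owner comes back and renews it.

    Hypothetical helper: assumes the job runs once a day at job_hour
    and that the ticket is only renewed by a login or unlock.
    """
    expires = last_renewal + ticket_lifetime

    # First scheduled run strictly after the last renewal.
    run = last_renewal.replace(hour=job_hour, minute=0, second=0, microsecond=0)
    if run <= last_renewal:
        run += timedelta(days=1)

    failures = []
    while run < next_renewal:
        if run >= expires:
            failures.append(run)
        run += timedelta(days=1)
    return failures


LIFETIME = timedelta(hours=48)
friday_evening = datetime(2012, 10, 5, 17, 0)  # assumed last unlock

# Ordinary weekend: back at the desk Monday 09:00.
normal = failed_runs(friday_evening, datetime(2012, 10, 8, 9, 0), LIFETIME, job_hour=12)

# Three-day weekend: back at the desk Tuesday 09:00.
long_weekend = failed_runs(friday_evening, datetime(2012, 10, 9, 9, 0), LIFETIME, job_hour=12)

print(normal)        # no failed runs
print(long_weekend)  # one failed run: Monday at noon
```

Under these assumptions the ticket dies Sunday at 17:00, so the ordinary weekend squeaks by and the long one does not. Note how fragile that is: with the same numbers but job_hour=2, the Monday 02:00 run fails even on an ordinary weekend, since it happens before anyone has logged back in.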

Also, since it's a longer-than-usual span of time off, it's entirely possible the person who set this up will be on a boat somewhere, or off in the mountains, or otherwise out of communication range. Add to that a good chance of nobody else realizing that this job even exists, never mind that it has to run on a daily basis, or how it runs, and you have a nice recipe for disaster.

When it comes to issuing time-limited credentials, having them fiendishly short is probably better than having them be luxuriously long. The short ones will cause problems which surface relatively quickly, while the long ones will lurk for weeks or months until the right scenario comes up, and then they will strike.

Even if you get your employees trained by having them suffer at the hands of expiring credentials, your new hires might still get clever and give it a spin. They probably won't realize that a job which runs now while they're there might not run a day or two later after they've been gone for a while.

Don't assume policies will fix this sort of thing. You might be surprised just how many people exist who actually do "know better" and still do it anyway.