Remote control and heavyweight authentication
After writing about ssh yesterday, I started thinking about alternatives, and the sorts of responses I was likely to get. Sure enough, within a few hours, a few people wrote in to basically tell me that I was nuts and that ssh would work just fine.
It's not black and white here. ssh can work. It can also not work. The exact requirements matter with this kind of thing. If you're doing one short-lived command on 20 machines, it'll probably work. If you're doing a long-lived command on 200 machines, it might not be that easy.
Fortunately, I also got some reports from people who have run into the same types of problems. One mentioned hitting a wall at merely 50 connections. That's harsh, and that's exactly what I was talking about.
If ssh isn't going to handle every situation, then there will be times when some other technique will be required. I'd start by thinking about what's really going on with your system. If what you actually want is specific chunks of data at regular intervals, perhaps it would be better to write something which could serve those up. Then it could serialize a message and fire it at you over the network. This is a great opportunity to avoid having to parse text output from tools like "df" or "uptime".
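To make that a little more concrete, here is a rough sketch of the "serve up the data yourself" idea in Python. The function names and the field names in the message are made up for illustration, and JSON is just one convenient serialization; a real system might pick something else entirely. The point is that the numbers come straight from the OS instead of being scraped out of "df" or "uptime" output.

```python
import json
import os
import shutil
import time


def collect_stats(path="/"):
    """Gather disk and load figures as structured data (hypothetical schema)."""
    usage = shutil.disk_usage(path)
    try:
        # Same three load averages "uptime" prints; not available everywhere.
        load = list(os.getloadavg())
    except (AttributeError, OSError):
        load = None
    return {
        "timestamp": int(time.time()),
        "path": path,
        "disk_total_bytes": usage.total,
        "disk_free_bytes": usage.free,
        "load_avg": load,
    }


def encode_message(stats):
    """Serialize the stats to bytes, ready to be fired over the network."""
    return json.dumps(stats, sort_keys=True).encode("utf-8")


def decode_message(payload):
    """What the receiving side would do: no text parsing, just decode."""
    return json.loads(payload.decode("utf-8"))
```

The receiver gets named fields with known types, so a change in some tool's column layout can't quietly break the collector.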
ssh provides both encryption and authentication, so any replacement needs to at least consider doing that in order to allow the same level of service. Now there's the matter of doing some kind of crypto, and doing it properly. Hopefully this means deferring to some proven library with a good track record instead of trying to roll your own. Still, there are some issues which can crop up with this technique.
For example, what if you intend to have a whole bunch of different services/daemons on a machine, and all of them need to do this auth stuff? Are they all going to link to the auth code? Will they all have to keep their own crypto state with random numbers and all of that? That sounds like a lot of overhead to me.
I also wonder about what would happen when the inevitable security upgrade comes down the pipe. Maybe your system was built without the ability to revoke a certificate because it never came up before. Then you fired someone for the first time and really need to make sure their old workstation can't shoot RPCs at your services.
If all of that crypto code lives in a library which is linked into every binary, then picking up an upgrade means rebuilding and pushing out a bunch of new binaries, and that's going to take a nontrivial amount of time. Even if it's all API-compatible, just having to rebuild and repush everything might be a lot of work. If something in the interface changed and the clients need to be updated, forget about any sort of speedy rollout. It just won't happen.
I would assume the logical outcome would be some kind of daemon which runs on the local machine and keeps a crypto engine humming at all times. It might take connections over loopback or even a Unix domain socket. Client code would link in a library, sure, but all that library would do is call out to this long-lived crypto engine server to present some credentials and say "is this okay?".
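Here is a minimal sketch of that arrangement in Python, assuming a Unix domain socket and a one-line JSON request and reply. Everything in it is hypothetical: the wire format, the names, and especially the "crypto", which is just an HMAC over the principal's name with a hard-coded demo secret. It is only there so that the engine has something to check, and a real engine would defer to a proven library and real key material.

```python
import hashlib
import hmac
import json
import os
import socket
import socketserver
import tempfile
import threading

# Demo-only secret (an assumption for this sketch, not a real key scheme).
SECRET = b"demo-only-secret"


def sign(principal):
    """Issue a token for a principal; stands in for real credentials."""
    return hmac.new(SECRET, principal.encode("utf-8"), hashlib.sha256).hexdigest()


class AuthHandler(socketserver.StreamRequestHandler):
    """The long-lived engine: answers one 'is this okay?' per connection."""

    def handle(self):
        line = self.rfile.readline()
        try:
            req = json.loads(line)
            ok = hmac.compare_digest(sign(req["principal"]), req["token"])
        except (ValueError, KeyError, TypeError, AttributeError):
            ok = False  # anything malformed is simply not okay
        self.wfile.write(json.dumps({"ok": ok}).encode("utf-8") + b"\n")


def start_engine(sock_path):
    """Bind the engine to a Unix domain socket and serve in the background."""
    server = socketserver.UnixStreamServer(sock_path, AuthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def is_okay(sock_path, principal, token):
    """The thin client library: no crypto here, it just asks the engine."""
    request = json.dumps({"principal": principal, "token": token})
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(request.encode("utf-8") + b"\n")
        reply = s.makefile("rb").readline()
    return json.loads(reply)["ok"] is True
```

A service would call something like is_okay(sock_path, "alice", token) and act on the yes or no. All of the actual checking lives in one process per machine.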
Now imagine that something needs to change in your crypto regime in this world. You could just replace the crypto engine server thing on each machine instead of relinking and repushing all of the binaries for everything else. As long as the new crypto engine server can still talk to the existing crypto engine client library code which is already baked into all of those binaries, a lot of work should be avoidable.
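As a toy illustration of why that works, here are two versions of an engine's decision logic in Python, with a stand-in for the client library that never changes. All of the names and the request/reply shape are invented for this sketch. The second engine adds a revocation check (the "fired someone" case), but since it still takes the same request and still answers with the same yes or no, the client code is untouched.

```python
# Hypothetical data for the sketch.
KNOWN = {"alice", "bob"}
REVOKED = {"bob"}  # e.g. someone who no longer works here


def engine_v1(request):
    """Original policy: any known principal is okay."""
    return {"ok": request["principal"] in KNOWN}


def engine_v2(request):
    """Upgraded policy: same request and reply shape, plus revocation."""
    principal = request["principal"]
    return {"ok": principal in KNOWN and principal not in REVOKED}


def client_check(engine, principal):
    """Stands in for the library baked into every binary; it never changes."""
    return engine({"principal": principal})["ok"]


for name in ("alice", "bob", "mallory"):
    print(name, client_check(engine_v1, name), client_check(engine_v2, name))
```

The only thing the two sides have to agree on is the shape of the question and the answer. Keep that stable and the policy behind it can change as often as it needs to.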
I think of it like how the alcohol thing works at an "all ages" show at a venue which serves booze. Everyone gets in a line out front, and then the bouncers start checking tickets and IDs. They have good light and can make sure the picture on your ID matches your face, and they can look for obvious forgeries. Then they mark your hands appropriately - if you aren't 21 or can't prove it, or if you just don't want to drink, you get the big black X on your hands. Otherwise you might get their particular branded stamp and that's it.
Inside, the bartenders can look for the X or special stamp instead of attempting to do the whole ID check routine again. The light inside a rockin' club probably wouldn't allow for a good match of the photos and it takes time away from selling drinks. If the people inside can trust the marks, then they can apply the restrictions without having to actually check IDs right then and there.
Now imagine one day something changes, and the security regime needs another check added. Would you rather re-train your 20 bartenders and related drink-slinging personnel (hey, it's a big place!) or just the people at the door? As long as people still fit into the "booze OK" and "no booze" categories in the new scheme, the "clients" (bartenders) don't need to be "updated" (re-trained).
Again, this is one of those things where you need a large operation to really start feeling the pain. If your "club" (server farm) is just a handful of "people" (machines), then having everyone check IDs probably isn't a big deal.
You just can't expect the routines used by a cute little operation to scale infinitely. At some point, you have to step back and rethink it.