Writing

Software, technology, sysadmin war stories, and more.
Sunday, April 15, 2018

Feedback: ebooks, non-NFS systems, and code ownership

Yep, it's time for more responses to reader feedback.

I've gotten a few questions about my books, and different formats. Here's how that's worked out historically. First, the books start out as a giant HTML file, a bit like the Atom feed. The individual parts link to all of the images and each other just like they do here on the web. Then I ran the Kindle generator and that turned it into a massive .mobi file. That's what got uploaded to Amazon and should eventually be what you download. (I have not compared my original to downloaded versions, however.)

At some point, people started asking about the Barnes and Noble Nook, and that was easy enough to do, so I set up an account out there, used Calibre to turn the mobi files into epubs, and uploaded them there. This turned out to be more trouble than it was worth, though: hardly anyone bought the books that way, and B&N was pretty annoying to deal with. Eventually, I just took down the books from their store and called it done.

Somewhere along the way, I set up a PayPal method to get the books in either format. All it did was send me a notice that someone had ordered something or other, and then I'd manually fulfill it by sending an e-mail back to the buyer's address with an attachment. It worked well enough, but at some point I decided I didn't want to use PayPal any more, and so retired those links.

At yet another point, there was a "store" here that used Stripe to do credit card clearing. That did a small amount of business and actually resulted in slightly more money per book (for the same price to customers) because there wasn't Amazon taking a 30% or even 70% chunk depending on which locale was involved. (Yes, really, the cut is 70% sometimes.)

That also went away when I migrated servers and didn't feel like getting all of the dynamic content stuff up and running again. It happened in the middle of what was a pretty high-impact job, and I didn't have much energy to put into something that wasn't being used very much in the first place.

This is about where we are now: Amazon still has the books in their .mobi format, without DRM, as they always have. The other methods have come and gone. I guess the Stripe-backed store could live again, but there doesn't seem to be that much demand for it. Am I wrong? Let me know.

...

"If not NFS, then what?" -- that's a question I've gotten in many forms in recent times after talking about problems with NFS like SIGBUS. Obviously, people need ways to schlep data around over the network, and want to know my advice on what to do.

I find that these solutions tend to be closely associated with the problem domain, and that you can't try to solve all of them with a single technology. For example, let's say you have a system which is making recordings off a radio, and which then needs to submit them to the storage system, so they can later be served up over the web. The recorder machine is physically close to the transmitters, and the storage/web systems are not. There's a substantial distance between them, with network links that are long, slow, and lossy at times. Sometimes they go down.

This is not a time for NFS. Sure, you could figure a way to export a filesystem from the storage machine, mount it from the recorder machine, and just drop files on there, but why would you do that? You're just asking for all sorts of trouble.

My preference would be to first get some kind of encrypted tunnel/VPN up between the sites, so they aren't just talking over the Internet in the clear. Then, run some kind of dumb "submission" server on the storage host, and have it accept something approximating RPCs. The client would run on the machine with the radio, and would just push calls to it in order.

In the event the network went down, calls would just queue up on the recorder host's local storage. Obviously you can't sustain this forever, but having it be able to survive for several hours with no connectivity would be rather important.
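
To make the queue-on-local-disk idea a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the `Spool` class, the numbered file naming, and the `send` callback, which stands in for whatever RPC the submission server would actually accept.

```python
import os
from pathlib import Path
from typing import Callable, List


class Spool:
    """Hypothetical on-disk queue: one file per pending call, sent oldest first."""

    def __init__(self, directory: str) -> None:
        self.dir = Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)
        # Resume numbering after whatever is already queued on disk.
        existing = [int(p.name) for p in self.dir.iterdir() if p.name.isdigit()]
        self.next_seq = max(existing, default=-1) + 1

    def enqueue(self, payload: bytes) -> Path:
        """Persist a call locally before any attempt to send it."""
        final = self.dir / f"{self.next_seq:012d}"
        tmp = final.with_suffix(".tmp")
        with open(tmp, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, final)  # atomic rename: no half-written entries
        self.next_seq += 1
        return final

    def pending(self) -> List[Path]:
        """Queued entries in submission order (zero-padded names sort correctly)."""
        return sorted(p for p in self.dir.iterdir() if p.name.isdigit())

    def drain(self, send: Callable[[bytes], bool]) -> int:
        """Push queued calls in order; stop at the first failure. Returns count sent."""
        sent = 0
        for path in self.pending():
            try:
                ok = send(path.read_bytes())
            except OSError:  # covers connection errors and socket timeouts
                ok = False
            if not ok:
                break  # network is probably down; keep the rest for later
            path.unlink()  # only forget a call once the far end accepted it
            sent += 1
        return sent
```

The loop stops at the first failure so calls stay in order, and a file is only removed after the far end has accepted it. A crash between the send and the delete would mean a duplicate submission rather than a lost recording, so under this design the server side would need to tolerate seeing the same call twice.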

At either end, it probably would use actual filesystem semantics. Things like MP3 call audio files tend to work well that way. However, just because it's in a filesystem at either end doesn't mean that you necessarily want to fling it over the network like that!

A similar pattern might apply for gathering logs from a bunch of worker nodes. Logs go to the local disk for a little while, and then roll over to another filename. When that happens, a helper process wakes up and tries to push it to permanent storage, again with some kind of RPC method. It could be a glorified HTTP POST if you're so inclined. Then it expires the logs from the local host. Lather, rinse, repeat.
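
A rough sketch of that helper process in Python might look like the following. The function name `ship_rotated_logs`, the `push` callback, and the rollover naming (`worker.log` rolling to `worker.log.1` and so on) are all assumptions for the sake of the example; `push` is whatever RPC or POST gets a file to permanent storage.

```python
from pathlib import Path
from typing import Callable, List


def ship_rotated_logs(
    log_dir: str,
    active_name: str,
    push: Callable[[str, bytes], bool],
) -> List[str]:
    """Push every rolled-over log, oldest first; delete only after success.

    The file named exactly `active_name` is still being written, so it is
    left alone. Anything named `active_name.<suffix>` is treated as rotated.
    """
    rotated = sorted(
        (
            p
            for p in Path(log_dir).iterdir()
            if p.is_file() and p.name.startswith(active_name + ".")
        ),
        key=lambda p: (p.stat().st_mtime, p.name),
    )
    shipped: List[str] = []
    for path in rotated:
        try:
            ok = push(path.name, path.read_bytes())
        except OSError:  # network trouble: treat it like a refused push
            ok = False
        if not ok:
            break  # storage unreachable; leave files in place for next wake-up
        path.unlink()  # expire the local copy only once it is safely stored
        shipped.append(path.name)
    return shipped
```

Run from a timer or after each rollover, this keeps the local disk as the buffer: if the push fails, nothing is deleted and the next wake-up simply tries again.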

Another way to look at it is like this: if you give your programmers the impression that they can rely on POSIX filesystem behaviors, they will build a system which depends on them. NFS is one of several ways to make this happen. If, on the other hand, you treat storage and retrieval like network calls which can (and will) fail, you have a better chance of having them build resilient systems.

Put it another way. What's more likely to be handled properly: an HTTP POST that times out, or a fprintf() that blocks effectively forever?
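
For the POST side of that comparison, here is a small Python sketch using only the standard library. The function name `post_with_timeout`, the default timeout, and the content type are assumptions picked for illustration. The point is just that the failure shows up as an ordinary return value the caller has to look at, instead of a write that never comes back.

```python
import http.client
import urllib.request


def post_with_timeout(url: str, body: bytes, timeout_s: float = 5.0) -> bool:
    """POST `body` to `url`; return True only on a 2xx response.

    Timeouts, refused connections, DNS failures, and HTTP error statuses
    all come back as False, so the caller can queue the data and retry.
    """
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

A caller would typically treat False as "keep the file and try again later", which is exactly the decision a blocked write to a hung network mount never gives you the chance to make.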

...

I talked about having my creations be "mine again by default" now that I'm out of a conventional gig. Some folks have asked about the implications of that. This is really messy, and everyone needs to educate themselves about what may apply to their situations.

First of all, laws vary from state to state, and obviously country to country. If you're in California, you probably have a better situation as a worker than if you're somewhere else in the US, for instance.

Next, if you're working on the company's equipment or "on their time" (whatever that means, given non-hourly workers), worry. That's the kind of stuff they'll use to make a case for owning something you created. It seems obvious when you say it out loud, but some people don't really think about it.

You might have a company phone or laptop. Does that phone have a data plan? Did you use that data plan to look stuff up for the side project? Did you call your cofounder of the side gig? Did you start researching something for that project while you were at the office?

Then there's the whole "company's line of work" thing. Maybe your company makes a certain type of widget. You start looking at how to make very similar widgets for yourself. They can use that against you, too. The really insidious part of this is that so many of these companies are fanning out into other markets.

If you worked at Google at one point, competing with them probably meant doing web search. Then it meant Usenet archives. Then it added any form of online advertising, as they gained the ability to inject ads all over the web. Within ten years, there was photo sharing, bad attempts at social networking, "checkins" to restaurants and other places, self-driving cars, wacky software-defined cell networks, and the list goes on and on. How could you not compete with one of their lines of business?

Other companies are growing in similar ways. Yours might be next.

For this and many other reasons, I tended to keep my ideas squashed down to nothingness while working that "real job". The things which did progress in terms of coding were stuff like the scanner and a few other projects of mine. All of those were things I put on the list of "pre-existing inventions" when I started working there. Even then, I didn't touch any of it from work equipment.

When in doubt, track down an actual lawyer, naturally!