Wednesday, June 27, 2012

Extending the web's baseline

At what point do common JavaScript libraries and CSS packages become part of the "baseline" web? Other things started out unsupported by browsers and were handled by external helpers, and now they're handled internally. I wonder whether some of the things we see today are headed the same way.

Think about how many pages are using jQuery in some way. I certainly do, although not here on the "/w/" pages. A bunch of my projects which need "app" type behavior call on it. I've done JS by hand back in the long-ago, and it was enough to make me hate it. Now I can be productive without becoming miserable. This is good!

The problem with this is that everyone winds up having to figure out hosting for their helper libraries, even though there aren't that many distinct libraries out there. There might be a huge number of web sites, call it M, and a much smaller number of common libraries, call it N, but I bet the number of unique URLs for those libraries is closer to M than to N.

So this is where someone will pop up and say "what about community hosting?", and that's when I get a little nervous. Once you swing that SCRIPT SRC's URL over to someone else's hosting, be it the big G or someone else, you've just lost some control over your own destiny. If that site goes down, or starts behaving oddly, or otherwise stops working for you or your users, your site is broken. Worse, it may happen in such a way that you only find out about it when your users (finally) reach you and complain.

The flip side of this is hosting everything yourself. This lets you keep tight control on things, but then you have a whole new set of problems. New visitors to your pages have to fetch all of your resources, even though they probably have perfectly good cached copies from other sites on their machine. If their browser does any sort of clever local compilation or optimization tricks for scripts, it'll probably have to start all over since this is "new" content.

It seems like we're missing some tools here. One of them would be the ability to provide multiple URLs for the same resource. Maybe you reference jQuery-x.y.z.min.js primarily from http://big.cdn/js/blahblah, but you also give a pointer to http://my.site/js/blahblah as a backup. Let the browser figure it out.
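In the meantime, you can approximate this by hand with a fallback check: try the CDN copy first, and if the library's global object never shows up, write out a second tag pointing at your own copy. A rough sketch, using placeholder URLs and filenames along the lines of the ones above:

    <!-- Try the CDN first; if window.jQuery never appears, that load failed
         (or was blocked), so fall back to a locally hosted copy. -->
    <script src="http://big.cdn/js/jquery-x.y.z.min.js"></script>
    <script>
      window.jQuery || document.write(
          '<script src="http://my.site/js/jquery-x.y.z.min.js"><\/script>');
    </script>

It's clumsy compared to having the browser do it natively, but it keeps a single bad host from taking your whole page down with it.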

Still, there is one thing which worries me about the CDN situation, and that's what happens if someone edits that script. You've given it permission to run in the context of your page by SRCing it. Compromising that upstream file would allow lots of interesting attacks on unsuspecting web sites which are just trying to be helpful.

For that, the only thing I can think of is code signing of some sort. Maybe you say SCRIPT SRC=whatever, but you also add something like SUM=sha256:e3b0c44298f...49b934ca495991b7852b855. This would allow the browser to reject anything which had been modified.
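In markup, that might look something like this. To be clear, this SUM attribute is purely hypothetical (no browser knows what to do with it), and the digest is just the truncated example from above:

    <!-- Hypothetical attribute: the browser would hash the fetched bytes and
         refuse to execute the script if the digest doesn't match. -->
    <script src="http://big.cdn/js/jquery-x.y.z.min.js"
            sum="sha256:e3b0c44298f...49b934ca495991b7852b855"></script>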

There's more. Once you start thinking of these common scripts as being a small number of signatures, you might be able to start referring to them in some common way. Maybe the client's browser already has a script called jQuery with that sha256 sum loaded locally. It's already scanned it, compiled it, optimized it, or whatever else, and it's ready to go. It could just grab that from the local cache and keep going. It doesn't matter where it came from, since it's the same content... as long as you can trust your signing infrastructure.
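Roughly, the lookup might go something like this. It's just a sketch of the idea in JavaScript, not anything a real browser exposes; the fetch, hash, and compile arguments stand in for the browser's own internals:

    // Content-addressed script cache, keyed by digest rather than by URL.
    var scriptCache = {};   // "sha256:<hex>" -> already-prepared script

    function resolveScript(srcUrl, sum, fetchFn, sha256Fn, compileFn) {
      if (scriptCache.hasOwnProperty(sum)) {
        return scriptCache[sum];          // same bytes, wherever they came from
      }
      var body = fetchFn(srcUrl);         // cache miss: fall back to the URL
      if ("sha256:" + sha256Fn(body) !== sum) {
        throw new Error("digest mismatch: refusing to run it");
      }
      scriptCache[sum] = compileFn(body); // remember it under its digest
      return scriptCache[sum];
    }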

Notice that I'm not calling for browsers to start preloading their own copies of popular libraries. That would be a huge amount of effort, and would probably start all sorts of stupid wars as people jockey for inclusion, sort of like they do with SSL roots now. Instead, the individual client instances should figure this out on the fly using data supplied by the web they encounter. Let it happen organically and in a decentralized fashion.

The best part of all of this is that you could (and should) rig it to be backwards-compatible. Maybe you SCRIPT SRC=http://your.site, but then you add more attributes which allow the browser to find it through other means. New browsers will use them and possibly benefit, and old ones will ignore them and just fetch the SRC as they do now. Everyone wins.
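Putting the pieces together, a tag might look like this. Again, "altsrc" and "sum" are names I'm making up for the sake of the sketch; an old browser ignores attributes it doesn't recognize and just fetches SRC like it always has:

    <!-- Old browsers: plain SCRIPT SRC, nothing changes.
         New browsers: may try the alternate URL or a local content-addressed
         copy, and can verify the digest either way. -->
    <script src="http://your.site/js/jquery-x.y.z.min.js"
            altsrc="http://big.cdn/js/jquery-x.y.z.min.js"
            sum="sha256:e3b0c44298f...49b934ca495991b7852b855"></script>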

Remember those external helpers I mentioned back at the top? Think back to the days of Mosaic. Odds are, if you ran a sufficiently ancient version on Windows and encountered a JPEG, it would sit there for a bit and then kick off an image viewer. There were only a few types of images which could be handled "inline". Once it became obvious which ones were being used and which weren't, browsers adapted.

I'm just waiting to see the next adaptation of this sort. Lower latency and better content security for all!