Writing

Software, technology, sysadmin war stories, and more.

Monday, June 25, 2012

Index that chaotic intranet instead of strangling it

No matter how much the pointy-haired types may scream and holler, useful intranets rarely resemble carefully pruned trees. They may want everything to start from a shared root with nice categories and proper gatekeepers who vet content, but that never lasts. Either it grows out organically and turns into more of a hedge (or shrubbery ...), or it becomes cold, lifeless and useless.

I've seen this happen at lots of places, both from the inside and just as a customer. They have a bunch of internal web pages which sprout up organically as people start solving problems. Then someone comes along and decides to "unite the tribes" and tries to bring all of it into their own system. This is their way of consolidating power, naturally.

There are always plenty of reasons given for this kind of thing. One of them is just the flat-out assertion without proof that "centralized is better". That may or may not be the case, but they never explain how or why, so it's just another empty calorie statement from management.

One reason that actually has legs is that "it's too hard to find useful information in a chaotic system". That's true, but it doesn't have to be the end of the world. Think about it. For around 20 years now, people have been putting up gopher and then web resources at any old path they want. Things move around all the time. They change paths, extensions, filenames, hostnames, domain names, all of this.

Still, though, people manage to find their way around through the help of web crawling robots and search engines with their mighty indexes.

This is about the point where the people who make "intranet" flavored versions of those search engine appliances step in. The company goes off and buys one, deploys it, and invariably discovers that it finds the stuff on the "official" intranet page but misses all of the really good organically-grown content which lives on random systems all over the place.

Personally, I think this happens because the walled garden of web pages inside a corporate network is usually too small to organically link to things. You usually don't have an internal equivalent of Twitter, or Facebook, or reddit, or anything else to share links around. More often than not, these URLs are passed around from person to person and become part of a team's "Tribal Song of the Ancestors".

There is one thing that usually happens with these links, though: people tend to bookmark them. Think about it. If you work in a company which has lots of crazy things buried all over the place, and you had trouble finding them, you're going to save them so you can find them later, right?

It might just be that simple. Get permission to start scraping the bookmarks of employees for internal links, be they simple http://hostname/ type links, or http://foo.internal.corp.example.com/ stuff. Use them as starting points for crawls and see what you find. Odds are, the really good things will have a bunch of people linking to them from their personal bookmarks lists. Assign scores and rank the results accordingly.
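Here's a rough sketch of what the scoring half of that might look like in Python. Everything in it is an assumption for illustration: the `INTERNAL_SUFFIX` value is borrowed from the example URL above, the `is_internal` and `rank_seeds` names are made up, and the input is assumed to already be a mapping of employee to bookmarked URLs (actually collecting the bookmarks is the part that needs permission). The score is just the number of distinct people who bookmarked a URL, which is the simplest thing that could work and not necessarily the best.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical internal domain suffix, taken from the example URL in the text.
INTERNAL_SUFFIX = ".corp.example.com"


def is_internal(url):
    """Guess whether a URL points inside the corporate network.

    Assumption: bare hostnames (no dots) and anything under
    INTERNAL_SUFFIX count as internal. Everything else is skipped.
    """
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
    except ValueError:
        return False  # malformed URL, ignore it
    if parts.scheme not in ("http", "https") or not host:
        return False
    return "." not in host or host.endswith(INTERNAL_SUFFIX)


def rank_seeds(bookmarks_by_user):
    """Return (url, score) pairs, highest score first.

    The score is the number of distinct users who bookmarked the URL.
    Ties are broken alphabetically so the output is stable.
    """
    owners = defaultdict(set)
    for user, urls in bookmarks_by_user.items():
        for url in urls:
            if is_internal(url):
                owners[url].add(user)
    scored = [(url, len(users)) for url, users in owners.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    bookmarks = {
        "alice": ["http://wiki/", "http://foo.internal.corp.example.com/runbook",
                  "https://www.example.org/"],
        "bob": ["http://wiki/", "http://foo.internal.corp.example.com/runbook"],
        "carol": ["http://wiki/"],
    }
    for url, score in rank_seeds(bookmarks):
        print(score, url)
```

The ranked list would then be handed to whatever crawler you have as its starting points, with the score carried along as a ranking signal. Note that this doesn't normalize URLs at all, so http://wiki/ and http://wiki/index.html count as two different things. A real version would want to deal with that.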

I came up with this idea ages ago based on knowing how my fellow employees tended to operate in a corporate network which had dozens or hundreds of little web "outposts". When I mentioned this idea to someone who had just set up one of these appliances, they looked at me and just stared. Apparently this had never occurred to them.

I haven't had a reason to run one of these things in recent times, so I never found out if they ever acted on this idea. Given how much time has passed, you'd think it would be a done deal by now.

However, given the realities of corporate information management types, they're probably starting yet another Grand Unified Documentation Portal which will get part way done and then stall out. Some links will be updated, others will point into older systems, and some will just say "coming soon" forever.

It is rumored that you can tell the age of a company by counting the number of half-baked intranet documentation systems. It's the corporate network equivalent of counting rings on a tree.