Structured Yet Permeable
Daniel Tunkelang has a worthwhile set of posts [1, 2, 3] on whether Google is “good enough” for various kinds of tasks that involve document retrieval. The last one, on enterprise search, got me thinking about what interests me about scholarly information systems. Tunkelang, riffing on an article by Chris Sherman, argues that “enterprises, with all of their highly structured and carefully organized silos of information, require a very different and paradoxically more complex approach” to search than what Google does with Web documents.
Scholars too have existing “highly structured and carefully organized silos of information,” and I’m very interested in how to reconcile these, and the organizational processes that produce them, with the new tools that techniques like statistical machine learning make possible. Yet the scholarly domain is even more interesting than the enterprise domain, because its silos exist not only within enterprise-like organizations such as universities, but also in the invisible colleges formed by colleagues who share a discipline but work within different organizations. Engineers have similar cross-cutting community affiliations: one might identify more strongly as a Python programmer and member of the Python community than as an employee of any particular tech company. Although the company is the one paying the bills, the community is where questions are answered, contacts are made, and new jobs are found.
Anyway, the point is that while organizing information within a company or a university is an interesting problem, even more interesting is the problem of how to interweave these kinds of information systems with those of other, more fluid, disciplinary or interest-driven communities. Another way of thinking about it is: how can a complex organization not only articulate and fulfill its own information needs, but also be permeable enough to interoperate with other kinds of organizations as needed, in order to articulate and fulfill the needs of different trans-organizational configurations of users as (for example) new disciplines are formed or strategic alliances are made? Can we have the openness and flexibility of the Web without dismissing organizations or resigning ourselves to disorganization?
August 12th, 2008 at 7:46 pm
Ryan, thanks for the links! And I wholeheartedly agree that the problem of accessing heterogeneous collections of information is hardly restricted to company intranets. Unfortunately “enterprise search” is an overloaded term; I use it broadly to describe information access scenarios distinct from web search where an organization has some ownership or control of the content (in contrast to the partially adversarial relationship that web search companies have with their content creators).
August 12th, 2008 at 8:31 pm
Thanks for the clarification. I confess that I usually think of corporate intranets when I hear “enterprise search.” Of course “partially adversarial” relationships aren’t restricted to web search: consider the complex relations between academic authors, publishers, and university repositories involved in providing access to scholarly work, especially as the scope of information systems is widened to include not just published papers but intermediate products such as data sets. Everyone involved wants some ownership or control of the content (though often for different reasons) and there is also a strong argument for certain levels of public access. Plus, it’s increasingly difficult to draw hard distinctions between these scholarly information systems and the web, as authors post their papers to their blogs and use open wikis for collaboration, publishers open their databases to be indexed by Google, etc. So maybe rather than a web search/enterprise search dichotomy what we have is a spectrum of information access scenarios that involve varying mixtures of ownership and control, often spread among various parties.
August 13th, 2008 at 4:55 pm
That’s a fair point. I just blogged about a conversation I had today with Intelligent Enterprise columnist Seth Grimes that made it clear how difficult it is to pin down a definition for “enterprise search.” I personally like your suggestion of talking about a spectrum of information access scenarios, and you’ll see that at Endeca we do position our technology as addressing this spectrum. At the same time, businesses and analysts have preconceptions about enterprise search, and it’s imperative to address those preconceptions in order to get beyond them.