10/21/2007

Libraries Look a Gift Horse in the Mouth

Filed under: books, future, library — ryan @ 9:51 pm

UC Berkeley I-School professor Paul Duguid is quoted in an article from tomorrow’s NYT about libraries rejecting Google’s digitization program in favor of working with the Internet Archive. The article focuses on Google’s clauses against allowing other commercial search engines to index the scans, but doesn’t mention another, worse aspect of the deals: the OCR output of Google-scanned books isn’t made available to the participating libraries or to the public. Thus researchers who need digitized corpora for developing information retrieval or natural language processing technology can’t make use of their own university libraries’ resources. This isn’t the case with books scanned by the Internet Archive, whose OCR output is made available to everyone. Fortunately UC Berkeley is one of the libraries working with the Internet Archive’s scanning program, and the OCR output of those scans is proving to be very useful for my own research. As Clifford Lynch has written, providing access to library resources must go beyond simply making them available to human readers, toward making them available to be computed upon. Kudos to the libraries that are realizing this and choosing to work with the Internet Archive.

November 8, 2007 update: Some people have made the point that many of the library contracts that are publicly available specify that the libraries should receive OCR output. (Some of the links on the Google Book Search Library Partners page lead to pages that link to the contracts, but you have to dig a bit.) So the contracts do mention OCR, but as I suspected they do not specify what the OCR output should consist of, because the libraries were thinking only of access to the digital files (i.e. people reading them), not computing on those files (i.e. machines processing them). Apparently (according to Peter Brantley) only UC had the foresight to think about that (and you can be sure that Google was thinking about it). So I stand by my assertion that the libraries that did not negotiate for the full OCR output made a mistake, and ceded a tremendous amount to Google.

2/18/2005

Greasemonkey Stole Your Job (and Your Business Model)

Filed under: future, webservices — ryan @ 12:22 am

I spent some time tonight playing around with Greasemonkey, and it pretty much blew my mind. What is it? Well, basically it is a platform for running scripts that inject new functionality into web interfaces. If you’re a UI designer, this might frighten you. What it means is that any kid with a bright idea and a knack for DHTML can create a new interface for your site, and it will probably be better than yours. (There are a lot of bright kids out there in the world.) Why should you get paid when the bright kids will do your job better for free?

The key to survival will be going meta: design for the bright kids. Create a flexible, modular set of APIs and a well-documented example UI or two that shows how they are used. Learn from Amazon and release your grip on the end-user experience.

But developments like Greasemonkey disrupt more than just job descriptions: they disrupt business models too. For example, I will never see a Google AdSense ad again, thanks to a handy Greasemonkey script.
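To give a rough idea of how little code this takes, here is a minimal sketch of a user script that hides elements on a page. It is not the script I mentioned above. The `.ad-block` selector, the `hideMatching` helper, and the `@namespace` URL are all hypothetical names made up for illustration, and the code is written in present-day JavaScript rather than what browsers supported when this post was written.

```javascript
// ==UserScript==
// @name        Hide ad blocks (illustrative sketch)
// @namespace   http://example.com/hypothetical
// @include     *
// ==/UserScript==

// Hypothetical selector: real ad markup varies by site and changes over time.
const AD_SELECTOR = ".ad-block";

// Hide every element under `root` that matches `selector` and
// return how many elements were hidden.
function hideMatching(root, selector) {
  const matches = root.querySelectorAll(selector);
  let hidden = 0;
  for (const el of matches) {
    el.style.display = "none";
    hidden += 1;
  }
  return hidden;
}

// Only touch the page when one exists, so the helper can also be
// exercised outside a browser.
if (typeof document !== "undefined") {
  hideMatching(document, AD_SELECTOR);
}
```

The point is less the details than the size: a selector and a loop are enough to change what a site looks like for the person running the script.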

Will browser customizations like this play TiVo to Google and Yahoo’s advertiser-supported businesses? Will Google and Yahoo respond like the entertainment industry did? Or will they beat the bright kids at their own game? Some predictions: some future version of a Google or Yahoo toolbar will re-inject any of their advertising that has been removed; uninstalling the toolbar will result in the loss of valuable functionality without which users of their services will be considerably impoverished; meanwhile the APIs for these services will grow ever more closely guarded.
