12/12/2008

Don’t Leave Stewardship to the Companies

Filed under: library, museum — ryan @ 11:29 am

Via the Powerhouse Museum blog comes the bad news that George Oates has been laid off from Flickr (along with a lot of other people laid off from Yahoo this week). George was the person in charge of Flickr’s much-publicized collaboration with the Library of Congress. That’s bad news for the projects George has been spearheading, but I doubt she will have any problem finding a new position, even in these tough economic times.

What this does spotlight, though, is what I feel has been some magical thinking on the part of the library and museum community regarding collaboration with corporate entities. Blinded by the wealth these companies seem to command, non-profit institutions forget that corporate dominance can be fleeting. In the short term, libraries and museums should definitely be experimenting with publicizing themselves through commercial services. But believing that commercial services like Flickr, or even Google Books, represent long-term solutions to fulfilling library and museum missions is a mistake. In the worst case, that belief may leave the non-profit institutions marginalized while the commercial services provide no real long-term replacement.

Google may believe it will be around for 300 years. But in a year when we’ve seen some of the best-known and longest-surviving corporations disappear in a matter of days, we should treat such boasts as the ranting of a corporate Ozymandias.

10/21/2007

Libraries Look a Gift Horse in the Mouth

Filed under: books, future, library — ryan @ 9:51 pm

UC Berkeley I-School professor Paul Duguid is quoted in an article from tomorrow’s NYT about libraries rejecting Google’s digitization program in favor of working with the Internet Archive. The article focuses on Google’s clauses against allowing other commercial search engines to index the scans, but doesn’t mention another aspect of the deals which is worse: the OCR output of Google-scanned books isn’t made available to the participating libraries or to the public. Thus researchers who need digitized corpora for developing information retrieval or natural language processing technology can’t make use of their own university libraries’ resources. This isn’t the case with books scanned by the Internet Archive, the OCR output of which is made available to everyone. Fortunately UC Berkeley is one of the libraries working with the Internet Archive’s scanning program, and the OCR output of those scans is proving to be very useful for my own research. As Clifford Lynch has written, providing access to library resources must go beyond simply making them available to human readers, toward making them available to be computed upon. Kudos to the libraries that are realizing this and choosing to work with the Internet Archive.

November 8, 2007 update: Some people have made the point that many of the library contracts that are publicly available specify that the libraries should receive OCR output. (Some of the links on the Google Book Search Library Partners page lead to pages that link to the contracts, but you have to dig a bit.) So the contracts do mention OCR, but as I suspected they do not specify what the OCR output should consist of, because the libraries were thinking only of access to the digital files (i.e. people reading them), not computing on those files (i.e. machines processing them). Apparently (according to Peter Brantley) only UC had the foresight to think about that (and you can be sure that Google was thinking about it). So I stand by my assertion that the libraries that did not negotiate for the full OCR output made a mistake, and ceded a tremendous amount to Google.

9/19/2006

New version of amazon2melvyl

Filed under: books, library, tools — ryan @ 2:49 pm

I’ve posted a new version of my amazon2melvyl Greasemonkey script. This is a very minor change to handle Amazon’s new search-engine-optimized book links, which feature the book title in the URL. Click here to install, assuming you have Firefox and Greasemonkey already.
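For readers curious what handling these links involves, here is a minimal sketch of the matching idea, written in Python rather than as the actual user script (which is JavaScript and is not reproduced in this post). The function name and the exact set of path shapes are assumptions for illustration. The point is that the ISBN can be found after a `/dp/` segment whether or not a title slug precedes it.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Path shapes assumed for illustration; the real script's patterns may differ.
# An ISBN-10 is nine digits followed by a digit or "X".
_ISBN_PATH = re.compile(
    r"/(?:dp|gp/product|exec/obidos/ASIN)/([0-9]{9}[0-9Xx])(?=/|$)"
)


def isbn_from_amazon_url(url: str) -> Optional[str]:
    """Return the ISBN-10 embedded in an Amazon book URL, or None."""
    parts = urlparse(url)
    host = parts.hostname or ""
    if host != "amazon.com" and not host.endswith(".amazon.com"):
        return None
    # Search only the path, so a leading title slug such as
    # "/Some-Book-Title/dp/..." does not get in the way.
    match = _ISBN_PATH.search(parts.path)
    return match.group(1).upper() if match else None
```

Because the pattern is searched for anywhere in the path, both the older `/exec/obidos/ASIN/...` style and the newer title-in-URL style resolve to the same ISBN. Non-book product identifiers, which contain letters, are skipped.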

6/1/2005

amazon2melvyl Update

Filed under: library, tools — ryan @ 10:48 pm

I noticed that amazon2melvyl wasn’t handling some of the Amazon links at CiteULike, so I tweaked the link matching logic to better handle affiliate links.

Latest version of amazon2melvyl Greasemonkey user script

4/1/2005

amazon2melvyl Update

Filed under: library, tools — ryan @ 11:02 pm

I’ve posted an improved version of the amazon2melvyl script, which automagically adds Melvyl (UC Libraries Catalog) lookup links to linked Amazon items. This new version takes advantage of Greasemonkey’s new XMLHttpRequest support to look up related ISBNs via OCLC’s xISBN web service, vastly increasing the chance of a successful lookup. It also uses in-line data: URIs for the icons, saving my bandwidth and protecting your privacy (not that there was any snooping going on, but still). Thanks to Jeremy Dunck and Phil Ringnalda for these great suggestions. You can get the new version here.
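The two ideas in this update can be sketched roughly as follows. This is Python for illustration, not the script itself. `related_isbns` and `is_held` are hypothetical callables standing in for the xISBN request and the catalog check, whose actual request formats are not shown here. The first function shows why related ISBNs help: the catalog may hold a different edition than the one Amazon links to. The second shows how an icon can be embedded in the page as a base64 `data:` URI, so no image request ever goes to a server.

```python
import base64
from typing import Callable, Iterable, Optional


def first_held_isbn(
    isbn: str,
    related_isbns: Callable[[str], Iterable[str]],  # hypothetical: other editions of the same work
    is_held: Callable[[str], bool],                 # hypothetical: does the catalog have this ISBN?
) -> Optional[str]:
    """Try the linked ISBN first, then each related ISBN, until one is held."""
    seen = set()
    for candidate in [isbn, *related_isbns(isbn)]:
        if candidate in seen:
            continue  # avoid checking the same edition twice
        seen.add(candidate)
        if is_held(candidate):
            return candidate
    return None


def to_data_uri(payload: bytes, mime_type: str = "image/png") -> str:
    """Encode raw bytes as a base64 data: URI suitable for an img src."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Passing the lookups in as callables keeps the fallback logic independent of any particular web service, which matters here because the sketch does not assume anything about how xISBN or the catalog is queried.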

2/27/2005

amazon2melvyl

Filed under: library, tools — ryan @ 5:08 pm

Yesterday I was perusing my Amazon wishlist, looking up the books I have bookmarked there on Melvyl, the combined UC Libraries catalog. The cutting and pasting was getting a little tedious, until I realized that Greasemonkey could come to my rescue. So I whipped up a little user script that looks for links to books at Amazon, and adds links to look up those books on Melvyl. You can get it here.
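The basic transformation looks something like the sketch below, which is Python operating on an HTML string purely for illustration. The real user script is JavaScript and works on the live page. The catalog address is a made-up placeholder, not Melvyl's actual lookup URL, and the regular expression is a deliberately simple assumption about what Amazon book links look like.

```python
import re

# Hypothetical placeholder; not the real Melvyl search address.
CATALOG_SEARCH = "https://catalog.example.edu/search?isbn="

# Assumed link shape: an anchor whose href points at amazon.com and has a
# path segment that is an ISBN-10 (nine digits plus a digit or "X").
_AMAZON_BOOK_LINK = re.compile(
    r'<a\s[^>]*href="https?://(?:www\.)?amazon\.com/[^"]*?/'
    r'([0-9]{9}[0-9Xx])(?:[/?][^"]*)?"[^>]*>.*?</a>',
    re.IGNORECASE | re.DOTALL,
)


def add_catalog_links(html: str) -> str:
    """Append a catalog lookup link after every Amazon book link in html."""

    def append_link(match: "re.Match[str]") -> str:
        isbn = match.group(1).upper()
        lookup = f'<a href="{CATALOG_SEARCH}{isbn}">[library]</a>'
        # Keep the original Amazon link untouched and add ours after it.
        return f"{match.group(0)} {lookup}"

    return _AMAZON_BOOK_LINK.sub(append_link, html)
```

A regular expression over raw HTML is fragile, so treat this only as a picture of the idea: find each Amazon book link, pull out the ISBN, and place a lookup link next to it.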
