Ryan Shaw » 2007 » October

10/21/2007

Libraries Look a Gift Horse in the Mouth

Filed under: books, future, library — ryan @ 9:51 pm

UC Berkeley I-School professor Paul Duguid is quoted in an article from tomorrow’s NYT about libraries rejecting Google’s digitization program in favor of working with the Internet Archive. The article focuses on Google’s clauses against allowing other commercial search engines to index the scans, but doesn’t mention another aspect of the deals that is worse: the OCR output of Google-scanned books isn’t made available to the participating libraries or to the public. Thus researchers who need digitized corpora for developing information retrieval or natural language processing technology can’t make use of their own university libraries’ resources. This isn’t the case with books scanned by the Internet Archive, the OCR output of which is made available to everyone. Fortunately, UC Berkeley is one of the libraries working with the Internet Archive’s scanning program, and the OCR output of those scans is proving to be very useful for my own research. As Clifford Lynch has written, providing access to library resources must go beyond simply making them available to human readers, toward making them available to be computed upon. Kudos to the libraries that are realizing this and choosing to work with the Internet Archive.

November 8, 2007 update: Some people have made the point that many of the library contracts that are publicly available specify that the libraries should receive OCR output. (Some of the links on the Google Book Search Library Partners page lead to pages that link to contracts, but you have to dig a bit.) So the contracts do mention OCR, but, as I suspected, they do not specify what the OCR output should consist of, because the libraries were thinking only of access to the digital files (i.e. people reading them), not computing on those files (i.e. machines processing them). Apparently (according to Peter Brantley) only UC had the foresight to think about that (and you can be sure that Google was thinking about it). So I stand by my assertion that the libraries that did not negotiate for the full OCR output made a mistake, and ceded a tremendous amount to Google.

10/5/2007

Continuous City

Filed under: berkeley, newmedia, video — ryan @ 10:55 pm

I just got back from the premiere of Continuous City, a theater production being workshopped at UC Berkeley by Marianne Weems of The Builders Association. I helped create the website at which you can (via webcam) perform scenes and choruses that will then be incorporated into the show. But this was my first time seeing the offline portion of the production, and I was really impressed. So if you’re in the Bay Area, I highly recommend you go check it out sometime before the last show on October 14th. If you’re interested at all in networked culture, or even if you’re not and are sick of the hype, you’ll find it very entertaining. If you’re not in the Bay Area, or you can’t make it to Berkeley in the next week and a half, you can see the completed version when it goes on tour over the next few years. (The software I’ve been building for the site will have matured by then too; right now it’s rather early beta–we started writing code about 6 weeks ago and some of the seams are definitely still showing. I’ll post something geeky about the process of creating the site later in the month.)
