Information Access

My research on information access has included work on genre and genre identification. Here's a paper on this work that I wrote with Brett Kessler and Hinrich Schuetze. Here's a demo of Northrop, a system for automatic genre classification that we developed at Xerox PARC.


Hinrich Schuetze and I have worked on analyzing patterns of language use and multilingualism on the Web. Here is a paper reporting some of the results of that research; it was read at a conference on "La politique de la langue et la formation des nations modernes" at the Centre d'Etudes et Recherches Internationales in Paris on October 3, 1998.

Punctuation and Text Structure

I wrote a short book on the linguistics of punctuation several years ago; since then there has been a lot of interesting work in this area by, among others, Bernie Jones, Bilge Say and Varol Akman, Robin Hill, Robert Dale, Ted Briscoe and John Carroll, and Christy Doran. (See also the papers from the 1996 ACL Workshop on Punctuation in Computational Linguistics.)

With Ted Briscoe and Rodney Huddleston, I wrote a chapter on punctuation for the new -- and terrific -- Cambridge Grammar of the English Language.

Here is a popular piece on this work from the New York Times.

