Postdoc: NLP for computational literary analysis

About the project:

Literary novels push the limits of natural language processing. While much work in NLP has been heavily optimized toward the narrow domain of contemporary English newswire (and now increasingly social media), literary novels are an entirely different animal. Their long, complex sentences strain syntactic parsers with super-linear computational complexity, their figurative language challenges representations of meaning based on neo-Davidsonian semantics, and their length (ca. 100,000 words on average) rules out existing solutions for problems like coreference resolution that expect a small set of candidate antecedents.

At the same time, fiction drives computational research questions that are uniquely interesting to that domain, and this interest often spills out elsewhere in unexpected ways. The task of authorship attribution was first proposed by Mendenhall (1887) to discriminate among the works of Francis Bacon, Shakespeare, and Christopher Marlowe, which later drove the pioneering work on the Federalist Papers by Mosteller and Wallace (1964) and is now used in applications as far removed as forensic analysis. Current active areas of literary NLP research include extracting social networks from novels (Elson et al., 2010), learning representations of character relationships (Iyyer et al., 2016), quote attribution (Muzny et al., 2017), and learning to infer readers' attitudes to the stories they read (Milli and Bamman, 2016).

In this project, we will focus on developing computational models of a uniquely literary problem: plot. We will set out to develop and improve the fundamental applications in natural language processing that help make a realistic computational model of plot possible; while "plot" itself is a complex abstraction, one contribution of our work here is to decompose it into solvable sub-problems, each of which can be researched and evaluated on its own terms. We will focus in this work on the atomic elements: at the very least, plot involves people (characters), places (the setting where action takes place), time (when those actions take place), and things (objects that are important), all interacting through depicted events (in the form of actions, not descriptions). Each of these atomic elements entails individual sub-problems in NLP; some of these exist as formal problems (named entity recognition, character clustering, temporal information processing), while others do not yet.

In this role, you will carry out primary research in this general area, work with graduate students, and supervise collaborative teams of undergraduates in computer science, data science, and English literature. This is a one-year position, beginning anytime before September 1, 2018.

About you:

Qualified candidates may come either from a technical field (computer science, information science, statistics) with a track record of publishing in NLP conferences (e.g., ACL, EMNLP), or from a humanities field with a strong focus on computational text analysis and the use of empirical methods in the exploration of literature.

To apply:

Send a CV, a cover letter, links to two writing samples, and the names and contact information of three references familiar with your work to David Bamman (dbamman@berkeley.edu). Applications will be reviewed on a rolling basis.

Acknowledgments:

This work is supported by an Amazon Research Award, for which we are very grateful.