David Bamman
Assistant Professor
School of Information
University of California, Berkeley
twitter: @dbamman
email: dbamman at berkeley.edu
fall 2017 office hours:
314 South Hall
(stop by and chat!)
CV
I'm an assistant professor in the School of Information at UC Berkeley, where I work on applying natural language processing and machine learning to empirical questions in the humanities and social sciences. My research often involves adding linguistic structure (e.g., syntax, semantics, coreference) to statistical models of text. As such, I'm especially interested in developing core NLP techniques for a variety of languages and domains (e.g., literary text, social media). Before Berkeley, I received my PhD at Carnegie Mellon (School of Computer Science, Language Technologies Institute) and was a senior researcher at the Perseus Project of Tufts University.
- The NEH is funding our work on using visual information in the computational analysis of books ($325,000, co-PI with CMU)
- I'm accepting PhD students for Fall 2018. Apply here.
- Undergraduate research opportunities in the "computational humanities" are available for AY 2017-2018. Apply here.
- Fall 2017:
- Spring 2017:
- Deconstructing Data Science (Info 290)
- Fall 2016:
- Spring 2016:
- Deconstructing Data Science (Info 290)
- Lara McConnaughey, Jennifer Dai and David Bamman (2017), "The Labeled Segmentation of Printed Books," EMNLP 2017 [pdf].
- Yi Wu, David Bamman and Stuart Russell (2017), "Adversarial Training for Relation Extraction," EMNLP 2017 (forthcoming).
- David Bamman, Michelle Carney, Jon Gillick, Cody Hennesy, and Vijitha Sridhar (2017), "Estimating the Date of First Publication in a Large-Scale Digital Library," JCDL 2017 [pdf]
- David Bamman (2017), "Natural Language Processing for the Long Tail," Digital Humanities 2017 [pdf]
- Smitha Milli and David Bamman (2016), "Beyond Canonical Texts: A Computational Analysis of Fanfiction," EMNLP 2016 [pdf]
- David Bamman (2016), "Interpretability in Human-Centered Data Science," CSCW Workshop on Human-Centered Data Science [pdf]
- David Bamman and Noah Smith, "Open Extraction of Fine-Grained Political Statements," EMNLP 2015. [pdf]
- David Bamman and Noah Smith, "Contextualized Sarcasm Detection on Twitter," ICWSM 2015. [pdf] [bib]
- David Bamman and Noah Smith, "Unsupervised Discovery of Biographical Structure in Text," Transactions of the ACL (October 2014). [pdf] [synopsis] [bib]
- David Bamman, Jacob Eisenstein and Tyler Schnoebelen, "Gender Identity and Lexical Variation in Social Media," Journal of Sociolinguistics 18.2 (2014). [article] [preprint] [bib]
- David Bamman, Ted Underwood and Noah Smith, "A Bayesian Mixed Effects Model of Literary Character," ACL 2014. [pdf] [synopsis] [bib]
- David Bamman, Chris Dyer and Noah Smith, "Distributed Representations of Geographically Situated Language," ACL 2014. [pdf] [synopsis] [bib]
- David Bamman, Brendan O'Connor and Noah Smith, "Learning Latent Personas of Film Characters," ACL 2013. [pdf] [data] [code] [bib]
- David Bamman, Adam Anderson, and Noah Smith, "Inferring Social Rank in an Old Assyrian Trade Network," Digital Humanities (2013). [arXiv]
- Nathan Schneider, Brendan O'Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Jason Baldridge, Noah A. Smith, and Chris Dyer, "A Framework for (Under)specifying Dependency Syntax without Overloading Annotators," in: Proceedings of the ACL Linguistic Annotation Workshop (LAW 2013), Sofia, Bulgaria, August 2013. [Extended version]
- David Bamman, Brendan O'Connor and Noah A. Smith, "Censorship and Deletion Practices in Chinese Social Media," First Monday 17.3 (March 2012). [html] [bib]
- Brendan O'Connor, David Bamman and Noah A. Smith, "Computational Text Analysis for Social Science: Model Assumptions and Complexity," NIPS Workshop on Computational Social Science and the Wisdom of Crowds (2011). [pdf] [bib]
- David Bamman and Gregory Crane, "Measuring Historical Word Sense Variation," in: Proceedings of the 11th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2011). Runner-up, Best Paper Award. [pdf] [bib]
- David Bamman and Gregory Crane, "The Ancient Greek and Latin Dependency Treebanks," in: Caroline Sporleder, Antal van den Bosch and Kalliopi Zervanou (eds.), Language Technology for Cultural Heritage (Springer, 2011). [pdf] [bib]
- David Bamman, "Mapping the Demographics of American English with Twitter," Language Log, May 18, 2010. [html]
- David Bamman, Alison Babeu, and Gregory Crane, "Transferring Structural Markup Across Translations Using Multilingual Alignment and Projection," in: Proceedings of the 10th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2010). Winner, Best Paper Award. [pdf] [bib]
- David Bamman, Francesco Mambrini and Gregory Crane, "An Ownership Model of Annotation: The Ancient Greek Dependency Treebank," in: Proceedings of the Eighth International Workshop on Treebanks and Linguistic Theories (TLT8) (Milan, Italy: 2009). [pdf] [bib]
- David Bamman and Gregory Crane, "Computational Linguistics and Classical Lexicography," Digital Humanities Quarterly 3.1 (2009). [html] [bib]
- David Bamman, Marco Passarotti and Gregory Crane, "A Case Study in Treebank Collaboration and Comparison: Accusativus cum Infinitivo and Subordination in Latin," Prague Bulletin of Mathematical Linguistics 90 (2008). [pdf] [bib]
- David Bamman and Gregory Crane, "The Logic and Discovery of Textual Allusion," in: Proceedings of the 2008 LREC Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2008). [pdf] [bib]
- David Bamman and Gregory Crane, "Building a Dynamic Lexicon from a Digital Library," in: Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2008). [pdf] [bib]
11K Latin Books. 11,261 OCR'd Latin texts from the Internet Archive (1.38B words), along with associated metadata detailing the dates of composition.
CMU Book Summary Dataset. 16,559 book plot summaries + metadata.
CMU Movie Summary Dataset. 42,306 movie plot summaries + metadata.
Twitter14K Dataset. Aggregated word counts from 14,464 Twitter users (9.2M tweets).