David Bamman
Assistant Professor
School of Information
University of California, Berkeley
Affiliated appointment, EECS
Faculty, Berkeley AI Research Lab (BAIR)
Senior Fellow, Berkeley Institute for Data Science (BIDS)
email: dbamman at berkeley.edu
Summer 2020 office hours: by appointment
I'm an assistant professor in the School of Information at UC Berkeley (with an affiliated appointment in EECS), where I work on applying natural language processing and machine learning to empirical questions in the humanities and social sciences. My research often involves adding linguistic structure (e.g., syntax, semantics, coreference) to statistical models of text. As such, I'm especially interested in developing core NLP techniques for a variety of languages and domains (e.g., literary text, social media). Before Berkeley, I received my PhD at Carnegie Mellon (School of Computer Science, Language Technologies Institute) and was a senior researcher at the Perseus Project of Tufts University.
- Matt Sims, postdoc
- Jon Gillick, PhD student
- Lucy Li, PhD student
- Masha Belyi, MIMS
- Olivia Lewke, undergraduate
- Anya Mansoor, undergraduate
- Aubrey Williams, undergraduate
- Fall 2020
- Computational Humanities (Info 190/COMLIT 170)
- Spring 2020
- Natural Language Processing (Info 159/259)
- Fall 2019
- Information Organization and Retrieval (Info 202)
- Spring 2019
- Applied Natural Language Processing (Info 256)
- Fall 2018
- Fall 2017
- Spring 2017
- Deconstructing Data Science (Info 290)
- Fall 2016
- Spring 2016
- Deconstructing Data Science (Info 290)
- Matt Sims and David Bamman (2020), "Measuring Information Propagation in Literary Social Networks," preprint [pdf].
- David Bamman (2020), "Born-Literary Natural Language Processing," Debates in Digital Humanities, [preprint].
- David Bamman, Olivia Lewke and Anya Mansoor (2020), "An Annotated Dataset of Coreference in English Literature," LREC 2020 [pdf].
- Matt Sims, Jong Ho Park and David Bamman (2019), "Literary Event Detection," ACL 2019 [pdf].
- Jon Gillick, Adam Roberts, Jesse Engel, Douglas Eck and David Bamman (2019), "Learning to Groove with Inverse Sequence Transformations," ICML 2019 [pdf].
- Jon Gillick, Carmine-Emanuele Cella and David Bamman (2019), "Estimating Unobserved Audio Features for Target-Based Orchestration," ISMIR 2019 [pdf].
- Jon Gillick and David Bamman (2019), "Breaking Speech Recognizers to Imagine Lyrics," NeurIPS Workshop on Machine Learning for Creativity and Design [pdf].
- David Bamman, Sejal Popat and Sheng Shen (2019), "An Annotated Dataset of Literary Entities," NAACL 2019 [pdf].
- Jon Gillick and David Bamman (2018), "Please Clap: Modeling Applause in Campaign Speeches," NAACL 2018 [pdf].
- Jon Gillick and David Bamman (2018), "Telling Stories with Soundtracks: An Empirical Analysis of Music in Film," NAACL 2018 Storytelling Workshop [pdf].
- Ted Underwood, David Bamman, and Sabrina Lee (2018), "The Transformation of Gender in English-Language Fiction," Cultural Analytics [pdf].
- Kimiko Ryokai, Elena Durán López, Noura Howell, Jon Gillick, and David Bamman (2018), "Capturing, Representing, and Interacting with Laughter," CHI 2018 [pdf].
- Lara McConnaughey, Jennifer Dai and David Bamman (2017), "The Labeled Segmentation of Printed Books," EMNLP 2017 [pdf].
- Yi Wu, David Bamman and Stuart Russell (2017), "Adversarial Training for Relation Extraction," EMNLP 2017 [pdf].
- David Bamman, Michelle Carney, Jon Gillick, Cody Hennesy, and Vijitha Sridhar (2017), "Estimating the Date of First Publication in a Large-Scale Digital Library," JCDL 2017 [pdf].
- David Bamman (2017), "Natural Language Processing for the Long Tail," Digital Humanities 2017 [pdf].
- Smitha Milli and David Bamman (2016), "Beyond Canonical Texts: A Computational Analysis of Fanfiction," EMNLP 2016 [pdf].
- David Bamman (2016), "Interpretability in Human-Centered Data Science," CSCW Workshop on Human-Centered Data Science [pdf].
- David Bamman and Noah Smith, "Open Extraction of Fine-Grained Political Statements," EMNLP 2015. [pdf]
- David Bamman and Noah Smith, "Contextualized Sarcasm Detection on Twitter," ICWSM 2015. [pdf] [bib]
- David Bamman and Noah Smith, "Unsupervised Discovery of Biographical Structure in Text," Transactions of the ACL (October 2014). [pdf] [synopsis] [bib]
- David Bamman, Jacob Eisenstein and Tyler Schnoebelen, "Gender Identity and Lexical Variation in Social Media," Journal of Sociolinguistics 18.2 (2014). [article] [preprint] [bib]
- David Bamman, Ted Underwood and Noah Smith, "A Bayesian Mixed Effects Model of Literary Character," ACL 2014. [pdf] [synopsis] [bib]
- David Bamman, Chris Dyer and Noah Smith, "Distributed Representations of Geographically Situated Language," ACL 2014. [pdf] [synopsis] [bib]
- David Bamman, Brendan O'Connor and Noah Smith, "Learning Latent Personas of Film Characters," ACL 2013. [pdf] [data] [code] [bib]
- David Bamman, Adam Anderson, and Noah Smith, "Inferring Social Rank in an Old Assyrian Trade Network," Digital Humanities (2013) [ArXiv]
- Nathan Schneider, Brendan O'Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Jason Baldridge, Noah A. Smith, and Chris Dyer, "A Framework for (Under)specifying Dependency Syntax without Overloading Annotators," in: Proceedings of the ACL Linguistic Annotation Workshop (LAW 2013), Sofia, Bulgaria, August 2013. [Extended version]
- David Bamman, Brendan O'Connor and Noah A. Smith, "Censorship and Deletion Practices in Chinese Social Media," First Monday 17.3 (March 2012). [html] [bib]
- Brendan O'Connor, David Bamman and Noah A. Smith, "Computational Text Analysis for Social Science: Model Assumptions and Complexity," NIPS Workshop on Computational Social Science and the Wisdom of Crowds (2011). [pdf] [bib]
- David Bamman and Gregory Crane, "Measuring Historical Word Sense Variation," in: Proceedings of the 11th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2011). Runner-up, Best Paper Award. [pdf] [bib]
- David Bamman and Gregory Crane, "The Ancient Greek and Latin Dependency Treebanks," in: Caroline Sporleder, Antal van den Bosch and Kalliopi Zervanou (eds.), Language Technology for Cultural Heritage (Springer, 2011). [pdf] [bib]
- David Bamman, "Mapping the Demographics of American English with Twitter," Language Log, May 18, 2010. [html]
- David Bamman, Alison Babeu, and Gregory Crane, "Transferring Structural Markup Across Translations Using Multilingual Alignment and Projection," in: Proceedings of the 10th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2010). Winner, Best Paper Award. [pdf] [bib]
- David Bamman, Francesco Mambrini and Gregory Crane, "An Ownership Model of Annotation: The Ancient Greek Dependency Treebank," in: Proceedings of the Eighth International Workshop on Treebanks and Linguistic Theories (TLT8) (Milan, Italy: 2009). [pdf] [bib]
- David Bamman and Gregory Crane, "Computational Linguistics and Classical Lexicography," Digital Humanities Quarterly 3.1 (2009). [html] [bib]
- David Bamman, Marco Passarotti and Gregory Crane, "A Case Study in Treebank Collaboration and Comparison: Accusativus cum Infinitivo and Subordination in Latin," Prague Bulletin of Mathematical Linguistics 90 (2008). [pdf] [bib]
- David Bamman and Gregory Crane, "The Logic and Discovery of Textual Allusion," in: Proceedings of the 2008 LREC Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2008). [pdf] [bib]
- David Bamman and Gregory Crane, "Building a Dynamic Lexicon from a Digital Library," in: Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2008). [pdf] [bib]
11K Latin Books. 11,261 OCR'd Latin texts from the Internet Archive (1.38B words), along with associated metadata detailing the dates of composition.
CMU Book Summary Dataset. 16,559 book plot summaries + metadata.
CMU Movie Summary Dataset. 42,306 movie plot summaries + metadata.
Twitter14K Dataset. Aggregated word counts from 14,464 Twitter users (9.2M tweets).