David Bamman

Associate Professor
School of Information
University of California, Berkeley

Affiliated appointments: EECS, Linguistics, Computational Precision Health

Faculty, Berkeley AI Research Lab (BAIR); Senior Fellow, Berkeley Institute for Data Science (BIDS)

twitter: @dbamman
email: dbamman at berkeley.edu

Fall 2023 office hours: Mon 10-11:30 (312 SH), 11/20 + 11/27

David Bamman is an associate professor in the School of Information at UC Berkeley, where he works in the areas of natural language processing and cultural analytics, applying NLP and machine learning to empirical questions in the humanities and social sciences. His research focuses on improving the performance of NLP for underserved domains like literature (including LitBank and BookNLP) and exploring the affordances of empirical methods for the study of literature and culture. Before Berkeley, he received his PhD in the School of Computer Science at Carnegie Mellon University and was a senior researcher at the Perseus Project of Tufts University. Bamman's work is supported by the National Endowment for the Humanities, National Science Foundation, an Amazon Research Award, and an NSF CAREER award.

I lead an amazing research group:

Lucy Li, PhD student
Kent Chang, PhD student
Nikita Mehandru, PhD student (co-advised with Steve Weber)
Naitian Zhou, PhD student

I'm open to advising new PhD students applying in Fall 2023 for admission next year; see here for information for prospective students.

Alumni

Sandeep Soni, Postdoc → Asst. Prof., Quantitative Theory and Methods, Emory University
Jon Gillick, PhD 2022 → Postdoc, Creative Computing Institute, University of the Arts London
Matt Sims, Postdoc → Sudowrite

Announcements

The NEH is funding work on "Multilingual BookNLP", expanding BookNLP to Spanish, German, Japanese and Russian.

The NSF has awarded me a CAREER grant to work on improving NLP for contemporary fiction and mining fiction to improve real-world systems.

The NSF is funding our work "Building Subjective Knowledge Bases by Modeling Viewpoints." The project website can be found here: www.subjectivekb.org.

Teaching

Spring 2023
- Natural Language Processing (Info 159/259)
Fall 2022
- Information Organization and Retrieval (Info 202)
Spring 2022
- Natural Language Processing (Info 159/259)
Fall 2021
- Applied Natural Language Processing (Info 256)
Spring 2021
- Natural Language Processing (Info 159/259)
Fall 2020
- Computational Humanities (Info 190/COMLIT 170)
Spring 2020
- Natural Language Processing (Info 159/259)
Fall 2019
- Information Organization and Retrieval (Info 202)
Spring 2019
- Applied Natural Language Processing (Info 256)
Fall 2018
- Natural Language Processing (Info 159/259)
- Information Organization and Retrieval (Info 202)
Fall 2017
- Natural Language Processing (Info 159/259)
- Information Organization and Retrieval (Info 202)
Spring 2017
- Deconstructing Data Science (Info 290)
Fall 2016
- NLP Research Seminar (Info 290)
- Information Organization and Retrieval (Info 202)
Spring 2016
- Deconstructing Data Science (Info 290)

Publications

Naitian Zhou, David Jurgens and David Bamman (2024), "Social Meme-ing: Measuring Linguistic Variation in Memes," NAACL [preprint].
Kent K. Chang, Mackenzie Cramer, Sandeep Soni, and David Bamman (2023), "Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4," EMNLP [pdf].
Sandeep Soni, Amanpreet Sihra, Elizabeth F. Evans, Matthew Wilkens and David Bamman (2023), "Grounding Characters and Places in Narrative Text," ACL [pdf].
Kent K. Chang, Danica Chen and David Bamman (2023), "Dramatic Conversation Disentanglement," Findings of ACL [pdf].
Li Lucy, Jesse Dodge, David Bamman and Katherine A. Keith (2023), "Words as Gatekeepers: Measuring Discipline-specific Terms and Meanings in Scholarly Publications," Findings of ACL [pdf].
Li Lucy, Divya Tadimeti and David Bamman (2022), "Discovering Differences in the Representation of People Using Contextualized Semantic Axes," EMNLP 2022 [pdf].
Sandeep Soni, David Bamman and Jacob Eisenstein (2022), "Predicting Long-Term Citations from Short-Term Linguistic Influence," Findings of EMNLP 2022 [pdf].
Andrew Piper, Richard Jean So and David Bamman (2021), "Narrative Theory for Computational Narrative Understanding," EMNLP 2021 [pdf].
Jon Gillick, Joshua Yang, Carmine-Emanuele Cella and David Bamman (2021), "Drumroll Please: Modeling Multi-Scale Rhythmic Gestures with Flexible Grids," Transactions of the International Society for Music Information Retrieval (TISMIR) [pdf].
Jon Gillick, Wesley Deng, Kimiko Ryokai and David Bamman (2021), "Robust Laughter Detection in Noisy Environments," Interspeech [pdf].
Li Lucy and David Bamman (2021), "Gender and Representation Bias in GPT-3 Generated Stories," NAACL 2021 Workshop on Narrative Understanding [pdf].
Jon Gillick and David Bamman (2021), "What to Play and How to Play it: Guiding Generative Music Models with Multiple Demonstrations," New Interfaces for Musical Expression (NIME) [pdf].
Li Lucy and David Bamman (2021), "Characterizing English Variation across Social Media Communities with BERT," Transactions of the ACL [pdf].
Matthew Sims and David Bamman (2020), "Measuring Information Propagation in Literary Social Networks," EMNLP 2020 [pdf].
Matthew Jörke, Jon Gillick, Matthew Sims and David Bamman (2020), "Attending to Long-Distance Document Context for Sequence Labeling," Findings of EMNLP 2020 [pdf].
David Bamman (2020), "Born-Literary Natural Language Processing," Debates in Digital Humanities, [preprint].
David Bamman and Patrick J. Burns (2020), "Latin BERT: A Contextual Language Model for Classical Philology," [preprint].
David Bamman, Olivia Lewke and Anya Mansoor (2020), "An Annotated Dataset of Coreference in English Literature," LREC 2020 [pdf].
Matthew Sims, Jong Ho Park and David Bamman (2019), "Literary Event Detection," ACL 2019 [pdf].
Jon Gillick, Adam Roberts, Jesse Engel, Douglas Eck and David Bamman (2019), "Learning to Groove with Inverse Sequence Transformations," ICML 2019 [pdf].
Jon Gillick, Carmine-Emanuele Cella and David Bamman (2019), "Estimating Unobserved Audio Features for Target-Based Orchestration," ISMIR 2019 [pdf].
Jon Gillick and David Bamman (2019), "Breaking Speech Recognizers to Imagine Lyrics," NeurIPS Workshop on Machine Learning for Creativity and Design [pdf].
David Bamman, Sejal Popat and Sheng Shen (2019), "An Annotated Dataset of Literary Entities," NAACL 2019 [pdf].
Jon Gillick and David Bamman (2018), "Please Clap: Modeling Applause in Campaign Speeches," NAACL 2018 [pdf].
Jon Gillick and David Bamman (2018), "Telling Stories with Soundtracks: An Empirical Analysis of Music in Film," NAACL 2018 Storytelling Workshop [pdf].
Ted Underwood, David Bamman, and Sabrina Lee (2018), "The Transformation of Gender in English-Language Fiction," Cultural Analytics [pdf].
Kimiko Ryokai, Elena Durán López, Noura Howell, Jon Gillick, and David Bamman (2018), "Capturing, Representing, and Interacting with Laughter," CHI 2018 [pdf].
Lara McConnaughey, Jennifer Dai and David Bamman (2017), "The Labeled Segmentation of Printed Books," EMNLP 2017 [pdf].
Yi Wu, David Bamman and Stuart Russell (2017), "Adversarial Training for Relation Extraction," EMNLP 2017 [pdf].
David Bamman, Michelle Carney, Jon Gillick, Cody Hennesy, and Vijitha Sridhar (2017), "Estimating the Date of First Publication in a Large-Scale Digital Library," JCDL 2017 [pdf].
David Bamman (2017), "Natural Language Processing for the Long Tail," Digital Humanities 2017 [pdf].
Smitha Milli and David Bamman (2016), "Beyond Canonical Texts: A Computational Analysis of Fanfiction," EMNLP 2016 [pdf].
David Bamman (2016), "Interpretability in Human-Centered Data Science," CSCW Workshop on Human-Centered Data Science [pdf].
David Bamman and Noah Smith, "Open Extraction of Fine-Grained Political Statements," EMNLP 2015. [pdf]
David Bamman and Noah Smith, "Contextualized Sarcasm Detection on Twitter," ICWSM 2015. [pdf] [bib]
David Bamman and Noah Smith, "Unsupervised Discovery of Biographical Structure in Text," Transactions of the ACL (October 2014). [pdf] [synopsis] [bib]
David Bamman, Jacob Eisenstein and Tyler Schnoebelen, "Gender Identity and Lexical Variation in Social Media," Journal of Sociolinguistics 18.2 (2014). [article] [preprint] [bib]
David Bamman, Ted Underwood and Noah Smith, "A Bayesian Mixed Effects Model of Literary Character," ACL 2014. [pdf] [synopsis] [bib]
David Bamman, Chris Dyer and Noah Smith, "Distributed Representations of Geographically Situated Language," ACL 2014. [pdf] [synopsis] [bib]
David Bamman, Brendan O'Connor and Noah Smith, "Learning Latent Personas of Film Characters," ACL 2013. [pdf] [data] [code] [bib]
David Bamman, Adam Anderson, and Noah Smith, "Inferring Social Rank in an Old Assyrian Trade Network," Digital Humanities (2013) [ArXiv]
Nathan Schneider, Brendan O'Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Jason Baldridge, Noah A. Smith, and Chris Dyer, "A Framework for (Under)specifying Dependency Syntax without Overloading Annotators," in: Proceedings of the ACL Linguistic Annotation Workshop (LAW 2013), Sofia, Bulgaria, August 2013. [Extended version]
David Bamman, Brendan O'Connor and Noah A. Smith, "Censorship and Deletion Practices in Chinese Social Media," First Monday 17.3 (March 2012). [html] [bib]
- Press: [BBC] [New Scientist]
Brendan O'Connor, David Bamman and Noah A. Smith, "Computational Text Analysis for Social Science: Model Assumptions and Complexity," NIPS Workshop on Computational Social Science and the Wisdom of Crowds (2011). [pdf] [bib]
David Bamman and Gregory Crane, "Measuring Historical Word Sense Variation," in: Proceedings of the 11th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2011). Runner-up, Best Paper Award. [pdf] [bib]
David Bamman and Gregory Crane, "The Ancient Greek and Latin Dependency Treebanks," in: Caroline Sporleder, Antal van den Bosch and Kalliopi Zervanou (eds.), Language Technology for Cultural Heritage (Springer, 2011). [pdf] [bib]
David Bamman, "Mapping the Demographics of American English with Twitter," Language Log, May 18, 2010. [html]
David Bamman, Alison Babeu, and Gregory Crane, "Transferring Structural Markup Across Translations Using Multilingual Alignment and Projection," in: Proceedings of the 10th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2010). Winner, Best Paper Award. [pdf] [bib]
David Bamman, Francesco Mambrini and Gregory Crane, "An Ownership Model of Annotation: The Ancient Greek Dependency Treebank," in: Proceedings of the Eighth International Workshop on Treebanks and Linguistic Theories (TLT8) (Milan, Italy: 2009). [pdf] [bib]
David Bamman and Gregory Crane, "Computational Linguistics and Classical Lexicography," Digital Humanities Quarterly 3.1 (2009). [html] [bib]
David Bamman, Marco Passarotti and Gregory Crane, "A Case Study in Treebank Collaboration and Comparison: Accusativus cum Infinitivo and Subordination in Latin," Prague Bulletin of Mathematical Linguistics 90 (2008). [pdf] [bib]
David Bamman and Gregory Crane, "The Logic and Discovery of Textual Allusion," in: Proceedings of the 2008 LREC Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2008). [pdf] [bib]
David Bamman and Gregory Crane, "Building a Dynamic Lexicon from a Digital Library," in: Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2008). [pdf] [bib]

Datasets

11K Latin Books. 11,261 OCR'd Latin texts from the Internet Archive (1.38B words), along with associated metadata detailing the dates of composition.
CMU Book Summary Dataset. 16,559 book plot summaries + metadata.
CMU Movie Summary Dataset. 42,306 movie plot summaries + metadata.
Twitter14K Dataset. Aggregated word counts from 14,464 Twitter users (9.2M tweets).