David Bamman

CAREER: Using Fiction to Improve Real-World Information Systems

Abstract

At a high level, this project aims to design computational methods to reason about the world of fiction and, in turn, to learn from fiction to inform the design of systems in the real world. While much work in artificial intelligence learns about the world from relatively short factual sources such as news and Wikipedia, fiction offers a range of affordances for improving existing information systems and enabling entirely new applications. Unlike factual sources, fiction captures emotion, everyday action, and commonsense, offering a vast source of information to bootstrap knowledge bases that can power question answering systems, conversational agents, and the next generation of artificial intelligence. This project will improve the performance of natural language understanding on fiction as a domain, and use it to explore two case studies: inferring the structure of everyday events in people's lives, including the relation between macro-level events (such as eating breakfast) and the micro-events that compose them (sitting down at the table, pouring another cup of coffee, putting the dishes in the sink); and learning the relationship between observed actions depicted in text and the broad-coverage mental attitudes (such as joy, sadness, and surprise) of their agents.

This project aims to draw in students and researchers in the social sciences and humanities, who have historically been underrepresented in computing. While the technical research carried out under this project speaks directly to how expertise in the social sciences and humanities can inform the computational design of information systems, the primary educational plan under this award will investigate one fundamental question: how to enable students outside STEM fields to learn and improve their skills in natural language processing, machine learning, and data science. This work will engage researchers in the humanities and social sciences in technical research, teach skills to students without technical backgrounds, and translate advances in computational methodology into advances in domain knowledge.

The fundamental work in this project aims to bridge the gap between computation and the humanities and social sciences by providing two case studies of how learning from a depicted world in fiction can improve systems that reason about the real world. This is a new frontier that can not only teach us about the limitations of current systems for textual entailment and sentiment analysis, but can also open up new areas of research at this intersection. This work will make progress on two tasks enabled by fiction: inferring the sequential and hierarchical order of commonplace actions, in which a single macro-event comprises several micro-events; and inferring the latent attitudes of people mentioned in text given observations of their actions. Both case studies draw on fiction as a source of knowledge, and require the development of computational models optimized to bridge the gap between fiction and reality. Concretely, this work will result in the publication of a new dataset of contemporary fiction, labeled for entities and coreference between them (which has the potential to yield a new state of the art for nested entity recognition and coreference resolution for this domain); a knowledge base of everyday actions extracted from fiction; open-source software for modeling hierarchical events and learning mental attitudes from observed actions; and publications at academic venues detailing the methodologies created under the scope of this project.

Grant information

This material is based upon work supported by the National Science Foundation for the project "CAREER: Using Fiction to Improve Real-World Information Systems" (IIS-1942591). Expected duration: 2020-2025.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Personnel

Principal investigator:

David Bamman, Associate Professor, UC Berkeley

Postdocs/students:

Matt Sims, postdoc, UC Berkeley
Lucy Li, PhD student, UC Berkeley

Materials

Li Lucy and David Bamman (2021), "Gender and Representation Bias in GPT-3 Generated Stories," NAACL 2021 Workshop on Narrative Understanding.
Matthew Sims and David Bamman (2020), "Measuring Information Propagation in Literary Social Networks," EMNLP 2020.
GitHub repository for the Computational Humanities course.