Info

This course introduces students to natural language processing and exposes them to the variety of methods available for reasoning about text in computational systems. NLP is deeply interdisciplinary, drawing on both linguistics and computer science, and helps drive much contemporary work in text analysis (as used in computational social science, the digital humanities, and computational journalism). We will focus on major algorithms used in NLP for various applications (part-of-speech tagging, parsing, coreference resolution, machine translation) and on the linguistic phenomena those algorithms attempt to model. Students will implement algorithms and create linguistically annotated data on which those algorithms depend.

Texts

  • [SLP3] Dan Jurafsky and James Martin, Speech and Language Processing (3rd ed. draft) [Available here]
  • [SLP2] Dan Jurafsky and James Martin, Speech and Language Processing (2nd ed., 2009)
  • [G] Yoav Goldberg, Neural Network Methods for Natural Language Processing (2017) [Available for free on campus/VPN here]
  • [PS] James Pustejovsky and Amber Stubbs, Natural Language Annotation for Machine Learning (2012) [Available for free on campus/VPN here]

Syllabus

(Subject to change.)

| Week | Date | Topic | Readings |
|------|------|-------|----------|
| 1 | 8/24 | Introduction | SLP2 ch 1 |
| 2 | 8/29 | Text classification 1 | SLP3 ch 6 |
|   | 8/31 | Text classification 2; logreg and NN | SLP3 ch 7; G 4 |
| 3 | 9/5 | Text classification 3; CNN | G 13 |
|   | 9/7 | Construction of truth; ethics | PS ch. 6; Hovy and Spruit 2016 |
| 4 | 9/12 | Language modeling 1 | SLP3 ch 4 |
|   | 9/14 | Language modeling 2; RNN | G 14 |
| 5 | 9/19 | Vector semantics and word embeddings | SLP3 ch 15, 16 |
|   | 9/21 | Sequence labeling problems: POS tagging, NER | SLP3 ch 10 |
| 6 | 9/26 | HMM, MEMM; Viterbi | SLP3 ch 9 |
|   | 9/28 | CRF, LSTM | Wallach 2004 |
| 7 | 10/3 | Features and hypothesis testing | Berg-Kirkpatrick et al. 2012; Søgaard et al. 2014 |
|   | 10/5 | Dependency syntax | SLP3 ch 14 |
| 8 | 10/10 | Dependency parsing algorithms | SLP3 ch 14 |
|   | 10/12 | Review | |
| 9 | 10/17 | Midterm | |
|   | 10/19 | Context-free syntax | SLP3 ch 12 |
| 10 | 10/24 | Context-free parsing algorithms | SLP2 ch 14 |
|   | 10/26 | Compositional semantics [Jacob Andreas] | SLP2 ch 18 |
| 11 | 10/31 | Semantic parsing | |
|   | 11/2 | Semantic role labeling | SLP3 ch 22 |
| 12 | 11/7 | Wordnet, supersenses and WSD | SLP3 ch 17 |
|   | 11/9 | Discourse and pragmatics | SLP2 ch 21 |
| 13 | 11/14 | Coreference resolution | SLP2 ch 21 |
|   | 11/16 | Social NLP | Pick one: Voigt et al. 2017; Cheng et al. 2017; Underwood 2016 |
| 14 | 11/21 | Machine translation; seq2seq [John DeNero] | SLP2 ch 25; G 17 |
|   | 11/23 | No class (Thanksgiving) | |
| 15 | 11/28 | Conversational agents | SLP3 ch 29 |
|   | 11/30 | Future and review | |

Grading

Info 159

50% Homeworks (8) and short in-class quizzes (8)
20% Midterm exam
30% Final exam

Info 259

50% Homeworks (8) and short in-class quizzes (8)
20% Midterm exam
30% Project:
      5% Proposal/literature review
      5% Midterm report
      15% Final report
      5% Presentation

The homework/quiz grade will be calculated as the average of the top 13 homeworks and quizzes (i.e., the lowest 3 scores out of the 16 total will be dropped).
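
As a rough illustration of the calculation described above, the following sketch averages the top 13 of the 16 homework and quiz scores and then combines the result with the exam scores using the Info 159 weights (50% / 20% / 30%). The function names, the 0-100 score scale, and the sample numbers are assumptions made for the example, not part of the official grading procedure.

```python
def homework_quiz_grade(scores, keep=13):
    """Average of the `keep` highest scores; the remaining scores are dropped.

    `scores` holds all homework and quiz scores together (16 in this course),
    each assumed to be on a 0-100 scale.
    """
    if len(scores) < keep:
        raise ValueError(f"expected at least {keep} scores, got {len(scores)}")
    top_scores = sorted(scores, reverse=True)[:keep]
    return sum(top_scores) / keep


def course_grade_159(hw_quiz_scores, midterm, final_exam):
    """Weighted Info 159 grade: 50% homework/quizzes, 20% midterm, 30% final."""
    return (0.5 * homework_quiz_grade(hw_quiz_scores)
            + 0.2 * midterm
            + 0.3 * final_exam)


if __name__ == "__main__":
    # Hypothetical student: 13 perfect scores and 3 zeros (e.g., missed work).
    example_scores = [100] * 13 + [0] * 3
    print(homework_quiz_grade(example_scores))
    print(course_grade_159(example_scores, midterm=80, final_exam=90))
```

Because the three lowest scores are dropped, the three zeros in this hypothetical example do not lower the homework/quiz average.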

Project

Info 259 will be capped by a semester-long project (carried out by 1 or 2 students) involving natural language processing, either focusing on core NLP methods or using NLP in support of an empirical research question. The project will comprise four components:

  • Project proposal and literature review. Students will propose the research question to be examined, motivate its rationale as an interesting question worth asking, and assess its potential to contribute new knowledge by situating it within related literature in the scientific community. (2 pages; 5 sources)
  • Midterm report. By the middle of the course, students should present initial experimental results and establish a validation strategy to be performed at the end of experimentation. (4 pages; 10 sources)
  • Final report. The final report will include a complete description of work undertaken for the project, including data collection, development of methods, experimental details (complete enough for replication), comparison with past work, and a thorough analysis. Projects will be evaluated according to standards for conference publication, including clarity, originality, soundness, substance, evaluation, meaningful comparison, and impact (of ideas, software, and/or datasets). (8 pages)
  • Presentation. At the end of the semester, teams will present their work in a poster session.
All reports should use the ACL 2017 style files for either LaTeX or Microsoft Word.

Policies

Academic Integrity

All students will follow the UC Berkeley code of conduct. While the group project is a collaborative effort, all homeworks must be completed independently. All writing must be your own; if you mention the work of others, you must clearly cite the appropriate source. (For additional information on plagiarism, see here.) This holds for source code as well: if you use others' code (e.g., from StackOverflow), you must cite its source. Late homeworks will not be accepted.

Students with Disabilities

Our goal is to make class a learning environment accessible to all students. If you need disability-related accommodations and have a Letter of Accommodation from the DSP, have emergency medical information you wish to share with me, or need special arrangements in case the building must be evacuated, please inform me immediately. I'm happy to discuss privately after class or at my office.

Late Assignments

No late assignments will be accepted. (Note, however, that the grading calculation drops the lowest 3 homework and quiz scores.)