Info

This course introduces students to natural language processing and exposes them to the variety of methods available for reasoning about text in computational systems. NLP is deeply interdisciplinary, drawing on both linguistics and computer science, and helps drive much contemporary work in text analysis (as used in computational social science, the digital humanities, and computational journalism). We will focus on major algorithms used in NLP for various applications (part-of-speech tagging, parsing, coreference resolution, machine translation) and on the linguistic phenomena those algorithms attempt to model. Students will implement algorithms and create linguistically annotated data on which those algorithms depend.

Staff

David Bamman (office hours: Wednesday 10am-noon, 314 South Hall), dbamman@berkeley.edu

TAs (office hours Mondays 10:30-noon and Thursdays 2-3:30pm, 202 South Hall):
  • Jon Gillick (jongillick@berkeley.edu)
  • Katie Stasaski (katie_stasaski@berkeley.edu)
  • Alicia Tsai (aliciatsai@berkeley.edu)
  • Tobey Yang (minchu.yang@berkeley.edu)
  • Matthew Joerke (mjoerke@berkeley.edu)
  • Changran Hu (changran_hu@berkeley.edu)
Texts

    • [SLP3] Dan Jurafsky and James Martin, Speech and Language Processing (3rd ed. draft) [Available here]
    • [E] Jacob Eisenstein, Natural Language Processing (2018) [Available on bCourses]
    • [G] Yoav Goldberg, Neural Network Methods for Natural Language Processing (2017) [Available for free on campus/VPN here]
    • [PS] James Pustejovsky and Amber Stubbs, Natural Language Annotation for Machine Learning (2012) [Available for free on campus/VPN here]

    Syllabus

    (Subject to change.)

    | Week | Date | Topic | Readings | Assignments |
    | 1 | 1/21 | Introduction [slides] | SLP2 ch 1 | |
    | | 1/23 | Text classification 1 [slides] | SLP3 ch 4 | |
    | 2 | 1/28 | Text classification 2; logreg [slides] | SLP3 ch 5 | HW1 out (due 2/3) |
    | | 1/30 | Text classification 3; MLP and convolutional neural nets [slides] | G 13 | |
    | 3 | 2/4 | Construction of truth; ethics [slides] | PS ch. 6; Hovy and Spruit 2016 | HW2 out (due 2/12) |
    | | 2/6 | Language modeling 1 [slides] | SLP3 ch 3 | |
    | 4 | 2/11 | Language modeling 2; RNN [slides] | SLP3 ch 7; G 14 | |
    | | 2/13 | Vector semantics and static word embeddings [slides] | SLP3 ch 6 | HW3 out (due 2/24) |
    | 5 | 2/18 | Contextual word embeddings (BERT, ELMo); attention and transformers [slides] | Smith 2019, Devlin et al. 2019 | Project proposal due (Info 259) |
    | | 2/20 | Sequence labeling problems: POS tagging; HMM | SLP3 ch 8 | |
    | 6 | 2/25 | MEMM, CRF | SLP3 ch 8 | HW4 out (due 3/4) |
    | | 2/27 | Neural sequence labeling | SLP3 ch 9; E 7 | |
    | 7 | 3/3 | Context-free syntax | SLP3 ch 12 | |
    | | 3/5 | Context-free parsing algorithms | SLP3 ch 13, 14 | |
    | 8 | 3/10 | Review | | |
    | | 3/12 | Midterm | | |
    | 9 | 3/17 | Dependency parsing 1 | SLP3 ch 15 | |
    | | 3/19 | Dependency parsing 2 | SLP3 ch 15 | HW5 out (due 4/1) |
    | 10 | 3/23 | Spring break | | |
    | | 3/26 | Spring break | | |
    | 11 | 3/31 | Semantic role labeling | SLP3 ch 20 | Project midterm report due (Info 259) |
    | | 4/2 | Coreference resolution | SLP3 ch 22 | HW6 out (due 4/13) |
    | 12 | 4/7 | Information extraction | SLP3 ch 18 | |
    | | 4/9 | Multimodal NLP (Jon Gillick) | | |
    | 13 | 4/14 | Wordnet, supersenses and WSD | SLP3 ch 19 | HW7 out (due 4/20) |
    | | 4/16 | Question answering | SLP3 ch 25 | |
    | 14 | 4/21 | Text generation (Katie Stasaski) | SLP3 ch 26 | HW8 out (due 4/29) |
    | | 4/23 | Machine translation | E 18, G 17 | |
    | 15 | 4/28 | Social NLP | | |
    | | 4/30 | Future and review | | |
    | | 5/5 | Final project presentations (Info 259) | | |
    | | 5/11 | | | Final project reports due (Info 259) |
    | | 5/15 | Final exam (7pm-10pm) | | |

    Prerequisites

    • Algorithms: Computer Science 61B
    • Probability/Statistics: Computer Science 70, Math 55, Statistics 134, Statistics 140 or Data 100
    • Strong programming skills

    Grading

    Info 159

    50% 8 homeworks
    10% Weekly quizzes
    20% Midterm exam
    20% Final exam

    Info 259

    40% 8 homeworks
    10% Weekly quizzes
    20% Midterm exam
    30% Project:
          5% Proposal/literature review
          5% Midterm report
          15% Final report
          5% Presentation
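As a rough illustration of how the weights above combine, the following is a minimal sketch that computes a raw weighted score from per-component percentages. The dictionary keys, the `weighted_score` function name, and the sample scores are illustrative assumptions; the sketch also ignores any curving, so it does not predict a final letter grade.

```python
# Component weights in percent, copied from the grading breakdown above.
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "Info 159": {"homeworks": 50, "quizzes": 10, "midterm": 20, "final": 20},
    "Info 259": {"homeworks": 40, "quizzes": 10, "midterm": 20, "project": 30},
}


def weighted_score(course: str, scores: dict) -> float:
    """Return the raw weighted score (0-100) before any curve is applied.

    `scores` maps each component name to a percentage between 0 and 100.
    """
    weights = WEIGHTS[course]
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights) / 100


if __name__ == "__main__":
    # Hypothetical scores for an Info 159 student.
    example = {"homeworks": 90, "quizzes": 80, "midterm": 70, "final": 85}
    print(weighted_score("Info 159", example))
```

For Info 259, the 30% project weight is the sum of the four project components listed above (5% + 5% + 15% + 5%).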

    All lectures will be recorded and made available through bCourses; attendance at lectures is not required (but it is recommended). Weekly quizzes will test your knowledge of that week's lectures and readings, so be sure to watch the lecture to stay on track.

    Project (Info 259)

    Info 259 will be capped by a semester-long project in natural language processing, carried out by a team of one to three students. The project may either focus on core NLP methods or use NLP in support of an empirical research question. For examples of the former, see papers published at ACL, NAACL and EMNLP; for examples of the latter, see workshops for NLP and Computational Social Science, Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Natural Language Processing Techniques for Educational Applications, Noisy User-Generated Text, and many more.

    The project will consist of four components:

    • Project proposal and literature review. Students will propose the research question to be examined, motivate its rationale as an interesting question worth asking, and assess its potential to contribute new knowledge by situating it within related literature in the scientific community. (2 pages; 5 sources)
    • Midterm report. By the middle of the course, students should present initial experimental results and establish a validation strategy to be performed at the end of experimentation. (4 pages; 10 sources)
    • Final report. The final report will include a complete description of work undertaken for the project, including data collection, development of methods, experimental details (complete enough for replication), comparison with past work, and a thorough analysis. Projects will be evaluated according to standards for conference publication, including clarity, originality, soundness, substance, evaluation, meaningful comparison, and impact (of ideas, software, and/or datasets). (6 pages, not including references)
    • Presentation. At the end of the semester, teams will present their work in a poster session.
    All reports should use the ACL 2019 style files for either LaTeX or Microsoft Word.

    Policies

    Academic Integrity

    All students will follow the UC Berkeley code of conduct. You may discuss homeworks at a high level with your classmates (if you do, include their names on the submission), but each homework deliverable must be completed independently: all writing and code must be your own. All quizzes must be completed on your own. If you mention the work of others, you must clearly cite the appropriate source. (For additional information on plagiarism, see here.) This holds for source code as well: if you use others' code (e.g., from StackOverflow), you must cite its source. All homeworks and project deliverables are due at the time and date of the deadline.

    Students with Disabilities

    Our goal is to make class a learning environment accessible to all students. If you need disability-related accommodations and have a Letter of Accommodation from the DSP, have emergency medical information you wish to share with me, or need special arrangements in case the building must be evacuated, please inform me immediately. I'm happy to discuss privately after class or at my office.

    Late assignments

    Students will have a total of two late days to use when turning in homework assignments and quizzes (not project deliverables for Info 259); each late day extends the deadline by 24 hours. Each homework and quiz will be due at 11:59pm, and will have a 2-hour grace period for any last-minute submission issues. Late days and incompletes will be assessed immediately following the grace period (at 2:00am sharp). The grace period applies to late days as well (if a homework is due at 11:59pm 1/21, and you use a late day to extend it to 11:59pm 1/22, a submission will not be accepted after 2:00am 1/23).
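The deadline arithmetic in this policy can be sketched as follows. This is only an illustration under stated assumptions: the function name `final_cutoff` and the calendar year used in the example are hypothetical, and the sketch adds exactly two hours to the 11:59pm deadline (giving 1:59am), which the policy rounds to 2:00am.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(hours=2)  # grace period after each 11:59pm deadline
LATE_DAY = timedelta(hours=24)     # each late day extends the deadline by 24 hours
MAX_LATE_DAYS = 2                  # total late days available per student


def final_cutoff(deadline: datetime, late_days_used: int = 0) -> datetime:
    """Return the last moment a submission would be accepted.

    `deadline` is the original due date and time; `late_days_used` is the
    number of late days applied to this particular assignment.
    """
    if not 0 <= late_days_used <= MAX_LATE_DAYS:
        raise ValueError("late_days_used must be between 0 and 2")
    return deadline + late_days_used * LATE_DAY + GRACE_PERIOD


if __name__ == "__main__":
    # The policy's own example: due 11:59pm on 1/21, one late day used.
    # The year 2020 is an assumption made only so the date is concrete.
    due = datetime(2020, 1, 21, 23, 59)
    print(final_cutoff(due, late_days_used=1))
```

Because the two late days are a total across the semester, a real tracker would also need to keep a running count per student; that bookkeeping is left out of this sketch.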

    Curving

    Grades for this course will be curved according to the EECS grading guidelines for undergraduate courses. This class is an upper-division course.

    Exams

    This course has a midterm exam scheduled for 3/12 (in-class) and a final exam scheduled for 5/15 (7pm-10pm). We will not be offering alternative exam dates, so if you anticipate a conflict, you should not register for this course.