Staff
Texts
- [SLP3] Dan Jurafsky and James Martin, Speech and Language Processing (3rd ed. draft) [Available here]
- [E] Jacob Eisenstein, Natural Language Processing (2018) [Available on bCourses]
- [G] Yoav Goldberg, Neural Network Methods for Natural Language Processing (2017) [Available for free on campus/VPN here]
- [PS] James Pustejovsky and Amber Stubbs, Natural Language Annotation for Machine Learning (2012) [Online access available for free through the UC library here]
Syllabus
(Subject to change.)
Week | Date | Topic | Readings | Assignments |
1 | 1/19 | Introduction [slides] | SLP2 ch 1 | |
1 | 1/21 | Construction of truth; ethics [slides] | PS ch. 6; Hovy and Spruit 2016 | HW1 out |
2 | 1/26 | Text classification 1 [slides] | SLP3 ch 4 | |
2 | 1/28 | Text classification 2; logreg [slides] | SLP3 ch 5 | HW2 out |
3 | 2/2 | Text classification 3; MLP and convolutional neural nets [slides] | G 13 | |
3 | 2/4 | Language modeling 1 [slides] | SLP3 ch 3 | HW3 out |
4 | 2/9 | Language modeling 2; RNN [slides] | SLP3 ch 7; G 14 | |
4 | 2/11 | Vector semantics and static word embeddings [slides] | SLP3 ch 6 | |
5 | 2/16 | Contextual word embeddings (BERT, ELMo); attention and transformers [slides] | Smith 2019, Devlin et al. 2019 | HW4 out |
5 | 2/18 | Sequence labeling problems: POS tagging; HMM [slides] | SLP3 ch 8 | Project proposal due (Info 259) |
6 | 2/23 | MEMM, CRF [slides] | SLP3 ch 8 | |
6 | 2/25 | Neural sequence labeling [slides] | SLP3 ch 9; E 7 | HW5 out |
7 | 3/2 | Context-free syntax [slides] | SLP3 ch 12 | |
7 | 3/4 | Context-free parsing algorithms [slides] | SLP3 ch 13 | |
8 | 3/9 | Review [slides] | | |
8 | 3/11 | Midterm | | |
9 | 3/16 | Dependency parsing 1 [slides] | SLP3 ch 14 | |
9 | 3/18 | Dependency parsing 2 [slides] | SLP3 ch 14 | |
10 | 3/22 | Spring break | | |
10 | 3/25 | Spring break | | |
11 | 3/30 | Semantic role labeling [slides] | SLP3 ch 19 | Project midterm report due (Info 259) |
11 | 4/1 | Wordnet, supersenses and WSD [slides] | SLP3 ch 18 | HW6 out |
12 | 4/6 | Coreference resolution [slides] | SLP3 ch 21 | |
12 | 4/8 | Information extraction [slides] | SLP3 ch 17 | HW7 out |
13 | 4/13 | Question answering [slides] | SLP3 ch 23 | |
13 | 4/15 | Text generation (Katie) [slides] | SLP3 ch 24 | HW8 out |
14 | 4/20 | Multimodal NLP (Jon) [slides] | | |
14 | 4/22 | Machine translation [slides] | SLP3 ch 11 | |
15 | 4/27 | Social NLP [slides] | Pick one: Voigt et al. 2017; Underwood et al. 2018; Antoniak et al. 2019 | |
15 | 4/29 | Final project presentations | | |
15 | 5/10 | NLP subfield survey due (Info 159); Final project report due (Info 259) | | |
Prerequisites
- Algorithms: Computer Science 61B
- Probability/Statistics: Computer Science 70, Math 55, Statistics 134, Statistics 140 or Data 100
- Strong programming skills
Grading
Info 159
50% | Homeworks |
10% | Weekly quizzes |
20% | Midterm exam |
20% | NLP subfield survey |
Info 259
40% | Homeworks |
10% | Weekly quizzes |
20% | Midterm exam |
30% | Project: 5% proposal/literature review, 5% midterm report, 15% final report, 5% presentation |
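As an illustration of how the weights above combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale; the dictionary keys and the function name `course_score` are hypothetical and are not part of any official grading tool.

```python
# Percent weights taken from the grading tables above.
# Component names are illustrative labels, not official identifiers.
WEIGHTS = {
    "Info 159": {
        "homeworks": 50,
        "quizzes": 10,
        "midterm": 20,
        "survey": 20,
    },
    "Info 259": {
        "homeworks": 40,
        "quizzes": 10,
        "midterm": 20,
        "proposal": 5,
        "midterm_report": 5,
        "final_report": 15,
        "presentation": 5,
    },
}


def course_score(course, scores):
    """Weighted average of component scores (each assumed to be 0-100)."""
    weights = WEIGHTS[course]
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights) / 100


example = {"homeworks": 90, "quizzes": 80, "midterm": 70, "survey": 100}
print(course_score("Info 159", example))
```

The weights are kept as whole percentages and divided by 100 once at the end, which avoids accumulating floating-point error from fractional weights.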
All lectures will be recorded and made available through bCourses; attendance at lectures is not required (but it is recommended). Weekly quizzes will test your knowledge of that week's lectures and readings, so be sure to watch the lecture to stay on track.
NLP subfield survey (Info 159)
Project (Info 259)
Info 259 culminates in a semester-long project (carried out by teams of one to three students) involving natural language processing, either focusing on core NLP methods or using NLP in support of an empirical research question. For examples of the former, see papers published at ACL, NAACL and EMNLP; for examples of the latter, see workshops for NLP and Computational Social Science, Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Natural Language Processing Techniques for Educational Applications, Noisy User-Generated Text, and many more.
The project consists of four components:
- Project proposal and literature review. Students will propose the research question to be examined, motivate its rationale as an interesting question worth asking, and assess its potential to contribute new knowledge by situating it within related literature in the scientific community. (2 pages; 5 sources)
- Midterm report. By the middle of the course, students should present initial experimental results and establish a validation strategy to be performed at the end of experimentation. (4 pages; 10 sources)
- Final report. The final report will include a complete description of work undertaken for the project, including data collection, development of methods, experimental details (complete enough for replication), comparison with past work, and a thorough analysis. Projects will be evaluated according to standards for conference publication, including clarity, originality, soundness, substance, evaluation, meaningful comparison, and impact (of ideas, software, and/or datasets). (6 pages, not including references)
- Presentation. At the end of the semester, teams will present their work in a poster session.
Policies
Academic Integrity
All students will follow the UC Berkeley code of conduct. You may discuss homeworks at a high level with your classmates (if you do, include their names on the submission), but each homework deliverable must be completed independently -- all writing and code must be your own. All quizzes and exams must be completed on your own. If you mention the work of others, you must clearly cite the appropriate source. (For additional information on plagiarism, see here.) This holds for source code as well: if you use others' code (e.g., from StackOverflow), you must cite its source. All homeworks and project deliverables are due at the time and date of the deadline. We have a zero-tolerance policy for cheating and plagiarism; violations will be referred to the Center for Student Conduct and will likely result in failing the class.
Piazza
We'll use Piazza as a platform for asking and answering questions about the course material, including homeworks. Students are encouraged to actively participate on this forum and help others by answering questions that arise (helpful students can see a grade bump across a threshold (e.g., B+ to A-) for this participation). When helping with homework questions, keep the discussion to the high-level concepts; don't post answers to homeworks or quiz/exam questions.
TA office hours
TA office hours will be held over Discord, which enables students to self-organize into channels to discuss any questions they have in common among themselves, with the TA moving between channels to answer questions for everyone there. While in the channel, keep academic integrity in mind: you may discuss homework questions at a high level with others present, but don't discuss specific answers or share screens with code solutions. Neither TA office hours nor Piazza should be used for pre-grading (asking if a specific answer to a homework or quiz question is correct before the assignment is due).
Students with Disabilities
Our goal is to make class a learning environment accessible to all students. If you need disability-related accommodations and have a Letter of Accommodation from the DSP, have emergency medical information you wish to share with me, or need special arrangements in case the building must be evacuated, please inform me immediately. I'm happy to discuss privately after class or at my office.
Late assignments
Students will have a total of three late days to use when turning in homework assignments and quizzes (not project deliverables for Info 259); each late day extends the deadline by 24 hours. If all late days have been used up, homeworks/quizzes can be turned in up to 48 hours late for 50% credit; anything submitted more than 48 hours late receives no credit. Each homework and quiz will be due at 11:59pm, and will have a 2-hour grace period for any last-minute submission issues. Late days and incompletes will be assessed immediately following the grace period (at 2:00am sharp). The grace period applies to late days as well (if a homework is due at 11:59pm 1/21, and you use a late day to extend it to 11:59pm 1/22, you may turn it in up to 2:00am 1/23 and still be assessed 1 late day). Late days are assessed immediately once homeworks or quizzes are submitted late and can't be retroactively changed (if you submit 2 homeworks and 2 quizzes late, for example, you can't decide after the fact which ones to apply your 3 late days to -- they apply to whichever homeworks or quizzes use them up first).
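To make the late-day arithmetic concrete, here is a rough Python sketch of the policy as described above. It is not the official grading script, and it rests on several assumptions: the function name `assess` is hypothetical; the cutoff is modeled as 2 hours and 1 minute after an 11:59pm deadline so that it lands on 2:00am; the grace period is assumed to apply to the 48-hour reduced-credit window as well; and the case where a student has some late days left but not enough to cover the submission is not specified in the policy, so the sketch refuses to guess.

```python
import math
from datetime import datetime, timedelta

# Assumption: an 11:59pm deadline plus this grace gives the 2:00am cutoff.
GRACE = timedelta(hours=2, minutes=1)
DAY = timedelta(hours=24)
PENALTY_WINDOW = timedelta(hours=48)


def assess(deadline, submitted, late_days_left):
    """Return (late_days_charged, credit_fraction) for one submission."""
    overrun = submitted - (deadline + GRACE)
    if overrun <= timedelta(0):
        # On time, or within the grace period.
        return 0, 1.0
    if late_days_left == 0:
        # No late days left: 50% credit up to 48 hours late, then nothing.
        # Assumption: the window is measured from the end of the grace period.
        return 0, (0.5 if overrun <= PENALTY_WINDOW else 0.0)
    # Each late day extends the deadline (and its grace period) by 24 hours.
    days_needed = math.ceil(overrun / DAY)
    if days_needed <= late_days_left:
        return days_needed, 1.0
    # Not covered by the policy text; deliberately left undecided.
    raise ValueError("policy does not specify this case")


due = datetime(2021, 1, 21, 23, 59)
# The example from the policy: submitted 2:00am on 1/23 with late days available.
print(assess(due, datetime(2021, 1, 23, 2, 0), late_days_left=3))
```

Under these assumptions, the policy's own example (due 11:59pm 1/21, submitted by 2:00am 1/23) is charged exactly one late day at full credit.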
Curving
Grades for this course will not be curved. Minimum thresholds for letter grades are the following: 93 A, 90 A-, 87 B+, 83 B, 80 B-, 77 C+, 73 C, 70 C-, 67 D+, 63 D, 60 D-, 0 F. Students taking the course P/NP must complete all deliverables and will receive a P if their grade is greater than or equal to 70 (C-); students taking S/U must complete all deliverables and will receive an S if their grade is greater than or equal to 80 (B-).
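The thresholds above amount to a simple lookup, sketched here in Python. The function names are hypothetical, and the sketch compares the raw score against each minimum without rounding, since the policy does not say whether scores are rounded first.

```python
# Minimum score for each letter grade, highest first (from the list above).
THRESHOLDS = [
    (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
    (0, "F"),
]


def letter_grade(score):
    """Return the first letter whose minimum threshold the score meets."""
    for minimum, letter in THRESHOLDS:
        if score >= minimum:
            return letter
    raise ValueError("score must be non-negative")


def pass_no_pass(score, all_deliverables_done):
    """P requires all deliverables and a score of at least 70 (C-)."""
    return "P" if all_deliverables_done and score >= 70 else "NP"


def satisfactory(score, all_deliverables_done):
    """S requires all deliverables and a score of at least 80 (B-)."""
    return "S" if all_deliverables_done and score >= 80 else "U"


print(letter_grade(89.9), pass_no_pass(72, True), satisfactory(79, True))
```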
Exams
This course has a midterm exam scheduled for 3/11 (in-class) and no final exam. We will not be offering alternative midterm exam dates, so if you anticipate a conflict, you should not register for this course.