Texts
- [SLP3] Dan Jurafsky and James Martin, Speech and Language Processing (3rd ed. draft) [Available here]
- [SLP2] Dan Jurafsky and James Martin, Speech and Language Processing (2nd ed., 2009)
- [G] Yoav Goldberg, Neural Network Methods for Natural Language Processing (2017) [Available for free on campus/VPN here]
- [PS] James Pustejovsky and Amber Stubbs, Natural Language Annotation for Machine Learning (2012) [Available for free on campus/VPN here]
Syllabus
(Subject to change.)
Week | Date | Topic | Readings |
1 | 8/24 | Introduction [slides] | SLP2 ch 1 |
2 | 8/29 | Text classification 1 [slides] | SLP3 ch 6 |
2 | 8/31 | Text classification 2; logreg [slides] | SLP3 ch 7; G 4 |
3 | 9/5 | Text classification 3; neural nets [slides] | G 13 |
3 | 9/7 | Construction of truth; ethics [slides] | PS ch 6; Hovy and Spruit 2016 |
4 | 9/12 | Language modeling 1 [slides] | SLP3 ch 4 |
4 | 9/14 | Language modeling 2; RNN [slides] | G 14 |
5 | 9/19 | Vector semantics and word embeddings [slides] | SLP3 ch 15, 16 |
5 | 9/21 | Sequence labeling problems: POS tagging [slides] | SLP3 ch 10 |
6 | 9/26 | HMM, MEMM; Viterbi [slides] | SLP3 ch 9 |
6 | 9/28 | CRF, LSTM [slides] | Wallach 2004 |
7 | 10/3 | Features and hypothesis testing [slides] | Berg-Kirkpatrick et al. 2012; Søgaard et al. 2014 |
7 | 10/5 | Context-free syntax [slides] | SLP3 ch 11 |
8 | 10/10 | Context-free parsing algorithms [slides] | SLP3 ch 12, 13 |
8 | 10/12 | Review [slides] | |
9 | 10/17 | Midterm | |
9 | 10/19 | Dependency syntax [slides] | SLP3 ch 14 |
10 | 10/24 | Dependency parsing algorithms [slides] | SLP3 ch 14 |
10 | 10/26 | Compositional semantics [Jacob Andreas] [slides] | SLP2 chs 17, 18 |
11 | 10/31 | Semantic parsing [slides] | SLP3 11.6 (CCG), Zettlemoyer and Collins 2005 |
11 | 11/2 | Semantic role labeling [slides] | SLP3 ch 22 |
12 | 11/7 | Wordnet, supersenses and WSD [slides] | SLP3 ch 17 |
12 | 11/9 | Discourse and pragmatics [slides] | SLP2 ch 21 |
13 | 11/14 | Coreference resolution [slides] | SLP2 ch 21 |
13 | 11/16 | Social NLP [slides] | Pick one: Voigt et al. 2017; Cheng et al. 2017; Underwood 2016 |
14 | 11/21 | Machine translation; seq2seq [John DeNero] [slides] | SLP2 ch 25; G 17 |
14 | 11/23 | No class (Thanksgiving) | |
15 | 11/28 | Conversational agents [slides] | SLP3 ch 29 |
15 | 11/30 | Future and review [slides] | |
RRR | 12/5 | Final project presentations (202 South Hall) | |
Grading
Info 159
50% | Homeworks and short in-class quizzes (approximately 8 of each) |
20% | Midterm exam |
30% | Final exam |
Info 259
50% | Homeworks and short in-class quizzes (approximately 8 of each) |
20% | Midterm exam |
30% | Project: 5% proposal/literature review; 5% midterm report; 15% final report; 5% presentation |
For H homeworks and Q quizzes, the homework/quiz grade will be calculated from the highest (H+Q)-3 scores (i.e., the lowest 3 of the H+Q total scores will be dropped).
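As an illustration only, the drop-lowest-3 rule above can be sketched in Python. The function name `homework_quiz_grade` is hypothetical, and the sketch assumes that homework and quiz scores are pooled, that every score is a fraction between 0 and 1, and that the kept scores are averaged with equal weight. The syllabus does not specify how the kept scores are combined, so treat the averaging step as an assumption.

```python
def homework_quiz_grade(homework_scores, quiz_scores, n_dropped=3):
    """Return the mean of the highest (H+Q)-n_dropped scores.

    Assumption (not stated in the syllabus): scores are fractions in
    [0, 1] and the kept scores are weighted equally.
    """
    # Pool homeworks and quizzes, since the rule counts H+Q scores together.
    all_scores = list(homework_scores) + list(quiz_scores)
    n_kept = len(all_scores) - n_dropped
    if n_kept <= 0:
        raise ValueError("need more than n_dropped scores to compute a grade")
    # Keep the highest n_kept scores, i.e. drop the lowest n_dropped.
    kept = sorted(all_scores, reverse=True)[:n_kept]
    return sum(kept) / n_kept
```

For example, with homework scores `[1.0, 0.5, 0.0]` and quiz scores `[1.0, 0.0, 0.5, 1.0]`, the three lowest scores (0.0, 0.0, 0.5) are dropped and the remaining four average to 0.875.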
Project
Info 259 will be capped off by a semester-long project (carried out by 1 or 2 students) involving natural language processing, either focusing on core NLP methods or using NLP in support of an empirical research question. The project consists of four components:
- Project proposal and literature review. Students will propose the research question to be examined, motivate its rationale as an interesting question worth asking, and assess its potential to contribute new knowledge by situating it within related literature in the scientific community. (2 pages; 5 sources)
- Midterm report. By the middle of the course, students should present initial experimental results and establish a validation strategy to be performed at the end of experimentation. (4 pages; 10 sources)
- Final report. The final report will include a complete description of work undertaken for the project, including data collection, development of methods, experimental details (complete enough for replication), comparison with past work, and a thorough analysis. Projects will be evaluated according to standards for conference publication, including clarity, originality, soundness, substance, evaluation, meaningful comparison, and impact (of ideas, software, and/or datasets). (8 pages)
- Presentation. At the end of the semester, teams will present their work in a poster session.
Policies
Academic Integrity
All students will follow the UC Berkeley code of conduct. While the group project is a collaborative effort, all homeworks must be completed independently. All writing must be your own; if you mention the work of others, you must clearly cite the appropriate source. (For additional information on plagiarism, see here.) This holds for source code as well: if you use others' code (e.g., from StackOverflow), you must cite its source. Late homeworks will not be accepted.
Students with Disabilities
Our goal is to make class a learning environment accessible to all students. If you need disability-related accommodations and have a Letter of Accommodation from the DSP, have emergency medical information you wish to share with me, or need special arrangements in case the building must be evacuated, please inform me immediately. I'm happy to discuss this privately after class or in my office.