Welcome!

Natural language processing (NLP) is a set of methods for making human language accessible to computers. NLP underlies many technologies we use daily, including automatic machine translation, search engines, email spam detection, and automated personal assistants. These methods draw on a combination of algorithms, linguistics, and statistics. This course will provide a foundation in building NLP models to classify, generate, and learn from text data.

Logistics

Schedule

Note: This is a tentative schedule and subject to change.

For textbook readings, we will primarily use Jurafsky & Martin, Speech and Language Processing, 3rd edition (J&M), which is freely available online. Other readings will be linked on this page or distributed via a course packet on Piazza.

| Week | Date | Lecture | Textbook | Papers | Deliverables |
| --- | --- | --- | --- | --- | --- |
| 0 | Th Sep 5 | Introduction [handout0] | J&M Ch. 1: Introduction | | HW0: Preliminaries; Released: Th Sep 5; Due: Th Sep 12, 10pm |
| 1 | Tu Sep 10 | Regular Expressions & Evaluation Metrics [code] | J&M Ch. 2: Regular Expressions, Text Normalization, Edit Distance | | HW1: Spam via Regular Expressions; Released: Th Sep 12; Due: Th Sep 19, 10pm |
| | Th Sep 12 | Tokenization [code] | | | |
| 2 | Tu Sep 17 | Language Modeling [code] [board-work] | J&M Ch. 3: Language Modeling with N-Grams | | HW2: Naive Bayes; Released: Th Sep 19; Due: Th Sep 26, 10pm |
| | Th Sep 19 | Naive Bayes [code] [board-work] | J&M Ch. 4: Naive Bayes and Sentiment Classification | | |
| 3 | Tu Sep 24 | Logistic Regression, Part 1 [code] | J&M Ch. 5: Logistic Regression | Pang et al. "Thumbs up? Sentiment Classification using Machine Learning Techniques." EMNLP, 2002. | HW3: Logistic Regression; Released: Th Sep 26; Due: Th Oct 3, 10pm; Due: Sat Oct 5, 10pm |
| | Th Sep 26 | Logistic Regression, Part 2 [code] [board-work] | | | |
| 4 | Tu Oct 1 | Vector Semantics, Part 1 | J&M Ch. 6: Vector Semantics | | HW4: Vector Semantics; Released: Th Oct 3; Due: Th Oct 10, 10pm |
| | Th Oct 3 | Vector Semantics, Part 2 | | | |
| 5 | Tu Oct 8 | Case Studies: Ethics & Bias in NLP | | Blodgett et al. "Language (Technology) is Power: A Critical Survey of 'Bias' in NLP." ACL, 2020. | Midterm Project: Movie Chatbots; Released: Th Oct 10; Checkpoint: Th Oct 17, 10pm; Due: Th Oct 24, 10pm |
| | Th Oct 10 | Dr. Su Lin Blodgett Guest Lecture [slides] | | | |
| 6 | Tu Oct 15 | No classes (reading period) | | | |
| | Th Oct 17 | Feedforward Networks [code] | J&M Ch. 7: Neural Networks | | |
| 7 | Tu Oct 22 | Training & Backprop | | Iyyer et al. "Deep Unordered Composition Rivals Syntactic Methods for Text Classification." ACL, 2015. | HW5: Deep Learning for NLP; Released: Th Oct 24; Due: Th Oct 31 |
| | Th Oct 24 | Deep Training, Part 2 [code] | | | |
| 8 | Tu Oct 29 | Self-Attention & Transformers | J&M Ch. 9: Transformers | | HW6: Transformers; Released: Th Oct 31; Due: Th Nov 7; Released: Tu Nov 5; Due: Wed Nov 13 |
| | Th Oct 31 | LLMs [code] | J&M Ch. 10: Large Language Models | | |
| 9 | Tu Nov 5 | Fine-tuning [code] | J&M Ch. 12: Model Alignment, Prompting, and In-Context Learning | Devlin et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." NAACL, 2019. | Final Project Proposal; Due: Th Nov 14; Due: Sun Nov 17 |
| | Th Nov 7 | Instruction-Tuning & RLHF | | | |
| 10 | Tu Nov 12 | Evaluation & Annotation [code] | | Ribeiro et al. "Beyond Accuracy: Behavioral Testing of NLP Models with CheckList." ACL, 2020. | |
| | Th Nov 14 | GPUs & Parameter-efficient methods | | | |
| 11 | Tu Nov 19 | Flipped Classroom: Final Project Work | | | Final Project; Due: Th Dec 12, 10pm |
| | Th Nov 21 | | | | |
| 12 | Tu Nov 26 | NLP + Computational Social Science [andy-slides] | | Voigt et al. "Language from police body camera footage shows racial disparities in officer respect." PNAS, 2017. | |
| | Th Nov 28 | Thanksgiving break, no classes | | | |
| 13 | Tu Dec 3 | Wrap-up and Final Project Presentations | | | |
| | Th Dec 5 | | | | |