Williams CSCI 375 | Natural Language Processing

Welcome!

Natural language processing (NLP) is a set of methods for making human language accessible to computers. NLP underlies many technologies we use on a daily basis, including automatic machine translation, search engines, email spam detection, and automated personalized assistants. These methods draw from a combination of algorithms, linguistics, and statistics. This course will provide a foundation in building NLP models to classify, generate, and learn from text data.

Logistics

Schedule

Note: This is a tentative schedule and subject to change.

For readings, we will primarily use Jurafsky & Martin, Speech and Language Processing, 3rd edition (J&M). This textbook edition is available online for free here. Other readings will be linked on this page or distributed via Piazza.

Week Dates Theme & Lecture Materials Readings Deliverables
0 Feb 3 Introduction

[slides]
1 Feb 6, 8, 10 Text Processing

[slides] [slides-solutions]
[command-line-demo]
[regex tokenization html ipynb]
HW0: Preliminaries
   Released: Mon, Feb 6
   Due: Thurs, Feb 9 at 9:59pm ET
2 Feb 13, 15

[Feb 17 - No class, Winter Carnival]
Language Modeling and Naive Bayes

[slides] [slides-solutions]
[ngrams.pdf] [naive-bayes.pdf] [shakespeare ngrams html ipynb]
HW1: Spam via Regular Expressions
   Released: Mon, Feb 13
   Due: Thurs, Feb 16 at 9:59pm ET
3 Feb 20, 22, 24 Logistic Regression

[slides] [slides-solutions]
[logreg.pdf] [numpy html ipynb]
  • J&M Ch.5: Logistic Regression
HW2: Naive Bayes
   Released: Mon, Feb 20
   Due: Fri, Feb 24 at 3:59pm ET
4 Feb 27, Mar 1, 3 Vector Semantics

[slides] [slides-solutions]
[numpy vectorizing html ipynb]
[vector-semantics.pdf]
HW3: Logistic Regression
   Released: Mon, Feb 27
   Due: Fri, Mar 3 at 3:59pm ET
5 Mar 6, 8, 10 Deep Learning for NLP

[slides] [slides-solutions]
[computation graphs, solutions]
[backprop html ipynb]
[pytorch html ipynb]
[pretrained html ipynb]
  • J&M Ch.7: Neural Networks
HW4: Vector Semantics
   Released: Mon, Mar 6
   Due: Fri, Mar 10 at 3:59pm ET
6 Mar 13, 15

[Mar 17 - No class due to midterm]
Monday - Wrap-Up Deep Learning

Wednesday - Project Logistics & Midterm Review

Thursday, March 16: Evening Midterm, 6-9:30pm (covers material through HW4)

[Midterm Guide]
Project: Group preference form
   Due: Sun, Mar 19 at 10pm ET
Mar 18 - April 2 Spring Break
7 Apr 3, 5, 7 Dialogue Systems and Transformers

[slides]
  • J&M Ch.10: Transformers and Pretrained Language Models
  • J&M Ch.24: Dialog Systems and Chatbots
HW5: Deep Learning for NLP
   Released: Mon, Apr 3
   Due: Fri, Apr 7 at 3:59pm ET

Project: Topic selection form
   Due: Sun, Apr 9 at 9:59pm ET

Project: Scheduling form
   Due: Sun, Apr 9 at 9:59pm ET
8 Apr 10-14 Lecture time: Flipped classroom, project work

[slides]
HW6: Chat Bots (Groups)
   Released: Weds, Apr 12
   Checkpoint (Parts 1 & 2): Fri, Apr 21 at 3:59pm ET
   Due: Fri, Apr 28 at 3:59pm ET (*no late submissions)

Project: Mini-lectures next week
9 Apr 17-21 Lecture time: Student Mini-Lectures

[schedule] [announcement slides]
Project: Annotated Bibliography
   Due: Sun Apr 23 at 9:59pm ET
10 Apr 24-28 No full-class lecture. Distributed project meetings with Katie

[schedule alternative view]
Project: Outline
   Due: Sun Apr 30 at 9:59pm ET
11 May 1-5 No full-class lecture. Distributed project meetings with Katie

Project: Code for Code Review
   Due: Sun May 7 at 9:59pm ET
12 May 8-12 No full-class lecture. Distributed project meetings with Katie

Project Poster Session
Bronfman Auditorium
Thurs May 11, 6-8pm
Final Project Completed
Final report to Gradescope
Due: Fri, May 19 at 4pm ET

Other NLP resources

These resources are meant to spark your curiosity about NLP topics not covered in class and may give you ideas for your final project.