NLP final project objectives
The final project for NLP has three primary objectives:
- Objective 1: You will create an NLP project (code + data + report) that can become part of your professional portfolio.
- Objective 2: You will become the “class expert” on a particular NLP topic.
- Objective 3: You will learn about the research process by producing the equivalent of a “workshop paper” in NLP. Note: This is smaller in scope than a full-fledged research paper. It’s ok if you have “negative” final results.
Grading
The final project is 25% of your overall grade. This grade has the following breakdown:
Deliverable | Points | Due Date |
Topic selection form | 5 | Sun, Apr 9 at 9:59pm ET |
Mini-lecture | 20 | Apr 17-21, 9:59pm ET the night before you lecture |
Annotated bibliography | 20 | Sun Apr 23 at 9:59pm ET |
Project outline | 20 | Sun Apr 30 at 9:59pm ET |
Code for code review | 20 | Sun May 7 at 9:59pm ET |
Poster presentation | 20 | Thurs May 11, 6pm ET |
Final report | 100 | May 19, 3:59pm ET |
All deliverables except for the final report will be formative assessments. This means you will receive qualitative feedback to help you learn and grow, but points will be “all or nothing” based on completion of the requirements.
The final report will be graded as a summative assessment based on the rubric provided below. This final report is the chance to incorporate all the qualitative feedback and suggestions you will receive from Katie and peers throughout the final weeks in the class.
Deliverables descriptions
Topic selection
- The scope of a topic should be equivalent to a textbook chapter or a published review/survey paper.
- In the topic selection form, you will answer these four questions:
- What is this topic?
- What source material are you drawing from as a starting place (i.e. which book chapter or review/survey paper)?
- What makes you excited about this topic?
- Why do other people (besides you) care about this topic?
- Submission type: Google form
Mini-lecture
- You will present your topic as a “mini-lecture” in person to the entire class during the week of April 17-21. This is your opportunity to learn about the topic in more depth yourself and to give your classmates an overview of many topics in NLP.
- Scope: (4 minutes) x (number of people in the group)
- The time limit will be strictly enforced so practice your presentation many times to be both precise and concise.
- Every group will have a few minutes for questions at the end of their presentation.
- Your mini-lecture must address the following three questions:
- What is the definition of your topic?
- Why is your topic important to study?
- What are the 1-3 key computational approach(es) for your topic? If you are in a group of one, present only one computational approach. Groups of two should present two computational approaches and groups of three should present three approaches.
- Submission type: You must use Google Slides for your presentation. Each group will submit a single Google Slides deck to this shared Google Drive folder. Please upload your slide deck the night before your presentation in class.
Annotated bibliography
- An annotated bibliography will help you identify what has previously been done in the research community on your topic, help you scaffold your reading of dense technical papers, and (hopefully) help you generate ideas for your project.
- Scope: (2 papers) x (number of people in the group)
- Here are some heuristics to choose “good” papers:
- Check that it was published in a “top” NLP or machine learning (ML) venue. “Top” NLP venues: TACL, ACL, NAACL, EACL, or EMNLP. “Top” ML venues: NeurIPS, ICML, ICLR, JMLR, or TMLR.
- Look at the institution of the authors.
- Look at the number of citations on Google Scholar.
- Your write-up must answer the 8 questions presented in this document.
- Submission type: Each group will submit a single .pdf document to Gradescope.
Project Outline
- The project outline will help you brainstorm and scaffold your technical contributions.
- The project outline is a written document that answers the following three questions:
- What is your proposed research contribution? This should be one sentence (analogous to a “thesis statement” when you’re writing an English paper).
- Here’s an example from Katie’s most recent paper: “We propose a new measure of scholarly jargon by using information-theoretic metrics and BERT-based word sense induction to identify discipline-specific word types and senses.” Your proposed contribution for a class final project can be much smaller in scope.
- How will you evaluate your approach? What data will you evaluate your method on? Provide a link to where the data can be downloaded.
- How do your initial ideas build from or differ from the papers you read for your annotated bibliography?
- Submission type: Each group will submit a single .pdf document to Gradescope.
Code for code review
- Every final project must include code. The code you submit for code review can be a work-in-progress, but try to put forward your best work as we will go over it during group meetings with Katie.
- You must host your code on GitHub in a private repository.
- Submission type: Add Katie as a collaborator to your GitHub repository (GitHub username: kakeith).
Poster
- The poster session is your opportunity to present your research goals and preliminary results. Use this as an opportunity to receive feedback from your peers and other faculty members (Rohit, Mark, and Steve have agreed to attend in addition to Katie).
- Every group will use the betterposter landscape template. Watch the betterposter video for an introduction on this style of scientific posters.
- Instructions on printing posters
- Read Williams' poster printing guidelines for students.
- After you have modified the betterposter landscape template with your content, you will print it as a landscape 48" (width) x 36" (height) poster.
- You must submit your posters to OIT for printing by May 8 at 11:59pm ET. If you miss this deadline, OIT may not be able to print your poster in time, which will result in a 0 on this part of the project.
- Submission type: Print your posters and present them during the poster session on May 11 from 6-8pm.
Final report
- The final report will summarize all your contributions for the project and give you practice writing a technical research paper.
- In NLP (and computer science in general) conciseness in writing is preferred to verbosity.
- Rubric (100 points):
- The report must be submitted as a PDF to Gradescope. You must write your final paper in LaTeX. You must use the ACL 2023 LaTeX or Overleaf (shared LaTeX editor) template (5% adherence to style requirements).
- The final report must be a minimum of 5 pages and a maximum of 8 pages (if the report is not in this page range, the project grade will be reduced by 10%). References do not count towards this page limit.
- Projects must adequately cite at least (4 x number of people in group) research articles. Note: to meet this requirement, you will have to cite a few papers beyond those in your annotated bibliography (5%).
- The report must consist of the following sections (5% adherence to section requirements):
- Abstract.
At a high level, summarize what your problem is, what methods you used, and your results. An abstract that is shorter and more concise is better.
- Introduction.
What is your problem? Why is it important?
- Related work.
What have people previously done on your problem? What work is related? (This is a great place to summarize some of the papers you wrote about in depth for the annotated bibliography assignment.)
- Dataset.
Describe what dataset(s) you are using, where these came from, and some basic properties of the dataset.
- Methods.
What methods are you using? What NLP models are you using and/or modifying and why?
- Results and evaluation.
What are your results? How did you evaluate these results?
- Conclusion.
What can you conclude from your project? What did you learn? What are future directions? Are there any real-world implications from your work?
- References (bibliography).
List citations here. Use the ACL style file for examples of how to cite certain works.
- Additionally, the report must also contain the following:
- At least two figures. These figures could show the results, interesting analyses, exploration of the features, an overview of the data and modeling pipeline, etc. (20%)
- At least one table. The table could consist of data statistics, results, etc. (10%)
- Writing mechanics: grammar and typos (10%).
- Writing clarity: high-level writing style and arguments conveyed effectively (10%).
- Mastery of NLP concepts (30%). Evaluation on this category could include (but is not limited to):
- Proper train-test split (or train-dev-test split or cross-validation)
- Proper data-driven selection of hyperparameters
- Comparing models against a baseline (e.g. predicting the majority class)
- Comparing more than one NLP model
- Proper use of a machine learning / NLP package (e.g. sklearn or PyTorch) or development of a new NLP model.
- Novelty of ideas (5%). A truly masterful project will include novel ideas that have not been tried before by others.
- Submission type: Each group will submit a single .pdf to Gradescope by May 19 at 4pm ET. No late submissions will be accepted.
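As a rough illustration of the first three "Mastery of NLP concepts" criteria in the rubric (a proper train-dev-test split, data-driven hyperparameter selection, and comparison against a majority-class baseline), here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy dataset, the function names, and the `length_classifier` "model" are hypothetical stand-ins, not a required template. In a real project you would most likely use a package such as sklearn or PyTorch instead.

```python
import random
from collections import Counter


def train_dev_test_split(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into three disjoint parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_dev = int(len(shuffled) * dev_frac)
    test = shuffled[:n_test]
    dev = shuffled[n_test:n_test + n_dev]
    train = shuffled[n_test + n_dev:]
    return train, dev, test


def accuracy(predict, examples):
    """Fraction of (text, label) pairs for which predict(text) == label."""
    correct = sum(predict(text) == label for text, label in examples)
    return correct / len(examples)


def majority_baseline(train):
    """Baseline that always predicts the most frequent training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda text: majority


def length_classifier(threshold):
    """Hypothetical toy model: 'long' if the text has >= threshold tokens."""
    return lambda text: "long" if len(text.split()) >= threshold else "short"


# Hypothetical toy data: 100 texts of 1-10 tokens, labeled by length.
data = []
for i in range(100):
    n_tokens = i % 10 + 1
    text = " ".join([f"w{i}"] * n_tokens)
    data.append((text, "long" if n_tokens >= 6 else "short"))

train, dev, test = train_dev_test_split(data)

# Choose the hyperparameter on the dev set only, never on the test set.
candidates = [2, 4, 6, 8]
best_threshold = max(
    candidates, key=lambda t: accuracy(length_classifier(t), dev)
)

# Report test accuracy once, next to the baseline fit on the training set.
baseline = majority_baseline(train)
model = length_classifier(best_threshold)
print("baseline test accuracy:", accuracy(baseline, test))
print("model test accuracy:", accuracy(model, test))
```

The design point is the separation of roles: the training set fits the baseline, the dev set picks the threshold, and the test set is touched only once, for the numbers you report.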