Objectives
The final project for NLP has three primary objectives:
- Objective 1: You will become the “class expert” on a particular NLP topic.
- Objective 2: You will learn about the research process by producing the equivalent of a “workshop paper” in NLP. Note: This is smaller in scope than a full-fledged research paper. It’s ok if you have “negative” final results.
- Objective 3: You will create an NLP project (code + data + report) that can become part of your professional portfolio.
Grading
The final project is 20% of your overall grade. The project will be graded holistically via an additive scoring method inspired by gymnastics:
- Difficulty: This component is based on the difficulty of your project. There is no maximum score, but its reasonable to aim for a difficulty score around 10.
- Execution: This component is based on the execution of your project out of 10.
- Like gymnastics routines, projects with higher difficulty typically involve greater risk, but allow for more room for imperfections in execution.
Deliverables
Deliverable |
Due Date |
Project Proposal (2 pages max) |
Sun Nov 17 10pm ET |
Presentation presentation (preliminary results) |
Dec 3 or Dec 5 |
Code (Github repository) |
Th Dec 12, 10pm ET (no late submissions accepted) |
Final report (8 pages max) |
Th Dec 12, 10pm ET (no late submissions accepted) |
Project Proposal
- Your project proposal is a 2-page (maximum) write-up (with full sentences and prose) that answers the following questions:
- Topic. What topic in NLP are you focusing on? Why are you excited about this topic?
- Previous Work. What are the 2 academic NLP papers (minimum) that you are drawing from for your idea, and what are the main ideas behind these papers?
- Proposed Contributions.What is your proposed research contribution? This should be one sentence (analogous to a “thesis statement” when you are writing an English paper).
- Here’s an example from one of Katie’s papers, “We propose a new measure of scholarly jargon by using information-theoretic metrics and BERT-based word sense induction to identify discipline-specific word types and senses.” Your proposed contribution for a class final project can be much smaller in scope.
- Evaluation.How will you evaluate your approach? What data will you evaluate your method on? Provide a link to where the data can be downloaded.
- Submission: 1-2 page document submitted to Gradescope
Project Presentation
You will be assigned a project presentation slot the last week of classes (December 3 or December 5). Each group will present a Google slide deck, and speak for
8 minutes to
- Introduce your project topic
- Describe your research contribution
- Present preliminary results
Note, you do not need to have the full project finished by the time of the presentation, but you should have some initial results that you can present to the class. It's okay if these results change between the time of your presentation and the final report.
Code
Create a shared Github repository that hosts the code for you final project. Use best practices with documentation (docstrings) and object-oriented programming that have been demonstrated in the class homeworks.
Submission: Please add Katie (Github username: kakeith) to repository. Also include a URL link to the repository in your final report.
Final report
The final report will summarize all your contributions for the project and give you practice writing a technical research paper. Note, this final report will be graded holistically along with your code to assign difficulty and execution scores (described above), but here's a rubric for the baseline expectations:
- The report must be submitted as a pdf to Gradescope. You must write your final paper in Latex. You must use the ACL 2023 latex or Overleaf (shared Latex editor) template (5% adherence to style requirements).
- The final report must be a minimum of 5 pages and maximum 8 pages (10%). References do not count towards this page limit.
- Projects must adequately cite at least [3 x number of people in group] research articles. (10%).
- The report must consist of the following sections (5% adherence to section requirements):
- Abstract.
At a high level, summarize what your problem is, what methods you used, and your results. An abstract that is shorter and more concise is better.
- Introduction.
What is your problem? Why is it important?
- Related work.
What have people previously done in regards to your problem? What work is related?
- Dataset.
Describe what dataset(s) you are using, where these came from, and some basic properties of the dataset.
- Methods.
What methods are you using? What NLP models are you using and/or modifying and why?
- Results.
What are your results? How did you evaluate these results?
- Ethics & Limitations.
What are the ethical considerations? What are limitations of your method or approach?
- Conclusion.
What can you conclude from your project? What did you learn? What are future directions?
- References (bibliography).
List citations here. Use the ACL style file for examples of how to cite certain works.
- Additionally, the report must also contain the following:
- At least two original figures. These figures could show the results, interesting analyses, exploration of the features, an overview of the data and modeling pipeline, etc. (20%)
- At least one original table. The table could consist of data statistics, results, etc. (10%)
- Writing mechanics: grammar and typos (10%).
- Writing clarity: high-level writing style and arguments conveyed effectively (10%).
- Mastery of NLP concepts (30%). Evaluation on this category could include (but is not limited to):
- Proper train-test split (or train-dev-test split or cross-validation) with proper data-driven selection of hyperparameters.
- Comparing models against a baseline (e.g. predicting the majority class)
- Comparing more than one NLP model
- Proper use of a machine learning / NLP package (e.g. sklearn or Pytorch) or development of a new NLP model.
Submission: Each group will submit a single .pdf to Gradescope. No late submissions will be accepted.