Objectives

The final project for NLP has three primary objectives:

Grading

The final project is 20% of your overall grade. The project will be graded holistically via an additive scoring method inspired by gymnastics:

Deliverables

Deliverable                                  Due Date
Project proposal (2 pages max)               Sun Nov 17, 10pm ET
Project presentation (preliminary results)   Dec 3 or Dec 5
Code (GitHub repository)                     Thu Dec 12, 10pm ET (no late submissions accepted)
Final report (8 pages max)                   Thu Dec 12, 10pm ET (no late submissions accepted)

Project Proposal

Project Presentation

You will be assigned a project presentation slot during the last week of classes (December 3 or December 5). Each group will present a Google Slides deck and speak for 8 minutes. Note that you do not need to have the full project finished by the time of the presentation, but you should have some initial results that you can present to the class. It's okay if these results change between your presentation and the final report.

Code

Create a shared GitHub repository that hosts the code for your final project. Follow the best practices for documentation (docstrings) and object-oriented programming that were demonstrated in the class homeworks.
  • Submission: Please add Katie (GitHub username: kakeith) to the repository. Also include a link to the repository in your final report.
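As a rough illustration of the documentation and object-oriented style mentioned above, here is a minimal sketch. The class name `MajorityClassBaseline` and its methods are hypothetical and are not taken from the course homeworks; adapt the docstring format to whatever convention your group agrees on.

```python
from collections import Counter
from typing import Iterable, List, Optional


class MajorityClassBaseline:
    """Predict the most frequent training label for every input.

    Hypothetical example class, shown only to illustrate docstrings
    and a small fit/predict interface.
    """

    def __init__(self) -> None:
        # Set by fit(); None means the baseline has not been fitted yet.
        self.majority_label: Optional[str] = None

    def fit(self, labels: Iterable[str]) -> "MajorityClassBaseline":
        """Store the most common label in `labels`.

        Args:
            labels: Gold labels from the training split.

        Returns:
            The fitted baseline, so calls can be chained.

        Raises:
            ValueError: If `labels` is empty.
        """
        counts = Counter(labels)
        if not counts:
            raise ValueError("labels must be non-empty")
        self.majority_label = counts.most_common(1)[0][0]
        return self

    def predict(self, texts: List[str]) -> List[str]:
        """Return the stored majority label once per input text."""
        if self.majority_label is None:
            raise RuntimeError("call fit() before predict()")
        return [self.majority_label] * len(texts)
```

For example, `MajorityClassBaseline().fit(["pos", "neg", "pos"]).predict(["a", "b"])` returns one `"pos"` per input text.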

Final Report

    The final report will summarize all your contributions for the project and give you practice writing a technical research paper. Note, this final report will be graded holistically along with your code to assign difficulty and execution scores (described above), but here's a rubric for the baseline expectations:
    1. The report must be submitted as a PDF to Gradescope. You must write your final paper in LaTeX, using the ACL 2023 LaTeX or Overleaf (shared LaTeX editor) template (5% adherence to style requirements).
    2. The final report must be a minimum of 5 pages and maximum 8 pages (10%). References do not count towards this page limit.
    3. Projects must adequately cite at least [3 x number of people in group] research articles (10%).
    4. The report must consist of the following sections (5% adherence to section requirements):
      1. Abstract. At a high level, summarize what your problem is, what methods you used, and your results. An abstract that is shorter and more concise is better.
      2. Introduction. What is your problem? Why is it important?
      3. Related work. What have people previously done on your problem? What work is related?
      4. Dataset. Describe what dataset(s) you are using, where these came from, and some basic properties of the dataset.
      5. Methods. What methods are you using? What NLP models are you using and/or modifying and why?
      6. Results. What are your results? How did you evaluate these results?
      7. Ethics & Limitations. What are the ethical considerations? What are limitations of your method or approach?
      8. Conclusion. What can you conclude from your project? What did you learn? What are future directions?
      9. References (bibliography). List citations here. Use the ACL style file for examples of how to cite certain works.
    5. Additionally, the report must also contain the following:
      1. At least two original figures. These figures could show the results, interesting analyses, exploration of the features, an overview of the data and modeling pipeline, etc. (20%)
      2. At least one original table. The table could consist of data statistics, results, etc. (10%)
    6. Writing mechanics: grammar and typos (10%).
    7. Writing clarity: high-level writing style and arguments conveyed effectively (10%).
    8. Mastery of NLP concepts (30%). Evaluation on this category could include (but is not limited to):
      1. Proper train-test split (or train-dev-test split or cross-validation) with proper data-driven selection of hyperparameters.
      2. Comparing models against a baseline (e.g. predicting the majority class).
      3. Comparing more than one NLP model.
      4. Proper use of a machine learning / NLP package (e.g. sklearn or PyTorch) or development of a new NLP model.
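The first two items above (a proper train-dev-test split with data-driven hyperparameter selection, and comparison against a majority-class baseline) can be sketched as follows. This is only a minimal illustration under stated assumptions: the data is synthetic, the "word votes" model is a toy stand-in for a real NLP model, and every function name here is hypothetical. It uses only the Python standard library; in a real project you would more likely rely on a package such as sklearn or PyTorch.

```python
import random
from collections import Counter, defaultdict


def train_dev_test_split(data, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into three non-overlapping splits."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_dev = int(len(shuffled) * dev_frac)
    train = shuffled[n_test + n_dev:]
    dev = shuffled[n_test:n_test + n_dev]
    test = shuffled[:n_test]
    return train, dev, test


def accuracy(predict, examples):
    """Fraction of (text, label) pairs that `predict` labels correctly."""
    return sum(predict(text) == label for text, label in examples) / len(examples)


def fit_majority(train):
    """Baseline: always predict the most frequent training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda text: majority


def fit_word_votes(train, min_count):
    """Toy model: each word votes for the label it co-occurred with most.

    `min_count` is the hyperparameter: words seen fewer than `min_count`
    times in training are ignored.
    """
    word_labels = defaultdict(Counter)
    for text, label in train:
        for word in text.split():
            word_labels[word][label] += 1
    fallback = fit_majority(train)

    def predict(text):
        votes = Counter()
        for word in text.split():
            counts = word_labels.get(word)
            if counts and sum(counts.values()) >= min_count:
                votes[counts.most_common(1)[0][0]] += 1
        return votes.most_common(1)[0][0] if votes else fallback(text)

    return predict


# Synthetic toy data, made up purely for this illustration.
rng = random.Random(0)
vocab = {"pos": ["great", "fun", "good"], "neg": ["dull", "bad", "slow"]}
data = []
for _ in range(200):
    label = rng.choice(["pos", "neg"])
    words = [rng.choice(vocab[label]), rng.choice(["the", "film", "was"])]
    data.append((" ".join(words), label))

train, dev, test = train_dev_test_split(data)

# Select the hyperparameter on the dev split only; never on the test split.
candidates = [1, 5, 1000]
best_min_count = max(
    candidates, key=lambda k: accuracy(fit_word_votes(train, k), dev)
)

# Report test accuracy once, for both the baseline and the selected model.
baseline_acc = accuracy(fit_majority(train), test)
model_acc = accuracy(fit_word_votes(train, best_min_count), test)
print(f"min_count={best_min_count} baseline={baseline_acc:.2f} model={model_acc:.2f}")
```

The design point is that the test split is touched only once, after the hyperparameter has been fixed using the dev split, and that the model's score is reported next to the baseline's rather than in isolation.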
  • Submission: Each group will submit a single .pdf to Gradescope. No late submissions will be accepted.