Final Project

Key Deadlines

| Due date | Item | Submission Instructions |
|---|---|---|
| 11/08 @ 10am | Partners Preferences | Fill out this form. |
| 11/17 @ 4pm | Checkpoint | Submit to “Project 2 Checkpoint” in Gradescope |
| 11/17 @ 4pm | Meeting Times | Sign up for project meetings. (We’ll add a link here.) |
| 12/15 @ 4pm | Final Submission | Submit to “Project 2” in Gradescope |

Description and guidelines

  • This project’s learning objectives are to practice and synthesize the data science concepts learned throughout the semester in CSCI 104.

  • You’ll focus on addressing your choice of quantitative questions on real datasets.

  • This is a group project with 2-3 members. All work is to be completed by your group members alone, and all members should contribute equally. If there are any discrepancies, please notify the instructors.

  • Groups will turn in a single Jupyter notebook as their final deliverable. This notebook will be a mix of (1) cells with Python code and (2) English prose in Markdown cells explaining the code, data, visualizations, and analyses. All data files should be uploaded to the project folder, using the steps from Lab 5.

  • Please produce a notebook that is readily understood by others. Write descriptive prose to describe what you are doing and what you’ve learned from your work. Organize your code well, comment it, use good variable names, etc. Also, present your data well: use descriptive labels for columns in tables, axes on plots, etc.

  • We provide the template notebook here.

  • You may consult the text, your notes, your lab work, our lecture examples, and the web pages associated with the course website. You may also consult other references, but please clearly cite any sources you use and attribute any code or ideas to where you found them. If you use a source on the web, please provide the full URL and any author information you can identify.

Deliverables

Checkpoint: Draft of Sections 0-3 (Due to Gradescope: 11/17 at 4pm)

  • Using the provided template notebook, complete a first draft of sections 0-3 in the rubric below.

  • Instructors will provide feedback on these sections to make sure the questions posed and the datasets chosen are on the right track for success.

  • This draft helps gauge the work that remains after Thanksgiving break, so these sections can still be changed after feedback.

  • Think about which techniques will help you answer your quantitative questions. Also, think about whether your quantitative questions support the requirement that you perform statistical inference as part of your work: hypothesis testing, estimation, or association.
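
As you think about hypothesis testing, here is a minimal sketch of one common approach: a permutation test for a difference in group means. It uses only Python's standard library rather than the course's Table functions, and the function name `permutation_test`, the group names, and the score values are all made up for illustration. Adapt the idea to your own data and questions.

```python
import random
import statistics


def permutation_test(group_a, group_b, repetitions=5000, seed=0):
    """Return (observed difference in means, approximate two-sided p-value)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(repetitions):
        # Under the null hypothesis the group labels do not matter,
        # so shuffle the pooled values and split them again.
        rng.shuffle(pooled)
        simulated = statistics.mean(pooled[:size_a]) - statistics.mean(pooled[size_a:])
        if abs(simulated) >= abs(observed):
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / repetitions


# Hypothetical example values, for illustration only.
morning_scores = [72, 85, 78, 90, 66, 81, 77, 88]
evening_scores = [70, 79, 74, 83, 69, 75, 71, 80]

observed_diff, p_value = permutation_test(morning_scores, evening_scores)
print(f"Observed difference in means: {observed_diff:.2f}")
print(f"Approximate p-value: {p_value:.3f}")
```

A small p-value suggests the observed difference would be unusual if the group labels did not matter. Whether this kind of test fits your project depends on your quantitative questions.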

Note: Only one member of your group should submit your notebook to Gradescope’s “Project 2 Checkpoint” assignment. After you upload your files, there will be an “Add Group Member” button on the right side of the Gradescope webpage. Click it and add your partner(s).

Instructor meeting 1 (Takes place: 12/06–12/08)

  • You will sign up for short meetings between your group members and the instructors.

  • You will give an update about the project. Please use this as a time to chat with the instructors about challenges you are facing.

Instructor meeting 2 (Takes place: 12/12–12/14)

  • You will sign up for short meetings between your group members and the instructors.

  • You will show us a (nearly) final version of the project.

Final project notebook (Due to Gradescope: 12/15 at 4pm)

  • Your final deliverable will be a Jupyter Notebook. Only submit one notebook per group. This should be a mix of (1) cells with Python code and (2) English prose in Markdown cells explaining the code, data, visualizations and analyses.

Final Project Rubric (100 points)

| Component | Requirements | Points |
|---|---|---|
| 0. Description of Data | Please tell us where you found the dataset(s) and what they include in general terms. | 5 |
| 1. Quantitative questions | Poses at least two quantitative questions about the dataset(s). | 5 |
| 2. Loading data | Loads at least one dataset. You may load more than one dataset if they support your overall quantitative questions. Clean the data, as in Lab 5. We provide the same library functions. | 5 |
| 3. Descriptive statistics | Uses code and text to provide at least three descriptions of the dataset (e.g. number of rows, mean of one of the columns). | 10 |
| 4. Data wrangling | Uses at least two Table methods (e.g. sort, where, take, apply, pivot, join, group) to do something meaningful with the data. Describes (in full English sentences) what those Table methods are doing. | 10 |
| 5. Visualizations | Creates at least two visualizations of the dataset (e.g. a scatter and line plot, or two histograms). Describes (using full English sentences) any interesting findings from the visualizations. | 10 |
| 6. Statistical Inference | Categories: (A) Hypothesis test, (B) Estimation (e.g. confidence intervals via bootstrapping), (C) Association (e.g. correlation or a linear regression line fit from a scatter plot). Correctly completes at least two statistical inference procedures (e.g. a hypothesis test and a bootstrap confidence interval; or two predictions). Discusses (in full English sentences) the implications of the statistical inference procedures. | 30 |
| 7. Ethics | Discusses (in full English sentences) at least one possible ethical consideration of using the dataset or doing analysis of the dataset (e.g. the potential harms of using the data from a Consequentialist or Deontologist perspective, which we will cover later in the semester). | 5 |
| 8. Conclusions | Describes (in full English sentences) what has been learned and addresses the original quantitative questions. | 10 |
| 9. Mastery and creativity | A truly masterful data science project will go above and beyond these minimum requirements and creatively incorporate the concepts we have learned and practiced in this class. Feel free to go beyond the scope of what we have learned in this class if you have completed all other requirements. | 10 |
| Total | | 100 |
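
To make component 6 more concrete, the sketch below pairs an estimation procedure (a bootstrap percentile confidence interval for a mean) with an association measure (the correlation coefficient). It is an illustration under stated assumptions, not a required approach: it uses only Python's standard library instead of the course's Table functions, and the function names and data values are hypothetical.

```python
import random
import statistics


def bootstrap_mean_interval(sample, repetitions=5000, confidence=0.95, seed=0):
    """Return an approximate percentile confidence interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    size = len(sample)
    # Resample with replacement and record the mean of each resample.
    resampled_means = sorted(
        statistics.mean(rng.choices(sample, k=size)) for _ in range(repetitions)
    )
    tail = (1 - confidence) / 2
    lower_index = round(tail * repetitions)
    upper_index = round((1 - tail) * repetitions) - 1
    return resampled_means[lower_index], resampled_means[upper_index]


def correlation(xs, ys):
    """Return the correlation coefficient r: the mean product of standard units.

    Assumes both lists have the same length and neither is constant
    (a constant list has standard deviation 0, which would divide by zero).
    """
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)
    products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    ]
    return statistics.mean(products)


# Hypothetical example values, for illustration only.
hours_studied = [2, 4, 5, 7, 8, 10, 11, 13]
exam_scores = [65, 70, 68, 78, 82, 85, 88, 95]

lower, upper = bootstrap_mean_interval(exam_scores)
print(f"Approximate 95% interval for the mean score: ({lower:.1f}, {upper:.1f})")
print(f"Correlation between hours and scores: {correlation(hours_studied, exam_scores):.3f}")
```

Running the procedures is only part of the requirement. The rubric also asks you to discuss, in full English sentences, what the results imply for your quantitative questions.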