Welcome to CSCI 104!
Fall 2024
Many of the world’s greatest discoveries and most consequential decisions are enabled or informed by the analysis of data from a myriad of sources. Indeed, the ability to wrangle, visualize, and draw conclusions from data is now a critical tool in the sciences, business, medicine, politics, other academic disciplines, and society as a whole.
CSCI 104 lays the foundations for quantifying relationships in data by exploring complementary computational, statistical, and visualization concepts. These concepts will be reinforced by lab experiences designed to teach programming and statistics skills while analyzing real-world data sets. This course will also examine the broader context and social issues surrounding data analysis, including privacy and ethics.
Is This the Class for You?
This class studies core computing and data science concepts, using computation as a lens for exploring data manipulation, visualization, and statistics. CSCI 104 is accessible to all students, from first years to seniors, wishing to learn about these topics, regardless of academic background or interests.
No prior computer science, programming, or statistics experience is necessary or expected, and there are no prerequisites.
Format and Topics
Lectures and weekly labs. Labs meet once per week for 90 minutes. Together with the instructor and TAs, you’ll spend lab time working on assignments using the Python programming language and existing data science tools to apply concepts from lecture to data sets drawn from a wide range of sources. A larger capstone project will also give you an opportunity to explore data of your own choosing.
Topics
- Cause and Effect
  Do COVID vaccines reduce disease mortality? Is climate change leading to more powerful hurricanes? Does eating chocolate reduce the risk of heart disease? Such questions about cause and effect are at the heart of data science, and we begin the semester by exploring how to approach answering them.
- Representing Data and Data Wrangling
  How do we represent data in a computer program? And how do we convert data, or wrangle it, into a form that allows us to answer interesting questions? Assuming no prior knowledge, we develop the programming skills in Python to do just that.
- Data Visualization
  How do we present data in a way that lets us identify interesting features or patterns, and how do we communicate our observations to others? Visualization techniques serve both purposes: they enable us to explore data efficiently and effectively, and they aid in explaining our findings.
- Hypothesis Testing, Estimation, and Prediction
  Does our observed data support a particular hypothesis? How do we estimate a property of an entire population from just a sample? How do we use past observations to predict future outcomes? These are three key statistical inference questions that we approach via computational techniques.
- Data Science Ethics
  Ethical issues surround work in data science. How do we maintain privacy or anonymity? How do we decide which data to use? How do we mitigate unintended consequences of the work we do? We explore these essential ethical questions and others through real-world case studies and discussion throughout the semester.
Real-world Data Sets
Throughout the semester, we will use real-world data sets drawn from many sources and on many different subjects.
Learning Outcomes
By the end of the course, students will be able to:
Apply practical data science and Python programming skills to any domain.
Represent, wrangle, manipulate, and visualize real-world datasets using Python programming.
Implement and apply computational approaches to statistical inference.
Perform simulation to test hypotheses, bootstrapping to estimate confidence intervals, and linear regression to make predictions.
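To give a flavor of the last outcome, here is a minimal sketch of a percentile bootstrap confidence interval using only the Python standard library. It is an illustration, not course material: the function name `bootstrap_ci`, the resample count, and the made-up `commute_minutes` sample are all assumptions, and the course itself may use different tools and conventions.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(sample) (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement, each resample the same size as the original,
    # and record the statistic of every resample in sorted order.
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    tail = (1 - level) / 2
    # Take the middle `level` fraction of the sorted estimates.
    lower = estimates[round(tail * n_resamples)]
    upper = estimates[round((1 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical sample: commute times in minutes (made-up values).
commute_minutes = [12, 25, 31, 18, 22, 40, 15, 27, 33, 19, 24, 29]
low, high = bootstrap_ci(commute_minutes)
print(f"sample mean: {statistics.mean(commute_minutes):.1f}")
print(f"95% bootstrap interval: ({low:.1f}, {high:.1f})")
```

The idea is that resampling from the data we already have stands in for drawing new samples from the population, so the spread of the resampled means suggests how much the estimate might vary.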
Comments from Previous Students
You made an incredibly intimidating course feel approachable. It definitely changed how I think about the world, but also myself, as a person able to engage with computer science.
I came into the class with SO little understanding of what code does, statistics, how I could go about learning a coding language, and honestly felt pretty uncomfortable with computers in general. In labs for other classes, I would generally avoid doing any coding/working with data as much as possible. Now, I generally volunteer to do those tasks in a group setting because I have an idea of what I’m getting myself into and it seems sort of fun.
This is definitely a great introductory class that moves at a very manageable pace, but also teaches skills relevant to a lot of different kinds of data analysis. The course material can be applied to any major at Williams.
As a div I major, I loved 104 more than I ever thought I would.
This course could not have been a more welcoming introduction to the discipline. It provided the perfect balance of challenging material and applicable examples, allowing me to push myself while never feeling as though I was in over my head. I could not have been happier with the knowledge and skills I gained from this course.