Tables
Contents
Tables#
The next cell has some “boiler plate” code. You’ll find it at the top of all notebooks in CS 104. Always run this cell. It sets up our notebook environment to have access to the libraries and resources used in the rest of the code.
from datascience import *
from cs104 import *
import numpy as np
%matplotlib inline
1. Python Review#
Expressions#
Expressions are typically pieces of code that are self-contained and usually evaluate to a value.
24
24
24 * 7
168
24 * 60 * (60 + 5 - 3 * 2)
84960
# two to the power of four: 2 * 2 * 2 * 2
2 ** 4
16
# a hashtag starts a comment
# comments allow us to make notes to ourselves and other humans
# but they are ignored by the computer
'hello'
'hello'
Variables#
Variables hold values for us. We assign a value to a variable with an assignment statement. An assignment statement computes the value on the right hand side of = and changes the meaning of the name to the left of the = symbol to be that value
Naming rules: letters, numbers, underscores; case sensitive; start with letter usually
Calculate the number of seconds in a year.
60 * 60 * 24 * 365
31536000
seconds_per_year = 60 * 60 * 24 * 365
seconds_per_year
31536000
Often, we want to assign intermediate steps to variables so that they’re easier to inspect and debug.
seconds_per_hour = 60 * 60
hours_per_year = 24 * 365
seconds_per_year = seconds_per_hour * hours_per_year
seconds_per_year
31536000
Functions#
A function returns some value, based on its arguments.
abs(-5)
5
day_temp = 52
night_temp = 47
abs(night_temp - day_temp)
5
max(3, 4, 6, -2 , 1, 0)
6
y = max(3, 4)
y
4
round(123.456, 1)
123.5
2. Tables & Obama Gifts#
Tables are stored as CSV (Comma separated values) files. Have a look!
Now we’ll look at data about gifts given to President Obama and his family during 2010 when he was in the White House.
The full data set can be found here.
gifts = Table().read_table('data/obama-gifts-2010.csv')
gifts
donor_country | value_usd | gift_description |
---|---|---|
Bahrain | 430 | 8'' x 10'' Christolfe silver picture frame with a photog ... |
Spain | 660 | 6.25'' x 6.25'' silver plate engraved with royal signatu ... |
nan | 1410 | 30'' x 54'' multicolored Tibetan scroll depicting a Budd ... |
El Salvador | 1000 | Four piece metal sculpture of Don Quixote and Sancho Pan ... |
Ireland | 357 | Two first edition volumes of William Yates books, publis ... |
Ireland | 1545 | Anthology of Irish poetry, signed by six Irish poets. Re ... |
Afghanistan | 1500 | 312'' x 117'' burgundy and red carpet with intricate pat ... |
France | 2150 | Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ... |
Russia | 2080 | Two 4'' Imperial porcelain painted eggs with porcelain b ... |
Czechia | 1100 | 14'' hand-blown Moser glass plate with 24- carat gold bo ... |
... (84 rows omitted)
gifts.show(2)
donor_country | value_usd | gift_description |
---|---|---|
Bahrain | 430 | 8'' x 10'' Christolfe silver picture frame with a photog ... |
Spain | 660 | 6.25'' x 6.25'' silver plate engraved with royal signatu ... |
... (92 rows omitted)
gifts.show(5)
donor_country | value_usd | gift_description |
---|---|---|
Bahrain | 430 | 8'' x 10'' Christolfe silver picture frame with a photog ... |
Spain | 660 | 6.25'' x 6.25'' silver plate engraved with royal signatu ... |
nan | 1410 | 30'' x 54'' multicolored Tibetan scroll depicting a Budd ... |
El Salvador | 1000 | Four piece metal sculpture of Don Quixote and Sancho Pan ... |
Ireland | 357 | Two first edition volumes of William Yates books, publis ... |
... (89 rows omitted)
What questions do have about this dataset?
How many gifts were given?
Total USD?
Most expensive gift?
gifts.num_rows
94
gifts.num_columns
3
Table Operations: Selecting and Dropping Columns#
Terminology: method
Select a single column
Errors
Select multiple columns
Methods don’t modify the underlying variable
We’ll stick to real data as much as we can, but sometimes it is handy to start with a small subset of our whole data set to illustrate new concepts. Our tiny_gifts
table contains six rows from the full list of gifts, and we’ll use that to introduce some key concepts. We’ll return to the original table below.
tiny_gifts = Table().read_table('data/tiny-obama-gifts-2010.csv')
tiny_gifts
donor_country | value_usd | gift_description |
---|---|---|
India | 2110 | Two 13'' x 17'' black and red scarves. 34'' gold-colored ... |
Ireland | 1545 | Anthology of Irish poetry, signed by six Irish poets. Re ... |
Ireland | 357 | Two first edition volumes of William Yates books, publis ... |
France | 4580 | 13.67'' Daum France crystal ultraviolet vase. Daum Franc ... |
Afghanistan | 2000 | 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ... |
France | 2150 | Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ... |
tiny_gifts.select('donor_country')
donor_country |
---|
India |
Ireland |
Ireland |
France |
Afghanistan |
France |
Let’s try again, but ask for a column not in the table… This will cause a Python error because what we’re asking for is not present.
tiny_gifts.select('Donor Country') # caps and no _ between the words...
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[26], line 1
----> 1 tiny_gifts.select('Donor Country') # caps and no _ between the words...
ValueError: The column "Donor Country" is not in the table. The table contains these columns: donor_country, value_usd, gift_description
The next line will also cause an error because donor
is treated as a variable unless it is surrounded by quotes, eg: 'donor'
.
tiny_gifts.select(donor)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[27], line 1
----> 1 tiny_gifts.select(donor)
NameError: name 'donor' is not defined
tiny_gifts.select("donor_country", "value_usd")
donor_country | value_usd |
---|---|
India | 2110 |
Ireland | 1545 |
Ireland | 357 |
France | 4580 |
Afghanistan | 2000 |
France | 2150 |
tiny_gifts.drop("gift_description")
donor_country | value_usd |
---|---|
India | 2110 |
Ireland | 1545 |
Ireland | 357 |
France | 4580 |
Afghanistan | 2000 |
France | 2150 |
Table operations return new tables#
tiny_gifts
donor_country | value_usd | gift_description |
---|---|---|
India | 2110 | Two 13'' x 17'' black and red scarves. 34'' gold-colored ... |
Ireland | 1545 | Anthology of Irish poetry, signed by six Irish poets. Re ... |
Ireland | 357 | Two first edition volumes of William Yates books, publis ... |
France | 4580 | 13.67'' Daum France crystal ultraviolet vase. Daum Franc ... |
Afghanistan | 2000 | 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ... |
France | 2150 | Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ... |
Calling a method on a table (e.g., drop
) does not change the table – it creates a new one! We need to reassign the vairable if we want to save the results of a method applied to a Table.
gifts_no_description = tiny_gifts.drop("gift_description")
gifts_no_description
donor_country | value_usd |
---|---|
India | 2110 |
Ireland | 1545 |
Ireland | 357 |
France | 4580 |
Afghanistan | 2000 |
France | 2150 |
tiny_gifts # hasn't changed!
donor_country | value_usd | gift_description |
---|---|---|
India | 2110 | Two 13'' x 17'' black and red scarves. 34'' gold-colored ... |
Ireland | 1545 | Anthology of Irish poetry, signed by six Irish poets. Re ... |
Ireland | 357 | Two first edition volumes of William Yates books, publis ... |
France | 4580 | 13.67'' Daum France crystal ultraviolet vase. Daum Franc ... |
Afghanistan | 2000 | 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ... |
France | 2150 | Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ... |
Method Chaining#
Terminology: method chaining is when the outputs from a previous method are used as inputs to the next method (in a single line).
tiny_gifts.drop("gift_description").show(3)
donor_country | value_usd |
---|---|
India | 2110 |
Ireland | 1545 |
Ireland | 357 |
... (3 rows omitted)
Table Operations: Sorting#
tiny_gifts.sort('value_usd')
donor_country | value_usd | gift_description |
---|---|---|
Ireland | 357 | Two first edition volumes of William Yates books, publis ... |
Ireland | 1545 | Anthology of Irish poetry, signed by six Irish poets. Re ... |
Afghanistan | 2000 | 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ... |
India | 2110 | Two 13'' x 17'' black and red scarves. 34'' gold-colored ... |
France | 2150 | Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ... |
France | 4580 | 13.67'' Daum France crystal ultraviolet vase. Daum Franc ... |
We can call named arugments for additional functionality of methods.
tiny_gifts.sort('value_usd', descending=True)
donor_country | value_usd | gift_description |
---|---|---|
France | 4580 | 13.67'' Daum France crystal ultraviolet vase. Daum Franc ... |
France | 2150 | Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ... |
India | 2110 | Two 13'' x 17'' black and red scarves. 34'' gold-colored ... |
Afghanistan | 2000 | 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ... |
Ireland | 1545 | Anthology of Irish poetry, signed by six Irish poets. Re ... |
Ireland | 357 | Two first edition volumes of William Yates books, publis ... |
Let’s sort by another column now.
tiny_gifts.sort('donor_country')
donor_country | value_usd | gift_description |
---|---|---|
Afghanistan | 2000 | 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ... |
France | 4580 | 13.67'' Daum France crystal ultraviolet vase. Daum Franc ... |
France | 2150 | Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ... |
India | 2110 | Two 13'' x 17'' black and red scarves. 34'' gold-colored ... |
Ireland | 1545 | Anthology of Irish poetry, signed by six Irish poets. Re ... |
Ireland | 357 | Two first edition volumes of William Yates books, publis ... |
tiny_gifts.sort('value_usd')
donor_country | value_usd | gift_description |
---|---|---|
Ireland | 357 | Two first edition volumes of William Yates books, publis ... |
Ireland | 1545 | Anthology of Irish poetry, signed by six Irish poets. Re ... |
Afghanistan | 2000 | 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ... |
India | 2110 | Two 13'' x 17'' black and red scarves. 34'' gold-colored ... |
France | 2150 | Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ... |
France | 4580 | 13.67'' Daum France crystal ultraviolet vase. Daum Franc ... |
tiny_gifts.sort('donor_country')
donor_country | value_usd | gift_description |
---|---|---|
Afghanistan | 2000 | 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ... |
France | 4580 | 13.67'' Daum France crystal ultraviolet vase. Daum Franc ... |
France | 2150 | Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ... |
India | 2110 | Two 13'' x 17'' black and red scarves. 34'' gold-colored ... |
Ireland | 1545 | Anthology of Irish poetry, signed by six Irish poets. Re ... |
Ireland | 357 | Two first edition volumes of William Yates books, publis ... |
How can we get the one most expensive gift for each country?
tiny_gifts.sort('value_usd', descending=True).sort('donor_country', distinct=True)
donor_country | value_usd | gift_description |
---|---|---|
Afghanistan | 2000 | 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ... |
France | 4580 | 13.67'' Daum France crystal ultraviolet vase. Daum Franc ... |
India | 2110 | Two 13'' x 17'' black and red scarves. 34'' gold-colored ... |
Ireland | 1545 | Anthology of Irish poetry, signed by six Irish poets. Re ... |
Now back to the full data set!!!
top_gifts = gifts.sort('value_usd', descending=True).sort('donor_country', distinct=True)
top_gifts
donor_country | value_usd | gift_description |
---|---|---|
Afghanistan | 2000 | 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ... |
Armenia | 2985 | 16'' x 16'' walnut chessboard with walnut and sterling s ... |
Austria | 625 | 32'' x 70'' Dries Van Noten patterned green silk shawl. ... |
Azerbaijan | 940 | Set of two brass and crystal tea cups, presented in a bl ... |
Bahrain | 430 | 8'' x 10'' Christolfe silver picture frame with a photog ... |
Brunei | 1932 | ``Lustre'' necklace with 1'' x 1'' inscribed ``Globe Cre ... |
Canada | 1000 | 17'' sugar maple-wood bowl signed by Don Thur. 24'' x 16 ... |
China | 1400 | White ``Flying Pigeon'' electric bicycle. Rec'd--9/27/20 ... |
Colombia | 525 | 8'' x 14'' Helena Caballero blue purse. Rec'd--9/24/2010 ... |
Croatia | 510 | Gena black three-piece suit Rec'd--4/26/2010. Est. Value ... |
... (44 rows omitted)
Sorted by most expensive?
top_gifts.sort('value_usd', descending=True)
donor_country | value_usd | gift_description |
---|---|---|
Nigeria | 8315 | Chris Aire gold cufflinks with centered citrine gemstone ... |
India | 7404 | Book, title: ``The Life of Mahatma Gandhi,'' by Louis Fi ... |
United Arab Emirates | 6125 | Cartier 18-carat white gold pearl earrings with diamonds ... |
Saudi Arabia | 5255 | Three Bijan silk tie sets with matching handkerchiefs in ... |
France | 4580 | 13.67'' Daum France crystal ultraviolet vase. Daum Franc ... |
Lebanon | 4520 | 32.25'' x 24.5'' watercolor of Arabic calligraphy by art ... |
Russia | 4000 | 83'' x 43'' framed oil painting of a carnival winter sce ... |
United Kingdom | 4000 | 39'' x 31'' canvas of graffiti letters spray- painted on ... |
Spain | 3600 | 20'' gold Pina & Grau necklace with 1'' Mallorcan cross ... |
Kazakhstan | 3200 | 9.5'' bowl made of silver, jasper, kahalong, and fianit, ... |
... (44 rows omitted)
3. Williams Majors#
Here’s a new dataset. Let’s explore how the number of different majors have changed over time!
majors = Table().read_table("data/majors.csv")
majors
Major | Division | 2008-2012 | 2018-2021 |
---|---|---|---|
American Studies | 2 | 10 | 9 |
Anthropology | 2 | 8 | 4 |
Arabic Studies | 1 | 4 | 7 |
Art | 1 | 55 | 31 |
Asian Studies | 1 | 8 | 6 |
Astronomy | 3 | 1 | 2 |
Astrophysics | 3 | 3 | 3 |
Biology | 3 | 58 | 61 |
Chemistry | 3 | 30 | 34 |
Chinese | 1 | 11 | 9 |
... (27 rows omitted)
majors.sort("2018-2021", descending=True)
Major | Division | 2008-2012 | 2018-2021 |
---|---|---|---|
Economics | 2 | 84 | 105 |
Biology | 3 | 58 | 61 |
Mathematics | 3 | 53 | 61 |
Political Science | 2 | 58 | 54 |
Computer Science | 3 | 16 | 50 |
English | 1 | 65 | 50 |
Psychology | 3 | 62 | 45 |
History | 2 | 61 | 43 |
Chemistry | 3 | 30 | 34 |
Art | 1 | 55 | 31 |
... (27 rows omitted)
Table Operations: Selecting Rows#
div3 = majors.where("Division", are.equal_to(3))
div3
Major | Division | 2008-2012 | 2018-2021 |
---|---|---|---|
Astronomy | 3 | 1 | 2 |
Astrophysics | 3 | 3 | 3 |
Biology | 3 | 58 | 61 |
Chemistry | 3 | 30 | 34 |
Computer Science | 3 | 16 | 50 |
Geosciences | 3 | 7 | 12 |
Mathematics | 3 | 53 | 61 |
Physics | 3 | 12 | 13 |
Psychology | 3 | 62 | 45 |
Statistics | 3 | 0 | 16 |
div3 = div3.drop("Division")
div3
Major | 2008-2012 | 2018-2021 |
---|---|---|
Astronomy | 1 | 2 |
Astrophysics | 3 | 3 |
Biology | 58 | 61 |
Chemistry | 30 | 34 |
Computer Science | 16 | 50 |
Geosciences | 7 | 12 |
Mathematics | 53 | 61 |
Physics | 12 | 13 |
Psychology | 62 | 45 |
Statistics | 0 | 16 |
Bar Charts to Visualize Categorical Variables#
We can create a bar chart directly from a table by the method barh
.
The first argument (categorical variable) in this method will appear on the vertical axis.
The second (numerical variable) will appear on the horizontal axis.
div3.barh("Major", "2008-2012")

Digression: Bar Charts for Multiple Variables#
div3.barh("Major")

Additional where conditions#
See complete list of are
conditions in our Python Reference.
majors.where("Division", are.not_equal_to(3)).sort("2018-2021", descending=True)
Major | Division | 2008-2012 | 2018-2021 |
---|---|---|---|
Economics | 2 | 84 | 105 |
Political Science | 2 | 58 | 54 |
English | 1 | 65 | 50 |
History | 2 | 61 | 43 |
Art | 1 | 55 | 31 |
Philosophy | 2 | 19 | 16 |
Political Economy | 2 | 16 | 13 |
Environmental Studies | 2 | 0 | 12 |
Comparative Literature | 1 | 5 | 11 |
French | 1 | 6 | 11 |
... (17 rows omitted)
majors.where("2018-2021", are.above(30))
Major | Division | 2008-2012 | 2018-2021 |
---|---|---|---|
Art | 1 | 55 | 31 |
Biology | 3 | 58 | 61 |
Chemistry | 3 | 30 | 34 |
Computer Science | 3 | 16 | 50 |
Economics | 2 | 84 | 105 |
English | 1 | 65 | 50 |
History | 2 | 61 | 43 |
Mathematics | 3 | 53 | 61 |
Political Science | 2 | 58 | 54 |
Psychology | 3 | 62 | 45 |
majors.where("2018-2021", are.between(10,20))
Major | Division | 2008-2012 | 2018-2021 |
---|---|---|---|
Comparative Literature | 1 | 5 | 11 |
Environmental Studies | 2 | 0 | 12 |
French | 1 | 6 | 11 |
Geosciences | 3 | 7 | 12 |
Philosophy | 2 | 19 | 16 |
Physics | 3 | 12 | 13 |
Political Economy | 2 | 16 | 13 |
Sociology | 2 | 9 | 11 |
Spanish | 1 | 8 | 10 |
Statistics | 3 | 0 | 16 |
majors.where("Major", are.containing('ics'))
Major | Division | 2008-2012 | 2018-2021 |
---|---|---|---|
Astrophysics | 3 | 3 | 3 |
Classics | 1 | 5 | 6 |
Economics | 2 | 84 | 105 |
Mathematics | 3 | 53 | 61 |
Physics | 3 | 12 | 13 |
Statistics | 3 | 0 | 16 |