Tables#

The next cell has some “boiler plate” code. You’ll find it at the top of all notebooks in CS 104. Always run this cell. It sets up our notebook environment to have access to the libraries and resources used in the rest of the code.

from datascience import *
from cs104 import *
import numpy as np

%matplotlib inline

1. Python Review#

Expressions#

Expressions are typically pieces of code that are self-contained and usually evaluate to a value.

24
24
24 * 7
168
24 * 60 * (60 + 5 - 3 * 2)
84960
# two to the power of four: 2 * 2 * 2 * 2
2 ** 4
16
# a hashtag starts a comment 
# comments allow us to make notes to ourselves and other humans
# but they are ignored by the computer
'hello'
'hello'

Variables#

Variables hold values for us. We assign a value to a variable with an assignment statement. An assignment statement computes the value on the right hand side of = and changes the meaning of the name to the left of the = symbol to be that value

Naming rules: letters, numbers, underscores; case sensitive; start with letter usually

Calculate the number of seconds in a year.

60 * 60 * 24 * 365
31536000
seconds_per_year = 60 * 60 * 24 * 365
seconds_per_year
31536000

Often, we want to assign intermediate steps to variables so that they’re easier to inspect and debug.

seconds_per_hour = 60 * 60
hours_per_year = 24 * 365
seconds_per_year = seconds_per_hour * hours_per_year
seconds_per_year
31536000

Functions#

A function returns some value, based on its arguments.

abs(-5)
5
day_temp = 52
night_temp = 47
abs(night_temp - day_temp)
5
max(3, 4, 6, -2 , 1, 0)
6
y = max(3, 4)
y
4
round(123.456, 1)
123.5

2. Tables & Obama Gifts#

Tables are stored as CSV (Comma separated values) files. Have a look!

Now we’ll look at data about gifts given to President Obama and his family during 2010 when he was in the White House.

The full data set can be found here.

gifts = Table().read_table('data/obama-gifts-2010.csv')
gifts
donor_country value_usd gift_description
Bahrain 430 8'' x 10'' Christolfe silver picture frame with a photog ...
Spain 660 6.25'' x 6.25'' silver plate engraved with royal signatu ...
nan 1410 30'' x 54'' multicolored Tibetan scroll depicting a Budd ...
El Salvador 1000 Four piece metal sculpture of Don Quixote and Sancho Pan ...
Ireland 357 Two first edition volumes of William Yates books, publis ...
Ireland 1545 Anthology of Irish poetry, signed by six Irish poets. Re ...
Afghanistan 1500 312'' x 117'' burgundy and red carpet with intricate pat ...
France 2150 Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ...
Russia 2080 Two 4'' Imperial porcelain painted eggs with porcelain b ...
Czechia 1100 14'' hand-blown Moser glass plate with 24- carat gold bo ...

... (84 rows omitted)

gifts.show(2)
donor_country value_usd gift_description
Bahrain 430 8'' x 10'' Christolfe silver picture frame with a photog ...
Spain 660 6.25'' x 6.25'' silver plate engraved with royal signatu ...

... (92 rows omitted)

gifts.show(5) 
donor_country value_usd gift_description
Bahrain 430 8'' x 10'' Christolfe silver picture frame with a photog ...
Spain 660 6.25'' x 6.25'' silver plate engraved with royal signatu ...
nan 1410 30'' x 54'' multicolored Tibetan scroll depicting a Budd ...
El Salvador 1000 Four piece metal sculpture of Don Quixote and Sancho Pan ...
Ireland 357 Two first edition volumes of William Yates books, publis ...

... (89 rows omitted)

What questions do have about this dataset?

  • How many gifts were given?

  • Total USD?

  • Most expensive gift?

gifts.num_rows
94
gifts.num_columns
3

Table Operations: Selecting and Dropping Columns#

Terminology: method

  • Select a single column

  • Errors

  • Select multiple columns

  • Methods don’t modify the underlying variable

We’ll stick to real data as much as we can, but sometimes it is handy to start with a small subset of our whole data set to illustrate new concepts. Our tiny_gifts table contains six rows from the full list of gifts, and we’ll use that to introduce some key concepts. We’ll return to the original table below.

tiny_gifts = Table().read_table('data/tiny-obama-gifts-2010.csv')
tiny_gifts
donor_country value_usd gift_description
India 2110 Two 13'' x 17'' black and red scarves. 34'' gold-colored ...
Ireland 1545 Anthology of Irish poetry, signed by six Irish poets. Re ...
Ireland 357 Two first edition volumes of William Yates books, publis ...
France 4580 13.67'' Daum France crystal ultraviolet vase. Daum Franc ...
Afghanistan 2000 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ...
France 2150 Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ...
tiny_gifts.select('donor_country')
donor_country
India
Ireland
Ireland
France
Afghanistan
France

Let’s try again, but ask for a column not in the table… This will cause a Python error because what we’re asking for is not present.

tiny_gifts.select('Donor Country')   # caps and no _ between the words...
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[25], line 1
----> 1 tiny_gifts.select('Donor Country')   # caps and no _ between the words...

ValueError: The column "Donor Country" is not in the table. The table contains these columns: donor_country, value_usd, gift_description

The next line will also cause an error because donor is treated as a variable unless it is surrounded by quotes, eg: 'donor'.

tiny_gifts.select(donor)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[26], line 1
----> 1 tiny_gifts.select(donor)

NameError: name 'donor' is not defined
tiny_gifts.select("donor_country", "value_usd")
donor_country value_usd
India 2110
Ireland 1545
Ireland 357
France 4580
Afghanistan 2000
France 2150
tiny_gifts.drop("gift_description")
donor_country value_usd
India 2110
Ireland 1545
Ireland 357
France 4580
Afghanistan 2000
France 2150

Table operations return new tables#

tiny_gifts
donor_country value_usd gift_description
India 2110 Two 13'' x 17'' black and red scarves. 34'' gold-colored ...
Ireland 1545 Anthology of Irish poetry, signed by six Irish poets. Re ...
Ireland 357 Two first edition volumes of William Yates books, publis ...
France 4580 13.67'' Daum France crystal ultraviolet vase. Daum Franc ...
Afghanistan 2000 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ...
France 2150 Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ...

Calling a method on a table (e.g., drop) does not change the table – it creates a new one! We need to reassign the vairable if we want to save the results of a method applied to a Table.

gifts_no_description = tiny_gifts.drop("gift_description")
gifts_no_description
donor_country value_usd
India 2110
Ireland 1545
Ireland 357
France 4580
Afghanistan 2000
France 2150
tiny_gifts  # hasn't changed!
donor_country value_usd gift_description
India 2110 Two 13'' x 17'' black and red scarves. 34'' gold-colored ...
Ireland 1545 Anthology of Irish poetry, signed by six Irish poets. Re ...
Ireland 357 Two first edition volumes of William Yates books, publis ...
France 4580 13.67'' Daum France crystal ultraviolet vase. Daum Franc ...
Afghanistan 2000 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ...
France 2150 Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ...

Method Chaining#

Terminology: method chaining is when the outputs from a previous method are used as inputs to the next method (in a single line).

tiny_gifts.drop("gift_description").show(3)
donor_country value_usd
India 2110
Ireland 1545
Ireland 357

... (3 rows omitted)

Table Operations: Sorting#

tiny_gifts.sort('value_usd')
donor_country value_usd gift_description
Ireland 357 Two first edition volumes of William Yates books, publis ...
Ireland 1545 Anthology of Irish poetry, signed by six Irish poets. Re ...
Afghanistan 2000 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ...
India 2110 Two 13'' x 17'' black and red scarves. 34'' gold-colored ...
France 2150 Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ...
France 4580 13.67'' Daum France crystal ultraviolet vase. Daum Franc ...

We can call named arugments for additional functionality of methods.

tiny_gifts.sort('value_usd', descending=True)
donor_country value_usd gift_description
France 4580 13.67'' Daum France crystal ultraviolet vase. Daum Franc ...
France 2150 Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ...
India 2110 Two 13'' x 17'' black and red scarves. 34'' gold-colored ...
Afghanistan 2000 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ...
Ireland 1545 Anthology of Irish poetry, signed by six Irish poets. Re ...
Ireland 357 Two first edition volumes of William Yates books, publis ...

Let’s sort by another column now.

tiny_gifts.sort('donor_country')
donor_country value_usd gift_description
Afghanistan 2000 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ...
France 4580 13.67'' Daum France crystal ultraviolet vase. Daum Franc ...
France 2150 Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ...
India 2110 Two 13'' x 17'' black and red scarves. 34'' gold-colored ...
Ireland 1545 Anthology of Irish poetry, signed by six Irish poets. Re ...
Ireland 357 Two first edition volumes of William Yates books, publis ...
tiny_gifts.sort('value_usd')
donor_country value_usd gift_description
Ireland 357 Two first edition volumes of William Yates books, publis ...
Ireland 1545 Anthology of Irish poetry, signed by six Irish poets. Re ...
Afghanistan 2000 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ...
India 2110 Two 13'' x 17'' black and red scarves. 34'' gold-colored ...
France 2150 Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ...
France 4580 13.67'' Daum France crystal ultraviolet vase. Daum Franc ...
tiny_gifts.sort('donor_country')
donor_country value_usd gift_description
Afghanistan 2000 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ...
France 4580 13.67'' Daum France crystal ultraviolet vase. Daum Franc ...
France 2150 Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ...
India 2110 Two 13'' x 17'' black and red scarves. 34'' gold-colored ...
Ireland 1545 Anthology of Irish poetry, signed by six Irish poets. Re ...
Ireland 357 Two first edition volumes of William Yates books, publis ...

How can we get the one most expensive gift for each country?

tiny_gifts.sort('value_usd', descending=True).sort('donor_country', distinct=True)
donor_country value_usd gift_description
Afghanistan 2000 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ...
France 4580 13.67'' Daum France crystal ultraviolet vase. Daum Franc ...
India 2110 Two 13'' x 17'' black and red scarves. 34'' gold-colored ...
Ireland 1545 Anthology of Irish poetry, signed by six Irish poets. Re ...

Now back to the full data set!!!

top_gifts = gifts.sort('value_usd', descending=True).sort('donor_country', distinct=True)
top_gifts
donor_country value_usd gift_description
Afghanistan 2000 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ...
Armenia 2985 16'' x 16'' walnut chessboard with walnut and sterling s ...
Austria 625 32'' x 70'' Dries Van Noten patterned green silk shawl. ...
Azerbaijan 940 Set of two brass and crystal tea cups, presented in a bl ...
Bahrain 430 8'' x 10'' Christolfe silver picture frame with a photog ...
Brunei 1932 ``Lustre'' necklace with 1'' x 1'' inscribed ``Globe Cre ...
Canada 1000 17'' sugar maple-wood bowl signed by Don Thur. 24'' x 16 ...
China 1400 White ``Flying Pigeon'' electric bicycle. Rec'd--9/27/20 ...
Colombia 525 8'' x 14'' Helena Caballero blue purse. Rec'd--9/24/2010 ...
Croatia 510 Gena black three-piece suit Rec'd--4/26/2010. Est. Value ...

... (44 rows omitted)

Sorted by most expensive?

top_gifts.sort('value_usd', descending=True)
donor_country value_usd gift_description
Nigeria 8315 Chris Aire gold cufflinks with centered citrine gemstone ...
India 7404 Book, title: ``The Life of Mahatma Gandhi,'' by Louis Fi ...
United Arab Emirates 6125 Cartier 18-carat white gold pearl earrings with diamonds ...
Saudi Arabia 5255 Three Bijan silk tie sets with matching handkerchiefs in ...
France 4580 13.67'' Daum France crystal ultraviolet vase. Daum Franc ...
Lebanon 4520 32.25'' x 24.5'' watercolor of Arabic calligraphy by art ...
Russia 4000 83'' x 43'' framed oil painting of a carnival winter sce ...
United Kingdom 4000 39'' x 31'' canvas of graffiti letters spray- painted on ...
Spain 3600 20'' gold Pina & Grau necklace with 1'' Mallorcan cross ...
Kazakhstan 3200 9.5'' bowl made of silver, jasper, kahalong, and fianit, ...

... (44 rows omitted)

3. Williams Majors#

Here’s a new dataset. Let’s explore how the number of different majors have changed over time!

majors = Table().read_table("data/majors.csv")
majors
Major Division 2008-2012 2018-2021
American Studies 2 10 9
Anthropology 2 8 4
Arabic Studies 1 4 7
Art 1 55 31
Asian Studies 1 8 6
Astronomy 3 1 2
Astrophysics 3 3 3
Biology 3 58 61
Chemistry 3 30 34
Chinese 1 11 9

... (27 rows omitted)

majors.sort("2018-2021", descending=True)
Major Division 2008-2012 2018-2021
Economics 2 84 105
Biology 3 58 61
Mathematics 3 53 61
Political Science 2 58 54
Computer Science 3 16 50
English 1 65 50
Psychology 3 62 45
History 2 61 43
Chemistry 3 30 34
Art 1 55 31

... (27 rows omitted)

Table Operations: Selecting Rows#

div3 = majors.where("Division", are.equal_to(3))
div3
Major Division 2008-2012 2018-2021
Astronomy 3 1 2
Astrophysics 3 3 3
Biology 3 58 61
Chemistry 3 30 34
Computer Science 3 16 50
Geosciences 3 7 12
Mathematics 3 53 61
Physics 3 12 13
Psychology 3 62 45
Statistics 3 0 16
div3 = div3.drop("Division")
div3
Major 2008-2012 2018-2021
Astronomy 1 2
Astrophysics 3 3
Biology 58 61
Chemistry 30 34
Computer Science 16 50
Geosciences 7 12
Mathematics 53 61
Physics 12 13
Psychology 62 45
Statistics 0 16

Bar Charts to Visualize Categorical Variables#

We can create a bar chart directly from a table by the method barh.

The first argument (categorical variable) in this method will appear on the vertical axis.

The second (numerical variable) will appear on the horizontal axis.

div3.barh("Major", "2008-2012")
../_images/03-tables_75_0.png

Digression: Bar Charts for Multiple Variables#

div3.barh("Major")
../_images/03-tables_77_0.png

Additional where conditions#

See complete list of are conditions in our Python Reference.

majors.where("Division", are.not_equal_to(3)).sort("2018-2021", descending=True)
Major Division 2008-2012 2018-2021
Economics 2 84 105
Political Science 2 58 54
English 1 65 50
History 2 61 43
Art 1 55 31
Philosophy 2 19 16
Political Economy 2 16 13
Environmental Studies 2 0 12
Comparative Literature 1 5 11
French 1 6 11

... (17 rows omitted)

majors.where("2018-2021", are.above(30))
Major Division 2008-2012 2018-2021
Art 1 55 31
Biology 3 58 61
Chemistry 3 30 34
Computer Science 3 16 50
Economics 2 84 105
English 1 65 50
History 2 61 43
Mathematics 3 53 61
Political Science 2 58 54
Psychology 3 62 45
majors.where("2018-2021", are.between(10,20))
Major Division 2008-2012 2018-2021
Comparative Literature 1 5 11
Environmental Studies 2 0 12
French 1 6 11
Geosciences 3 7 12
Philosophy 2 19 16
Physics 3 12 13
Political Economy 2 16 13
Sociology 2 9 11
Spanish 1 8 10
Statistics 3 0 16
majors.where("Major", are.containing('ics'))
Major Division 2008-2012 2018-2021
Astrophysics 3 3 3
Classics 1 5 6
Economics 2 84 105
Mathematics 3 53 61
Physics 3 12 13
Statistics 3 0 16

What else can we learn from these data sets?#