Tables

The next cell has some “boiler plate” code. You’ll find it at the top of all notebooks in CS 104. Always run this cell. It sets up our notebook environment to have access to the libraries and resources used in the rest of the code.

from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

Python Review

Expressions

24
24
24 * 7
168
24 * 60 * (60 + 5 - 3 * 2)
84960
# two to the power of four: 2 * 2 * 2 * 2
2 ** 4
16
'hello'
'hello'

Variables

Terminology: expressions, values, names, variables, statements

Naming rules: letters, numbers, underscores; case sensitive; start with letter usually

Calculate the number of seconds in a year.

60 * 60 * 24 * 365
31536000
seconds_per_year = 60 * 60 * 24 * 365
seconds_per_year
31536000
seconds_per_hour = 60 * 60
hours_per_year = 24 * 365
seconds_per_year = seconds_per_hour * hours_per_year
seconds_per_year
31536000

Functions

Terminology: function, call, argument

abs(-5)
5
day_temp = 52
night_temp = 47
abs(night_temp - day_temp)
5
max(3, 4, 6, -2 , 1, 0)
6
y = max(3, 4)
y
4
round(123.456, 1)
123.5

Tables

Tables are stored as CSV (Comma separated values) files. Have a look!

Hopkin’s Forest Tree Surveys

Trees and more here: https://hmf.williams.edu/researchacademics/data/

trees = Table.read_table('data/hopkins-plot-0011.csv')
trees
genus species common name count
Acer pensylvanicum Maple, striped 24
Acer rubrum Maple, red 20
Acer saccharum Maple, sugar 2
Betula alleghaniensis Birch, yellow 7
Betula lenta Birch, black 2
Betula papyrifera Birch, paper 2
Fagus grandifolia Beech, American 125
Quercus rubra Oak, red 1
trees.show(3)
genus species common name count
Acer pensylvanicum Maple, striped 24
Acer rubrum Maple, red 20
Acer saccharum Maple, sugar 2

... (5 rows omitted)

trees.show()
genus species common name count
Acer pensylvanicum Maple, striped 24
Acer rubrum Maple, red 20
Acer saccharum Maple, sugar 2
Betula alleghaniensis Birch, yellow 7
Betula lenta Birch, black 2
Betula papyrifera Birch, paper 2
Fagus grandifolia Beech, American 125
Quercus rubra Oak, red 1

Table Operations: Selecting Columns

Terminology: method

trees.select('genus')
genus
Acer
Acer
Acer
Betula
Betula
Betula
Fagus
Quercus
trees.select('common name', 'count')
common name count
Maple, striped 24
Maple, red 20
Maple, sugar 2
Birch, yellow 7
Birch, black 2
Birch, paper 2
Beech, American 125
Oak, red 1
# trees.select(genus, 'count')  # Error!
trees.drop('species')
genus common name count
Acer Maple, striped 24
Acer Maple, red 20
Acer Maple, sugar 2
Betula Birch, yellow 7
Betula Birch, black 2
Betula Birch, paper 2
Fagus Beech, American 125
Quercus Oak, red 1
trees  # Still the same...
genus species common name count
Acer pensylvanicum Maple, striped 24
Acer rubrum Maple, red 20
Acer saccharum Maple, sugar 2
Betula alleghaniensis Birch, yellow 7
Betula lenta Birch, black 2
Betula papyrifera Birch, paper 2
Fagus grandifolia Beech, American 125
Quercus rubra Oak, red 1
# must rename the variable if want to save the results of a method applied to the Table
trees_without_species = trees.drop('species')
trees_without_species
genus common name count
Acer Maple, striped 24
Acer Maple, red 20
Acer Maple, sugar 2
Betula Birch, yellow 7
Betula Birch, black 2
Betula Birch, paper 2
Fagus Beech, American 125
Quercus Oak, red 1

Table Operations: Sorting

Terminology: named argument

trees_without_species.sort('count')
genus common name count
Quercus Oak, red 1
Acer Maple, sugar 2
Betula Birch, black 2
Betula Birch, paper 2
Betula Birch, yellow 7
Acer Maple, red 20
Acer Maple, striped 24
Fagus Beech, American 125
trees_without_species.sort('count', descending=True) # named argument is "descending" 
genus common name count
Fagus Beech, American 125
Acer Maple, striped 24
Acer Maple, red 20
Betula Birch, yellow 7
Acer Maple, sugar 2
Betula Birch, black 2
Betula Birch, paper 2
Quercus Oak, red 1
trees_without_species.sort('genus', descending=True)
genus common name count
Quercus Oak, red 1
Fagus Beech, American 125
Betula Birch, yellow 7
Betula Birch, black 2
Betula Birch, paper 2
Acer Maple, striped 24
Acer Maple, red 20
Acer Maple, sugar 2

Tying It All Together: Gifts to President Obama and Family

Terminology: method chaining is when the outputs from a previous method are used as inputs to the next method (in a single line).

# derived from: https://raw.githubusercontent.com/tacookson/data/master/us-government-gifts/gifts.csv

gifts = Table.read_table('data/obama-gifts-2010.csv')

# or for all 8 years:  gifts = Table.read_table('data/obama-gifts.csv')
gifts.show(15)
donor_country value_usd gift_description
Bahrain 430 8'' x 10'' Christolfe silver picture frame with a photog ...
Spain 660 6.25'' x 6.25'' silver plate engraved with royal signatu ...
nan 1410 30'' x 54'' multicolored Tibetan scroll depicting a Budd ...
El Salvador 1000 Four piece metal sculpture of Don Quixote and Sancho Pan ...
Ireland 357 Two first edition volumes of William Yates books, publis ...
Ireland 1545 Anthology of Irish poetry, signed by six Irish poets. Re ...
Afghanistan 2000 312'' x 117'' burgundy and red carpet with intricate pat ...
France 2150 Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ...
Russia 2080 Two 4'' Imperial porcelain painted eggs with porcelain b ...
Czechia 1100 14'' hand-blown Moser glass plate with 24- carat gold bo ...
Nigeria 8315 Chris Aire gold cufflinks with centered citrine gemstone ...
Singapore 380 Creative Aurvana Live! black headphones. Creative pocket ...
Switzerland 2604 Longines silver-colored watch with brown band, presented ...
China 400 12.5'' Jun purple and gray porcelain vase with a round w ...
Kazakhstan 3200 9.5'' bowl made of silver, jasper, kahalong, and fianit, ...

... (79 rows omitted)

gifts.sort('value_usd')
donor_country value_usd gift_description
Japan 1.325 5'' titanium ``SUSGallery'' cup. 4'' titanium ``SUSGalle ...
Senegal 345 40'' x 70'' black wool shawl with colorful stripes. 16'' ...
Laos 350 20.75'' x 32'' framed silver-colored metal panel with an ...
Egypt 356 4.5'' x 3.5'' Bible in Arabic with metallic cross on cov ...
Ireland 357 Two first edition volumes of William Yates books, publis ...
Ukraine 360 18'' x 62'' embroidered beige tapestry with red floral m ...
Ukraine 360 White blouse with beaded floral design, in a maroon velv ...
Senegal 375 14'' x 9'' x 4'' Le Darmanco coral and navy blue handbag ...
Singapore 380 Creative Aurvana Live! black headphones. Creative pocket ...
Egypt 380 24'' orange and black beaded silver-colored necklace wit ...

... (84 rows omitted)

Terminology: named argument

gifts.sort('value_usd', descending=True)
donor_country value_usd gift_description
Nigeria 8315 Chris Aire gold cufflinks with centered citrine gemstone ...
India 7404 Book, title: ``The Life of Mahatma Gandhi,'' by Louis Fi ...
United Arab Emirates 6125 Cartier 18-carat white gold pearl earrings with diamonds ...
Saudi Arabia 5255 Three Bijan silk tie sets with matching handkerchiefs in ...
France 4580 13.67'' Daum France crystal ultraviolet vase. Daum Franc ...
Lebanon 4520 32.25'' x 24.5'' watercolor of Arabic calligraphy by art ...
Russia 4000 83'' x 43'' framed oil painting of a carnival winter sce ...
United Kingdom 4000 39'' x 31'' canvas of graffiti letters spray- painted on ...
Spain 3600 20'' gold Pina & Grau necklace with 1'' Mallorcan cross ...
Kazakhstan 3200 9.5'' bowl made of silver, jasper, kahalong, and fianit, ...

... (84 rows omitted)

sorted_by_value = gifts.sort('value_usd', descending=True)
sorted_by_value
donor_country value_usd gift_description
Nigeria 8315 Chris Aire gold cufflinks with centered citrine gemstone ...
India 7404 Book, title: ``The Life of Mahatma Gandhi,'' by Louis Fi ...
United Arab Emirates 6125 Cartier 18-carat white gold pearl earrings with diamonds ...
Saudi Arabia 5255 Three Bijan silk tie sets with matching handkerchiefs in ...
France 4580 13.67'' Daum France crystal ultraviolet vase. Daum Franc ...
Lebanon 4520 32.25'' x 24.5'' watercolor of Arabic calligraphy by art ...
Russia 4000 83'' x 43'' framed oil painting of a carnival winter sce ...
United Kingdom 4000 39'' x 31'' canvas of graffiti letters spray- painted on ...
Spain 3600 20'' gold Pina & Grau necklace with 1'' Mallorcan cross ...
Kazakhstan 3200 9.5'' bowl made of silver, jasper, kahalong, and fianit, ...

... (84 rows omitted)

sorted_by_value.sort('donor_country').show(30)
donor_country value_usd gift_description
Afghanistan 2000 312'' x 117'' burgundy and red carpet with intricate pat ...
Afghanistan 2000 80'' x 148'' brown and blue patterned rug. Rec'd--5/12/2 ...
Armenia 2985 16'' x 16'' walnut chessboard with walnut and sterling s ...
Austria 625 32'' x 70'' Dries Van Noten patterned green silk shawl. ...
Azerbaijan 940 Set of two brass and crystal tea cups, presented in a bl ...
Bahrain 430 8'' x 10'' Christolfe silver picture frame with a photog ...
Brunei 1932 ``Lustre'' necklace with 1'' x 1'' inscribed ``Globe Cre ...
Canada 1000 17'' sugar maple-wood bowl signed by Don Thur. 24'' x 16 ...
China 1400 White ``Flying Pigeon'' electric bicycle. Rec'd--9/27/20 ...
China 629 11'' x 9'' ``Expo 2010 Shanghai China'' blue vase with a ...
China 400 12.5'' Jun purple and gray porcelain vase with a round w ...
Colombia 525 8'' x 14'' Helena Caballero blue purse. Rec'd--9/24/2010 ...
Croatia 510 Gena black three-piece suit Rec'd--4/26/2010. Est. Value ...
Czechia 1100 14'' hand-blown Moser glass plate with 24- carat gold bo ...
Egypt 380 24'' orange and black beaded silver-colored necklace wit ...
Egypt 356 4.5'' x 3.5'' Bible in Arabic with metallic cross on cov ...
El Salvador 1000 Four piece metal sculpture of Don Quixote and Sancho Pan ...
Finland 445 9.5'' blue glass bird, hand-blown by Iittala craftsmen. ...
France 4580 13.67'' Daum France crystal ultraviolet vase. Daum Franc ...
France 2150 Three bottles of 1961 Chateau Bages wine. Two 9'' x 13'' ...
Haiti 2094 VeVe Collections teal handbag with sequins. Book, title: ...
Haiti 425 20'' x 16.5'' painting of Haitian women dressed in white ...
Honduras 560 Two gold-colored Milla Guirst Joyeros cufflinks with May ...
India 7404 Book, title: ``The Life of Mahatma Gandhi,'' by Louis Fi ...
India 2110 Two 13'' x 17'' black and red scarves. 34'' gold-colored ...
India 1015 30'' x 42'' painting of bodhisattva on black and gold-co ...
India 710 Two 10'' x 12'' miniature paintings depicting a small in ...
India 512 27'' x 78'' Kashmir Loom blue cashmere stole with orange ...
India 460 7'' long silver-colored jewelry box with bird designs, i ...
Indonesia 1658 Two books. 36'' x 42'' blue Lenan Pearl of Silk scarf wi ...

... (64 rows omitted)

sorted_by_value.sort('donor_country', distinct=True)
donor_country value_usd gift_description
Afghanistan 2000 312'' x 117'' burgundy and red carpet with intricate pat ...
Armenia 2985 16'' x 16'' walnut chessboard with walnut and sterling s ...
Austria 625 32'' x 70'' Dries Van Noten patterned green silk shawl. ...
Azerbaijan 940 Set of two brass and crystal tea cups, presented in a bl ...
Bahrain 430 8'' x 10'' Christolfe silver picture frame with a photog ...
Brunei 1932 ``Lustre'' necklace with 1'' x 1'' inscribed ``Globe Cre ...
Canada 1000 17'' sugar maple-wood bowl signed by Don Thur. 24'' x 16 ...
China 1400 White ``Flying Pigeon'' electric bicycle. Rec'd--9/27/20 ...
Colombia 525 8'' x 14'' Helena Caballero blue purse. Rec'd--9/24/2010 ...
Croatia 510 Gena black three-piece suit Rec'd--4/26/2010. Est. Value ...

... (44 rows omitted)

Visualizing Categorical Data

top_per_country = sorted_by_value.sort('donor_country', distinct=True)
top_per_country
donor_country value_usd gift_description
Afghanistan 2000 312'' x 117'' burgundy and red carpet with intricate pat ...
Armenia 2985 16'' x 16'' walnut chessboard with walnut and sterling s ...
Austria 625 32'' x 70'' Dries Van Noten patterned green silk shawl. ...
Azerbaijan 940 Set of two brass and crystal tea cups, presented in a bl ...
Bahrain 430 8'' x 10'' Christolfe silver picture frame with a photog ...
Brunei 1932 ``Lustre'' necklace with 1'' x 1'' inscribed ``Globe Cre ...
Canada 1000 17'' sugar maple-wood bowl signed by Don Thur. 24'' x 16 ...
China 1400 White ``Flying Pigeon'' electric bicycle. Rec'd--9/27/20 ...
Colombia 525 8'' x 14'' Helena Caballero blue purse. Rec'd--9/24/2010 ...
Croatia 510 Gena black three-piece suit Rec'd--4/26/2010. Est. Value ...

... (44 rows omitted)

# Bar chart: visualize categorical data
top_per_country.barh('donor_country', 'value_usd')
../_images/03-tables_56_0.png
# Bar chart: visualize categorical data
top_per_country.sort('value_usd', descending=True).barh('donor_country', 'value_usd')
../_images/03-tables_57_0.png

Table Operations: Selecting Rows

majors = Table.read_table("data/majors.csv")
majors
Major Division 2008-2012 2018-2021
American Studies 2 10 9
Anthropology 2 8 4
Arabic Studies 1 4 7
Art 1 55 31
Asian Studies 1 8 6
Astronomy 3 1 2
Astrophysics 3 3 3
Biology 3 58 61
Chemistry 3 30 34
Chinese 1 11 9

... (27 rows omitted)

majors.sort("2018-2021", descending=True)
Major Division 2008-2012 2018-2021
Economics 2 84 105
Biology 3 58 61
Mathematics 3 53 61
Political Science 2 58 54
Computer Science 3 16 50
English 1 65 50
Psychology 3 62 45
History 2 61 43
Chemistry 3 30 34
Art 1 55 31

... (27 rows omitted)

majors.where("Division", are.equal_to(3))
majors.where("Division", 3)  # same
Major Division 2008-2012 2018-2021
Astronomy 3 1 2
Astrophysics 3 3 3
Biology 3 58 61
Chemistry 3 30 34
Computer Science 3 16 50
Geosciences 3 7 12
Mathematics 3 53 61
Physics 3 12 13
Psychology 3 62 45
Statistics 3 0 16

Digression: Bar Charts for Multiple Variables

div3 = majors.where("Division", 3)
div3.drop("Division").barh("Major")
../_images/03-tables_63_0.png

Terminology: method chaining

majors.where("Division", are.equal_to(3)).sort("2018-2021", descending=True)
Major Division 2008-2012 2018-2021
Biology 3 58 61
Mathematics 3 53 61
Computer Science 3 16 50
Psychology 3 62 45
Chemistry 3 30 34
Statistics 3 0 16
Physics 3 12 13
Geosciences 3 7 12
Astrophysics 3 3 3
Astronomy 3 1 2

See complete list of are conditions in our Python Reference.

majors.where("Division", are.not_equal_to(3)).sort("2018-2021", descending=True)
Major Division 2008-2012 2018-2021
Economics 2 84 105
Political Science 2 58 54
English 1 65 50
History 2 61 43
Art 1 55 31
Philosophy 2 19 16
Political Economy 2 16 13
Environmental Studies 2 0 12
Comparative Literature 1 5 11
French 1 6 11

... (17 rows omitted)

majors.where("2018-2021", are.above(30))
Major Division 2008-2012 2018-2021
Art 1 55 31
Biology 3 58 61
Chemistry 3 30 34
Computer Science 3 16 50
Economics 2 84 105
English 1 65 50
History 2 61 43
Mathematics 3 53 61
Political Science 2 58 54
Psychology 3 62 45
majors.where("2018-2021", are.between(10,20))
Major Division 2008-2012 2018-2021
Comparative Literature 1 5 11
Environmental Studies 2 0 12
French 1 6 11
Geosciences 3 7 12
Philosophy 2 19 16
Physics 3 12 13
Political Economy 2 16 13
Sociology 2 9 11
Spanish 1 8 10
Statistics 3 0 16
majors.where("Major", are.containing('ics'))
Major Division 2008-2012 2018-2021
Astrophysics 3 3 3
Classics 1 5 6
Economics 2 84 105
Mathematics 3 53 61
Physics 3 12 13
Statistics 3 0 16

What else can we learn from these data sets?