Pivots and Joins

from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
import warnings
warnings.simplefilter(action='ignore',category=np.VisibleDeprecationWarning)

Responses from student survey

# We retained an answer if 2+ students gave that answer
mastered_concepts = Table().with_columns(
'Concept', make_array('Table operations', 'Basic Python', 'Navigating Python reference library'), 
'Student Count', make_array(24, 5, 4)
)
mastered_concepts.sort('Student Count', descending=True).barh('Concept', 'Student Count')
plots.title('Mastered');
../_images/11-pivots-and-joins_4_0.png
# We retained an answer if 2+ students gave that answer
struggling_concepts = Table().with_columns(
'Concept', make_array('Functions', 'Histograms', 'Binning', 'Overlaid graphs', 'Combining different concepts', 'Arrays', 'Syntax', 'Visualizations'), 
'Student Count', make_array(7, 9, 2, 3, 7, 3, 3, 2)
)
struggling_concepts.sort('Student Count', descending=True).barh('Concept', 'Student Count')
plots.title('Struggling');
../_images/11-pivots-and-joins_6_0.png
# We retained an answer if 2+ students gave that answer
liked_aspects = Table().with_columns(
'Aspect', make_array('Labs: guidance and structure', 'Pacing', 'Lecture style', 'Group work (think-pair-shares)', 'Great TAs', 'Pre-labs'), 
'Student Count', make_array(13, 19, 2, 4, 2, 2)
)
liked_aspects.sort('Student Count', descending=True).barh('Aspect', 'Student Count')
plots.title('Liked aspects of class');
../_images/11-pivots-and-joins_8_0.png

Review groups

https://media-cldnry.s-nbcnews.com/image/upload/MSNBC/Components/Photo/_new/110118-obama-gift-2p.jpg

“President Barack Obama receives a gift from Saudi King Abdullah at the start of their bilateral meeting in Riyadh, Saudi Arabia, on June 3, 2009. The large gold medallion was among several gifts given that day that were valued at $34,500, the State Department later said” –CBS News

# Read in all gifts, and tidy up table by removing nan values and relabeling columns
all_gifts = Table().read_table('data/obama-gifts.csv')
all_gifts = all_gifts.where('donor_country', are.not_equal_to('nan')) #clean up and remove the nans
all_gifts = all_gifts.select('year_received', 'donor_country', 'value_usd')
all_gifts = all_gifts.relabeled('year_received', 'Year')
all_gifts = all_gifts.relabeled('donor_country', 'Country')
all_gifts = all_gifts.relabeled('value_usd', 'Value')
all_gifts
Year Country Value
2009 Mexico 400
2009 Japan 1495
2009 United Kingdom 16510
2009 Algeria 500
2009 Denmark 388
2009 Russia 415
2009 Germany 760
2009 Czechia 1460
2009 Turkey 1550
2009 Iraq 1850

... (603 rows omitted)

# We'll also create a small subset of our Obama Gifts dataset.  
# It contains 3 countries, 3 years, 7 gifts
gifts = all_gifts.where('Year', are.contained_in([2009,2010,2011,2012]))
gifts = gifts.where('Country', are.contained_in(make_array('Denmark', 'Egypt', 'Finland')))
gifts = gifts.sort('Year')
gifts
Year Country Value
2009 Denmark 388
2009 Egypt 630
2009 Denmark 340
2010 Finland 445
2010 Egypt 356
2010 Egypt 380
2011 Denmark 485
gifts.group('Year')
Year count
2009 3
2010 3
2011 1
gifts.group('Country')
Country count
Denmark 3
Egypt 3
Finland 1
gifts.group('Year', sum)
Year Country sum Value sum
2009 1358
2010 1181
2011 485
all_gifts.group('Year', sum)
Year Country sum Value sum
2009 571164
2010 122626
2011 233730
2012 75307.8
2013 198508
2014 1.40138e+06
2015 1.10529e+06
2016 147903
gifts.group('Country', max)
Country Year max Value max
Denmark 2011 485
Egypt 2010 630
Finland 2010 445
gifts.group(make_array('Year', 'Country'), sum)
Year Country Value sum
2009 Denmark 728
2009 Egypt 630
2010 Egypt 736
2010 Finland 445
2011 Denmark 485
gifts.group(make_array('Year', 'Country'))
Year Country count
2009 Denmark 2
2009 Egypt 1
2010 Egypt 2
2010 Finland 1
2011 Denmark 1

Pivots

Summarize data that has been grouped by two variables in a grid.

gifts
Year Country Value
2009 Denmark 388
2009 Egypt 630
2009 Denmark 340
2010 Finland 445
2010 Egypt 356
2010 Egypt 380
2011 Denmark 485
gifts.pivot('Year', 'Country')
Country 2009 2010 2011
Denmark 2 0 1
Egypt 1 2 0
Finland 0 1 0
gifts.pivot('Country', 'Year')
Year Denmark Egypt Finland
2009 2 1 0
2010 0 2 1
2011 1 0 0
gifts.pivot('Year', 'Country', 'Value', sum)
Country 2009 2010 2011
Denmark 728 0 485
Egypt 630 736 0
Finland 0 445 0

A pivot from our past: Temperatures in Greenland

We spent some time working with our Upernavik climate data.

greenland_climate = Table.read_table('data/climate_upernavik.csv').where('Year', are.above(1876))
greenland_climate = greenland_climate.relabeled('Precipitation (millimeters)', "Precip (mm)")
tidy_greenland = greenland_climate.where('Air temperature (C)', are.not_equal_to(999.99))
tidy_greenland = tidy_greenland.where('Sea level pressure (mbar)', are.not_equal_to(9999.9))
tidy_greenland
Year Month Air temperature (C) Sea level pressure (mbar) Precip (mm)
1877 1 -21.1 994.9 24
1877 2 -26.5 1008.2 26
1877 3 -17.8 1013.4 84
1877 4 -12 1023.6 6
1877 5 -1.7 1019.4 9
1877 6 1.4 1011.8 5
1877 7 4.6 1011.5 9
1877 8 5.2 1019.8 29
1877 9 3 1009 51
1877 10 -2.8 1006.2 23

... (1221 rows omitted)

When plotting line graphs, we needed separate columns for each month. A pivot gives us a table with that format.

temps_by_month = tidy_greenland.pivot('Month', 'Year', 'Air temperature (C)', np.mean)
temps_by_month
Year 1 2 3 4 5 6 7 8 9 10 11 12
1877 -21.1 -26.5 -17.8 -12 -1.7 1.4 4.6 5.2 3 -2.8 -10 -19.2
1878 -22.9 -26.9 -19.6 -13 -5.6 1.9 3.2 4.3 -0.9 -4 -2 -3.7
1879 -13.5 -25.4 -21.3 -13 -2.7 0 5.2 5.8 -1.1 -5.5 -8.2 -16.5
1880 -22.6 -28.9 -22.7 -10.7 -3.1 2.2 5.6 2.9 0.8 -0.3 -9.4 -14.6
1881 -13.6 -19.9 -26.4 -12.8 -3.1 1.3 5.3 4.2 0.4 -2.8 -9.4 -17.5
1882 -27.6 -25.6 -24.9 -14.8 -3.2 3.1 3.3 4.1 1 -6.1 -12.1 -17.1
1883 -19.9 -21.5 -12.7 -12.8 -1.9 3 6.2 4.4 2.2 -4.1 -8.9 -21.1
1884 -26.1 -24.9 -22.7 -13.6 -6.4 1.6 5.1 3.2 -1.6 -5.1 -11.8 -20.7
1885 -15.9 -18.3 -23 -12.8 -2 -0.4 5.2 5.2 -0.8 -4.6 -13.2 -18.5
1886 -24.2 -25.2 -23.7 -18.4 -2.8 1.2 5.3 4.2 0.5 -6.3 -11.9 -18.6

... (97 rows omitted)

temps_by_month.select('Year', '2', '9').plot('Year')
../_images/11-pivots-and-joins_31_0.png

Group vs. Pivot

When is it an advantage to use pivot over group and vice versa?

Joins

Let’s hand pick a few of our favorite gifts.

best_gifts = gifts.take(0,1,2,3).drop("Year")
best_gifts
Country Value
Denmark 388
Egypt 630
Denmark 340
Finland 445

If you’re curious what they are, here are shortened descriptions for the four we chose.

best_gifts.with_columns("Gift description", 
                        make_array('Book "Restoring the Military Balance"',
                                   'Yellow alabaster bowl',
                                   'Photograph of Her Majesty and His Royal Highness',
                                   'Hand-blown blue glass bird'))
Country Value Gift description
Denmark 388 Book "Restoring the Military Balance"
Egypt 630 Yellow alabaster bowl
Denmark 340 Photograph of Her Majesty and His Royal Highness
Finland 445 Hand-blown blue glass bird

Here is some other info about countries, specifically the GDP from each country during Obama’s first year in office. (GDP is gross domestic product, a measure of the value of the final goods and services produced in a country.)

# Values taken from World Bank and IMF sites. 
# https://data.worldbank.org/indicator/NY.GDP.MKTP.CD
# https://www.imf.org/en/Publications/WEO/weo-database/2022/April/download-entire-database
gdp = Table().with_columns(
    'Country Name', make_array('Denmark', 'Egypt', 'Egypt', 'Greece'),
    'GDP (Billion $)', make_array(321, 189, 198, 331),
    'Source', make_array('World Bank', 'World Bank', 'IMF', 'World Bank'))
gdp
Country Name GDP (Billion $) Source
Denmark 321 World Bank
Egypt 189 World Bank
Egypt 198 IMF
Greece 331 World Bank

Can we combine best_gifts with info about the countries like GDP?

Join will let us merge data from two tables by pairing together rows from each that share a common property. Here we’ll merge a table of gifts with a table of GDP information about the countres giving gifts.

# Values taken from World Bank and IMF sites. 
# https://data.worldbank.org/indicator/NY.GDP.MKTP.CD
# https://www.imf.org/en/Publications/WEO/weo-database/2022/April/download-entire-database
gdp = Table().with_columns(
    'Country Name', make_array('Denmark', 'Egypt', 'Egypt', 'Greece'),
    'GDP (Billion $)', make_array(321, 189, 198, 331),
    'Source', make_array('World Bank', 'World Bank', 'IMF', 'World Bank'))
gdp
Country Name GDP (Billion $) Source
Denmark 321 World Bank
Egypt 189 World Bank
Egypt 198 IMF
Greece 331 World Bank
best_gifts.join('Country', gdp, 'Country Name')
Country Value GDP (Billion $) Source
Denmark 388 321 World Bank
Denmark 340 321 World Bank
Egypt 630 189 World Bank
Egypt 630 198 IMF

Open-ended exploration

Suppose we had the following question: Is there any relationship between a country’s GDP and the sum total of the amount of gifts they gave to President Obama?

The World Bank provides GDP data we can use.

gdp = Table().read_table('data/gdp.csv')
gdp.take(np.arange(10,20))
Country Name Country Code Indicator Name Indicator Code 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 Unnamed: 66
Armenia ARM GDP (current US$) NY.GDP.MKTP.CD nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 2.25684e+09 2.06987e+09 1.27284e+09 1.20131e+09 1.31516e+09 1.46832e+09 1.59697e+09 1.63949e+09 1.89373e+09 1.84548e+09 1.91156e+09 2.11847e+09 2.37634e+09 2.80706e+09 3.57662e+09 4.90047e+09 6.38445e+09 9.2063e+09 1.1662e+10 8.64794e+09 9.26028e+09 1.01421e+10 1.06193e+10 1.11215e+10 1.16095e+10 1.05533e+10 1.05461e+10 1.15275e+10 1.24579e+10 1.36193e+10 1.26412e+10 1.38612e+10 nan
American Samoa ASM GDP (current US$) NY.GDP.MKTP.CD nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 5.12e+08 5.24e+08 5.09e+08 5e+08 4.93e+08 5.18e+08 5.6e+08 6.75e+08 5.73e+08 5.7e+08 6.4e+08 6.38e+08 6.43e+08 6.73e+08 6.71e+08 6.12e+08 6.39e+08 6.48e+08 7.09e+08 nan nan
Antigua and Barbuda ATG GDP (current US$) NY.GDP.MKTP.CD nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 7.74968e+07 8.78793e+07 1.0908e+08 1.31431e+08 1.47842e+08 1.64369e+08 1.82144e+08 2.08373e+08 2.40924e+08 2.9044e+08 3.37175e+08 3.98638e+08 4.38795e+08 4.5947e+08 4.81707e+08 4.99281e+08 5.35174e+08 5.8943e+08 5.77281e+08 6.3373e+08 6.80619e+08 7.27859e+08 7.662e+08 8.2637e+08 8.00481e+08 8.14381e+08 8.56396e+08 9.1973e+08 1.02296e+09 1.15766e+09 1.31276e+09 1.37007e+09 1.22833e+09 1.1487e+09 1.13764e+09 1.19995e+09 1.18145e+09 1.24973e+09 1.33669e+09 1.43659e+09 1.46798e+09 1.60594e+09 1.68753e+09 1.37028e+09 1.47113e+09 nan
Australia AUS GDP (current US$) NY.GDP.MKTP.CD 1.86068e+10 1.96831e+10 1.99227e+10 2.15399e+10 2.38011e+10 2.59772e+10 2.73099e+10 3.04446e+10 3.2717e+10 3.66861e+10 4.13372e+10 4.52223e+10 5.20514e+10 6.3845e+10 8.89816e+10 9.73331e+10 1.05101e+11 1.10388e+11 1.18536e+11 1.34941e+11 1.50032e+11 1.76953e+11 1.94105e+11 1.77334e+11 1.93594e+11 1.80574e+11 1.82368e+11 1.894e+11 2.36066e+11 2.99768e+11 3.11327e+11 3.25903e+11 3.2548e+11 3.12126e+11 3.22807e+11 3.67916e+11 4.0109e+11 4.35324e+11 3.99404e+11 3.89099e+11 4.15576e+11 3.79084e+11 3.95343e+11 4.67391e+11 6.14166e+11 6.95075e+11 7.47556e+11 8.53955e+11 1.05513e+12 9.28043e+11 1.14759e+12 1.39791e+12 1.54651e+12 1.57634e+12 1.4675e+12 1.35053e+12 1.20669e+12 1.32688e+12 1.42853e+12 1.39195e+12 1.32784e+12 1.54266e+12 nan
Austria AUT GDP (current US$) NY.GDP.MKTP.CD 6.59269e+09 7.31175e+09 7.75611e+09 8.37418e+09 9.16998e+09 9.99407e+09 1.08877e+10 1.15794e+10 1.24406e+10 1.35828e+10 1.5373e+10 1.78585e+10 2.20596e+10 2.95155e+10 3.51893e+10 4.00592e+10 4.296e+10 5.15458e+10 6.20523e+10 7.39373e+10 8.20589e+10 7.10342e+10 7.12753e+10 7.2121e+10 6.79853e+10 6.93868e+10 9.90362e+10 1.24168e+11 1.33339e+11 1.33106e+11 1.66463e+11 1.73794e+11 1.95078e+11 1.9038e+11 2.03535e+11 2.41038e+11 2.37251e+11 2.1279e+11 2.1826e+11 2.17259e+11 1.9729e+11 1.97509e+11 2.14395e+11 2.62274e+11 3.01458e+11 3.16092e+11 3.3628e+11 3.89186e+11 4.32052e+11 4.01759e+11 3.92275e+11 4.31685e+11 4.09402e+11 4.30191e+11 4.42585e+11 3.81971e+11 3.95837e+11 4.17261e+11 4.55168e+11 4.45012e+11 4.33258e+11 4.77082e+11 nan
Azerbaijan AZE GDP (current US$) NY.GDP.MKTP.CD nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 8.85801e+09 8.79237e+09 4.46306e+08 1.57e+09 1.19331e+09 2.41736e+09 3.17633e+09 3.96224e+09 4.44637e+09 4.58143e+09 5.2728e+09 5.70772e+09 6.23586e+09 7.27675e+09 8.68037e+09 1.32457e+10 2.0983e+10 3.30503e+10 4.88525e+10 4.42915e+10 5.29093e+10 6.59516e+10 6.96839e+10 7.41644e+10 7.52443e+10 5.30744e+10 3.78675e+10 4.08656e+10 4.71129e+10 4.81742e+10 4.2693e+10 5.46222e+10 nan
Burundi BDI GDP (current US$) NY.GDP.MKTP.CD 1.96e+08 2.03e+08 2.135e+08 2.3275e+08 2.6075e+08 1.58995e+08 1.65445e+08 1.78297e+08 1.832e+08 1.90206e+08 2.42733e+08 2.52842e+08 2.46805e+08 3.0434e+08 3.45263e+08 4.20987e+08 4.48413e+08 5.47536e+08 6.10226e+08 7.82497e+08 9.19727e+08 9.69047e+08 1.01322e+09 1.08293e+09 9.87144e+08 1.14998e+09 1.20173e+09 1.13147e+09 1.0824e+09 1.11392e+09 1.1321e+09 1.1674e+09 1.08304e+09 9.38633e+08 9.25031e+08 1.00043e+09 8.69034e+08 9.72896e+08 8.93771e+08 8.08077e+08 8.70486e+08 8.76795e+08 8.25394e+08 7.84654e+08 9.15257e+08 1.11711e+09 1.27338e+09 1.3562e+09 1.61184e+09 1.78146e+09 2.03214e+09 2.23582e+09 2.33334e+09 2.45161e+09 2.70578e+09 3.104e+09 2.63932e+09 2.71232e+09 2.66012e+09 2.58127e+09 2.78051e+09 2.90203e+09 nan
Belgium BEL GDP (current US$) NY.GDP.MKTP.CD 1.16587e+10 1.24001e+10 1.3264e+10 1.426e+10 1.59601e+10 1.73715e+10 1.86519e+10 1.9992e+10 2.13764e+10 2.37107e+10 2.67062e+10 2.98217e+10 3.72094e+10 4.77438e+10 5.60331e+10 6.56782e+10 7.11139e+10 8.28399e+10 1.01247e+11 1.16315e+11 1.26829e+11 1.0473e+11 9.20959e+10 8.71842e+10 8.33495e+10 8.62683e+10 1.20019e+11 1.49394e+11 1.62299e+11 1.64221e+11 2.05332e+11 2.10511e+11 2.34782e+11 2.24722e+11 2.44884e+11 2.88026e+11 2.79201e+11 2.52708e+11 2.58528e+11 2.58246e+11 2.36792e+11 2.36746e+11 2.58384e+11 3.18083e+11 3.69215e+11 3.85715e+11 4.0826e+11 4.70922e+11 5.17328e+11 4.83254e+11 4.81421e+11 5.2333e+11 4.96153e+11 5.21791e+11 5.3539e+11 4.62336e+11 4.76063e+11 5.02765e+11 5.43347e+11 5.35376e+11 5.21677e+11 5.99879e+11 nan
Benin BEN GDP (current US$) NY.GDP.MKTP.CD 2.26196e+08 2.35668e+08 2.36435e+08 2.53928e+08 2.69819e+08 2.89909e+08 3.02925e+08 3.06222e+08 3.26323e+08 3.30748e+08 3.33628e+08 3.35073e+08 4.10332e+08 5.04376e+08 5.54655e+08 6.7687e+08 6.98408e+08 7.5005e+08 9.28843e+08 1.18623e+09 1.40525e+09 1.29112e+09 1.26778e+09 1.09535e+09 1.05113e+09 1.04571e+09 1.3361e+09 1.56241e+09 1.62025e+09 1.50229e+09 1.95997e+09 1.98644e+09 1.69532e+09 2.27456e+09 1.59808e+09 2.16963e+09 2.36112e+09 2.2683e+09 2.45509e+09 3.67739e+09 3.51999e+09 3.66622e+09 4.19434e+09 5.34926e+09 6.19027e+09 6.56765e+09 7.03411e+09 8.16905e+09 9.78774e+09 9.73863e+09 9.53534e+09 1.06933e+10 1.11414e+10 1.25178e+10 1.32845e+10 1.13882e+10 1.18211e+10 1.27017e+10 1.42624e+10 1.43917e+10 1.56515e+10 1.77856e+10 nan
Burkina Faso BFA GDP (current US$) NY.GDP.MKTP.CD 3.30443e+08 3.50247e+08 3.79567e+08 3.94041e+08 4.10322e+08 4.22917e+08 4.3389e+08 4.50754e+08 4.60443e+08 4.78299e+08 4.58404e+08 4.82411e+08 5.78596e+08 6.74774e+08 7.51133e+08 9.39973e+08 9.76547e+08 1.13122e+09 1.47558e+09 1.74848e+09 1.92872e+09 1.77584e+09 1.75445e+09 1.60028e+09 1.45988e+09 1.55249e+09 2.0363e+09 2.36983e+09 2.61604e+09 2.61559e+09 3.1013e+09 3.13505e+09 3.35669e+09 3.19954e+09 1.89529e+09 2.37952e+09 2.58655e+09 2.44767e+09 2.8049e+09 3.38957e+09 2.96837e+09 3.19037e+09 3.62235e+09 4.74077e+09 5.45169e+09 6.14635e+09 6.54742e+09 7.62572e+09 9.45144e+09 9.4507e+09 1.01096e+10 1.20803e+10 1.2561e+10 1.34443e+10 1.3943e+10 1.18322e+10 1.28334e+10 1.4107e+10 1.58901e+10 1.61782e+10 1.79336e+10 1.97376e+10 nan
gdp_2008 = Table().with_columns(
    'Country Name', gdp.column('Country Name'),
    '2008 GDP Billion USD', gdp.column('2008') / 1e9
)
gdp_2008.take(np.arange(10,20))
Country Name 2008 GDP Billion USD
Armenia 11.662
American Samoa 0.56
Antigua and Barbuda 1.37007
Australia 1055.13
Austria 432.052
Azerbaijan 48.8525
Burundi 1.61184
Belgium 517.328
Benin 9.78774
Burkina Faso 9.45144
gifts_by_country = all_gifts.drop('Year').group('Country', sum)
gifts_by_country
Country Value sum
Afghanistan 9263
Algeria 4312.28
Argentina 3977.98
Armenia 2985
Australia 4365.48
Austria 1275
Azerbaijan 11503
Bahrain 10494
Bangladesh 390
Bavaria 1928.71

... (113 rows omitted)

joined = gifts_by_country.join('Country', gdp_2008, 'Country Name')
joined
Country Value sum 2008 GDP Billion USD
Afghanistan 9263 10.1093
Algeria 4312.28 171.001
Argentina 3977.98 361.558
Armenia 2985 11.662
Australia 4365.48 1055.13
Austria 1275 432.052
Azerbaijan 11503 48.8525
Bahrain 10494 25.7109
Bangladesh 390 91.6313
Benin 12289.9 9.78774

... (91 rows omitted)

joined.scatter('2008 GDP Billion USD', 'Value sum')
../_images/11-pivots-and-joins_50_0.png

The scatter plot doesn’t show much because most of the data is in a very tiny part of the graph. It’s hard to tell whether the other points are outliers or part of some trend. In this case, one handy tool is to take the log of each x-value and y-value.

Recall that for a number \(n\), \(log(n) = a\) such \(10^a = n\). Here are some values to illustrate how logarithms work:

n

log(n)

1

0

10

1

100

2

1,000

3

10,000

4

100,000

5

1,000,000

6

10,000,000

7

100,000,000

8

1,000,000,000

9

joined.scatter('2008 GDP Billion USD', 'Value sum')
plots.xscale('log')
plots.yscale('log')
../_images/11-pivots-and-joins_52_0.png

That makes the points more evenly distributed across both axes of the plots, and it makes any correlation jump out. We can do a lot with log-log plots like this, but for us, we will be happy to just use it to make the correlation more apparent in this one example.

What about the relationship between oil production and gift values? You can find oil production for every country or region of the world here.

# Compute the average oil production during the years 2009-2016.
oil = Table().read_table('data/oil_production_by_country.csv')
oil_for_years = oil.where('Year', are.between_or_equal_to(2009, 2016))
oil_by_entity = oil_for_years.drop('Code', 'Year').group('Entity', np.average)
oil_by_entity.sort('Oil production (TWh) average', descending=True)
oil_by_entity.sample(10)
Entity Oil production (TWh) average
North Korea 0
Senegal 0
Turkey 30.1689
Fiji 0
Mozambique 0
United Kingdom 582.954
Liberia 0
Kuwait 1660.07
Guadeloupe 0
Thailand 193.364
joined_oil = gifts_by_country.join('Country', oil_by_entity, 'Entity')
joined_oil.scatter('Oil production (TWh) average', 'Value sum')
../_images/11-pivots-and-joins_56_0.png
joined_oil.scatter('Oil production (TWh) average', 'Value sum')
plots.xscale('log')
plots.yscale('log')
plots.xlim(1/1e3, 1e5)
plots.ylim(1e2, 1e6);
../_images/11-pivots-and-joins_57_0.png
#If you're curious about the top oil producers in the world...
joined_oil.sort('Oil production (TWh) average', descending=True)
Country Value sum Oil production (TWh) average
Russia 21474 6153.07
Saudi Arabia 2.14279e+06 6151.13
United States 974 5059.72
China 79786.5 2381.96
Canada 6541.36 2190.7
United Arab Emirates 6125 1825.73
Iraq 4755.99 1819.56
Kuwait 48265 1660.07
Mexico 32546.2 1612.68
Brazil 44409.7 1374.56

... (102 rows omitted)

Q: Brainstorm some alternative variables we might want to examine correlations between?

  • Norms of gift giving? Political alliances? Trade deals?

  • Could we track if this relationship changes over time?

  • In this class, we’re going to be moving towards some more of these open-ended data questions.