Columns and Rows

from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

1. Table Review

This data comes from the Getty Provenance Index, which currently contains more than 2.3 million records taken from source material such as archival inventories, auction catalogs, and dealer stock books.

https://upload.wikimedia.org/wikipedia/commons/d/d8/Sir_Anthony_van_Dyck_-_Portrait_of_Antoine_Triest%2C_Bishop_of_Ghent_%281576%E2%80%931655%29_-_BF.1977.2_-_Hermitage_Museum.jpg

Sir Anthony van Dyck - Portrait of Antoine Triest, Bishop of Ghent

art = Table.read_table('data/UK_art_sales.csv').drop('country_auth', 'sale_code', 'lot_sale_month', 'lot_sale_day', 'price_amount', 'shillings')
art.show(5)
lot_sale_year auction_house title artist_name nationality object_type pounds
1839 Christie's A rich equipage halting on the bank of a river, where fi ... K. du Jardin Dutch Painting 14
1839 Christie's A breeze, with men-of-war and boats; a clear and beautif ... Backhuysen Dutch Painting 13
1837 Enoch & Redfern Portrait of a Gentleman, in black slashed dress, crimson ... [Anonymous] Unknown Painting 0
1837 Enoch & Redfern Portrait (3/4 length) of a Young Lady in blue silk dress ... [Anonymous] Unknown Painting 0
1837 Enoch & Redfern Portraits of Two Gentlemen and a Lady [Anonymous] Unknown Painting 0

... (1817 rows omitted)

Table method chaining practice

Given the art table about auction sales, what are the top 5 paintings ranked by price paid in pounds made by the artist “Vandyck”? Return a table.

Hint: pounds column is the price paid in pounds

#we can break up method chaining into intermediate variables to debug 
unsorted_vandyck = art.where("artist_name", are.containing('Vandyck')).where('object_type', 'Painting')
unsorted_vandyck
lot_sale_year auction_house title artist_name nationality object_type pounds
1836 Foster (Edward) Portrait of a young Lady with a basket of flowers P. Vandyck Dutch Painting 0
1805 Christie's A Head Vandyck Flemish Painting 0
1805 Phillips (Harry) A capital whole length Portrait of Elizabeth, Countess o ... Vandyck Flemish Painting 105
1836 Foster (Edward) A Portrait of the Countess of Derby Vandyck Flemish Painting 4
1839 Winstanley A fine Portrait of Charles the First [Fine originals] Vandyck Flemish Painting 82
1804 Coxe (Peter) King Charles I. his Queen and Family, from the Orleans' ... Vandyck Flemish Painting 1575
1804 Coxe (Peter) Portrait of Bishop Trieste Vandyck Flemish Painting 173
1804 Coxe (Peter) The Virgin in the Clouds with Angels Vandyck Flemish Painting 393
1804 Coxe (Peter) His Own Portrait Vandyck Flemish Painting 73
1805 Christie's St. John and the Virgin, after Rubens, G. Head; and a Po ... Vandyck Flemish Painting 0

... (2 rows omitted)

vandyck = unsorted_vandyck.sort('pounds', descending=True).take(np.arange(5))
vandyck
lot_sale_year auction_house title artist_name nationality object_type pounds
1804 Coxe (Peter) King Charles I. his Queen and Family, from the Orleans' ... Vandyck Flemish Painting 1575
1804 Coxe (Peter) The Virgin in the Clouds with Angels Vandyck Flemish Painting 393
1804 Coxe (Peter) Portrait of Bishop Trieste Vandyck Flemish Painting 173
1805 Phillips (Harry) A capital whole length Portrait of Elizabeth, Countess o ... Vandyck Flemish Painting 105
1839 Winstanley A fine Portrait of Charles the First [Fine originals] Vandyck Flemish Painting 82
#or method chaining in one line
vandyck = art.where("artist_name", are.containing('Vandyck')).where('object_type', 'Painting').sort('pounds', descending=True).take(np.arange(5))
vandyck 
lot_sale_year auction_house title artist_name nationality object_type pounds
1804 Coxe (Peter) King Charles I. his Queen and Family, from the Orleans' ... Vandyck Flemish Painting 1575
1804 Coxe (Peter) The Virgin in the Clouds with Angels Vandyck Flemish Painting 393
1804 Coxe (Peter) Portrait of Bishop Trieste Vandyck Flemish Painting 173
1805 Phillips (Harry) A capital whole length Portrait of Elizabeth, Countess o ... Vandyck Flemish Painting 105
1839 Winstanley A fine Portrait of Charles the First [Fine originals] Vandyck Flemish Painting 82

Array broadcasting practice

Given the table you just created, what are these prices paid in today’s US dollars?

Hints:

  • 1 pound in the 1800s is the equivalent to 96 pounds today

  • 1 pound today is the equivalent to 1.15 US dollars

price_vandyck = vandyck.column('pounds')
price_vandyck
array([1575,  393,  173,  105,   82])
price_today_usd_vandyck = price_vandyck*96*1.15
price_today_usd_vandyck
array([173880. ,  43387.2,  19099.2,  11592. ,   9052.8])

Takeaway: Array broadcasting saves you work! You do not have to apply the same conversion over and over and over.

Visualization practice

Make a bar chart of the the artist name and price in pounds of the top 15 objects with the earliest lot_sale_year.

art.sort('lot_sale_year', descending=False).take(np.arange(0, 15)).barh('artist_name', 'pounds')
../_images/05-columns-and-rows_14_0.png

Make a bar chart with the auction house and price in pounds of top 10 paintings with the latest lot_sale_year.

art.sort('lot_sale_year', descending=True).take(np.arange(0, 10)).where('object_type', are.equal_to('Painting')).barh('auction_house', 'pounds')
../_images/05-columns-and-rows_16_0.png

Where method practice

Sometimes when working with data we only have partial information about an item. For example, in this dataset, titles of works of art and artist names were sometimes recorded different in the 1800s compared to their official titles and artist names today in the 21st century.

For example, suppose I want the painting we showed above, “Portrait of Antoine Triest.”

# try searching with that exact string
art.where('title', are.containing('Portrait of Antoine Triest, Bishop of Ghent')) # should see nothing returned
lot_sale_year auction_house title artist_name nationality object_type pounds
# try just a substring of the title 
art.where('title', are.containing('Triest')) #returns any row that contains within it 'Triest' 
lot_sale_year auction_house title artist_name nationality object_type pounds
1804 Coxe (Peter) Portrait of Bishop Trieste Vandyck Flemish Painting 173
# Let's answer the question: 
# How much did someone pay (in today’s US dollars) for the "Portrait of Antoine Triest, Bishop of Ghent" at an auction house in 1804? 
pounds_1800s = art.where('title', are.containing('Triest')).column('pounds').item(0)
pounds_1800s
173
usd_2020s = pounds_1800s * 96 * 1.15
usd_2020s
19099.199999999997
# This now sits in the Museum of Fine Arts, Houston... probably would sell for a bit more than $19K USD

We can search for rows that match many values in a column with are.contained_in()

#artist names are messy, many possible ways 1800s auctioneers recorded the artists names 
possible_artist_names = make_array('Vandyck', 'Dyck', 'V. Dyck', 'Van Dyck')
art.where('artist_name', are.contained_in(possible_artist_names)).show()
lot_sale_year auction_house title artist_name nationality object_type pounds
1838 Christie's A dance of cupids; admirably coloured V. Dyck Flemish Painting 12
1805 Christie's A Head Vandyck Flemish Painting 0
1805 Phillips (Harry) A capital whole length Portrait of Elizabeth, Countess o ... Vandyck Flemish Painting 105
1837 Christie's A small portrait V. Dyck Flemish Painting 0
1836 Foster (Edward) A Portrait of the Countess of Derby Vandyck Flemish Painting 4
1839 Winstanley A fine Portrait of Charles the First [Fine originals] Vandyck Flemish Painting 82
1804 Coxe (Peter) King Charles I. his Queen and Family, from the Orleans' ... Vandyck Flemish Painting 1575
1804 Coxe (Peter) Portrait of Bishop Trieste Vandyck Flemish Painting 173
1804 Coxe (Peter) The Virgin in the Clouds with Angels Vandyck Flemish Painting 393
1804 Coxe (Peter) His Own Portrait Vandyck Flemish Painting 73
1840 Christie's The Magdalen supporting the head of the dead Christ Van Dyck Flemish Painting 0
1836 Christie's Sampson and Dalilah; a splendid sketch, full of spirit. ... Van Dyck Flemish Painting 16
1805 Christie's St. John and the Virgin, after Rubens, G. Head; and a Po ... Vandyck Flemish Painting 0
1848 Christie's Van Dyck as Icarus, Daedalus tying on his Wings: -- thi ... Van Dyck Flemish Painting 66
1839 Christie's Charles I. on horseback, with the Marquis of Hamilton; a ... Van Dyck Flemish Painting 37
1801 Christie's A Girl's Head, and a Head of Charles the Ist Vandyck Flemish Painting 0
1805 Abbott (William) A Man in Armour Vandyck Flemish Painting 0
1845 Christie's The Portrait of Paul Pontius, the celebrated engraver, s ... Van Dyck Flemish Painting 16
1837 Christie's A fragment of a picture; well coloured Van Dyck Flemish Painting 0
1840 Christie's Portrait of Charles Lewis, Elector Palatine, in armour, ... V. Dyck Flemish Painting 24

2. Numpy methods

We will use numpy methods throughout this course to help us understand trends in data.

# In this class, we will always import numpy the same way 
import numpy as np
#Q: how much did auction bids increase by each time someone new made a bid? 
auction_bids_inorder = make_array(2, 4, 25, 50, 75)
np.diff(auction_bids_inorder)
array([ 2, 21, 25, 25])

Q: Consider two major auction houses: Christie’s and Sotheby’s. For each auction house, what is the average price (in pounds) paid for the art they auctioned off?

auction_house = "Christie's"
price = art.where('auction_house', auction_house).column('pounds')
print(auction_house, 'mean price', np.mean(price))
Christie's mean price 37.79392824287029
# When we create variables we can save ourselves work 
auction_house = "Sotheby's" #this is the only link we change
price = art.where('auction_house', auction_house).column('pounds')
print(auction_house, 'mean price', np.mean(price))
Sotheby's mean price 0.4375

We always want to think about what results we calculate mean.

At the time, which auction house was selling more expensive items? A: Christie’s

3. Creating a Table from Scratch

Premise: Suppose you find some really interesting facts online, for example, the list of the most expensive paintings ever sold on Wikipedia. Sometimes you may want to manually take the data you’re viewing and put it into your Python code.

Let’s make a table from scratch (rather than a .csv file)!

Create a table from an array and the .with_columns() method.

titles = make_array('Salvator Mundi', 'Interchange', 'The Card Players')
expensive_paintings = Table().with_columns('Titles', titles)
expensive_paintings
Titles
Salvator Mundi
Interchange
The Card Players

You can extend existing Tables with new arrays.

adjusted_price = make_array(497.8,343,301)
expensive_paintings = expensive_paintings.with_columns('Adjusted Price (Millions USD)',adjusted_price)
expensive_paintings
Titles Adjusted Price (Millions USD)
Salvator Mundi 497.8
Interchange 343
The Card Players 301

We can also create Tables with multiple arrays at the same time.

new_expensive_paintings = Table().with_columns('Titles', titles, 
                                 'Adjusted Price (Millions USD)',adjusted_price)
new_expensive_paintings
Titles Adjusted Price (Millions USD)
Salvator Mundi 497.8
Interchange 343
The Card Players 301

Left off in class 2022-09-19. Explore the questions below on your own for more practice.

Table info

expensive_paintings.labels
('Titles', 'Adjusted Price (Millions USD)')
expensive_paintings.num_rows
3
expensive_paintings.num_columns
2
#try our big table
art.labels
('lot_sale_year',
 'auction_house',
 'title',
 'artist_name',
 'nationality',
 'object_type',
 'pounds')
art.num_rows
1822
art.num_columns
7

Relabeling columns

expensive_paintings.relabeled('Adjusted Price (Millions USD)', 'Price USD Millions')
Titles Price USD Millions
Salvator Mundi 497.8
Interchange 343
The Card Players 301
expensive_paintings # Remember if we want the results of a method to persist we have to reassign the variable!
Titles Adjusted Price (Millions USD)
Salvator Mundi 497.8
Interchange 343
The Card Players 301
expensive_paintings = expensive_paintings.relabeled('Adjusted Price (Millions USD)', 'Price USD Millions')
expensive_paintings
Titles Price USD Millions
Salvator Mundi 497.8
Interchange 343
The Card Players 301

4. More array and column practice

art.show(2)
lot_sale_year auction_house title artist_name nationality object_type pounds
1839 Christie's A rich equipage halting on the bank of a river, where fi ... K. du Jardin Dutch Painting 14
1839 Christie's A breeze, with men-of-war and boats; a clear and beautif ... Backhuysen Dutch Painting 13

... (1820 rows omitted)

# Access the column by the column label 
art.column('lot_sale_year')
array([1839, 1839, 1837, ..., 1785, 1836, 1803])
# Access the column by the *index* of the column
art.column(0)
array([1839, 1839, 1837, ..., 1785, 1836, 1803])
# Show: What is the latest sale year in the dataset
latest = art.sort("lot_sale_year", descending=True).column("lot_sale_year").item(0)
latest
1859
years_since_1800 = art.column("lot_sale_year") - 1800
years_since_1800
array([ 39,  39,  37, ..., -15,  36,   3])
# Q: What is the price of the most expensive work of art? Return a single *int* variable 
most_expensive = art.sort("pounds", descending=True).column("pounds").item(0)
most_expensive
1575
# Q: What is the mean price as the most expensive work of art
mean_price = np.mean(art.column('pounds'))
mean_price
32.36278814489572
# Q: Create an array that is each work of art's price (in pounds) as a proportion of the mean price. 
proportion_of_mean = art.column("pounds") / mean_price
proportion_of_mean
array([0.43259561, 0.40169592, 0.        , ..., 0.92699059, 0.        ,
       1.23598745])

Extend a table with a new column

art= art.with_columns('Price proportion of mean', proportion_of_mean)
art.show(3)
lot_sale_year auction_house title artist_name nationality object_type pounds Price proportion of mean
1839 Christie's A rich equipage halting on the bank of a river, where fi ... K. du Jardin Dutch Painting 14 0.432596
1839 Christie's A breeze, with men-of-war and boats; a clear and beautif ... Backhuysen Dutch Painting 13 0.401696
1837 Enoch & Redfern Portrait of a Gentleman, in black slashed dress, crimson ... [Anonymous] Unknown Painting 0 0

... (1819 rows omitted)

5. Other quantitative questions we can ask about this dataset?

© Copyright 2022.