Columns and Rows¶
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
1. Table Review¶
This data comes from the Getty Provenance Index, which currently contains more than 2.3 million records taken from source material such as archival inventories, auction catalogs, and dealer stock books.

Sir Anthony van Dyck - Portrait of Antoine Triest, Bishop of Ghent
art = Table.read_table('data/UK_art_sales.csv').drop('country_auth', 'sale_code', 'lot_sale_month', 'lot_sale_day', 'price_amount', 'shillings')
art.show(5)
lot_sale_year | auction_house | title | artist_name | nationality | object_type | pounds |
---|---|---|---|---|---|---|
1839 | Christie's | A rich equipage halting on the bank of a river, where fi ... | K. du Jardin | Dutch | Painting | 14 |
1839 | Christie's | A breeze, with men-of-war and boats; a clear and beautif ... | Backhuysen | Dutch | Painting | 13 |
1837 | Enoch & Redfern | Portrait of a Gentleman, in black slashed dress, crimson ... | [Anonymous] | Unknown | Painting | 0 |
1837 | Enoch & Redfern | Portrait (3/4 length) of a Young Lady in blue silk dress ... | [Anonymous] | Unknown | Painting | 0 |
1837 | Enoch & Redfern | Portraits of Two Gentlemen and a Lady | [Anonymous] | Unknown | Painting | 0 |
... (1817 rows omitted)
Table method chaining practice¶
Given the art
table about auction sales, what are the top 5 paintings ranked by price paid in pounds made by the artist “Vandyck”? Return a table.
Hint: pounds
column is the price paid in pounds
#we can break up method chaining into intermediate variables to debug
unsorted_vandyck = art.where("artist_name", are.containing('Vandyck')).where('object_type', 'Painting')
unsorted_vandyck
lot_sale_year | auction_house | title | artist_name | nationality | object_type | pounds |
---|---|---|---|---|---|---|
1836 | Foster (Edward) | Portrait of a young Lady with a basket of flowers | P. Vandyck | Dutch | Painting | 0 |
1805 | Christie's | A Head | Vandyck | Flemish | Painting | 0 |
1805 | Phillips (Harry) | A capital whole length Portrait of Elizabeth, Countess o ... | Vandyck | Flemish | Painting | 105 |
1836 | Foster (Edward) | A Portrait of the Countess of Derby | Vandyck | Flemish | Painting | 4 |
1839 | Winstanley | A fine Portrait of Charles the First [Fine originals] | Vandyck | Flemish | Painting | 82 |
1804 | Coxe (Peter) | King Charles I. his Queen and Family, from the Orleans' ... | Vandyck | Flemish | Painting | 1575 |
1804 | Coxe (Peter) | Portrait of Bishop Trieste | Vandyck | Flemish | Painting | 173 |
1804 | Coxe (Peter) | The Virgin in the Clouds with Angels | Vandyck | Flemish | Painting | 393 |
1804 | Coxe (Peter) | His Own Portrait | Vandyck | Flemish | Painting | 73 |
1805 | Christie's | St. John and the Virgin, after Rubens, G. Head; and a Po ... | Vandyck | Flemish | Painting | 0 |
... (2 rows omitted)
vandyck = unsorted_vandyck.sort('pounds', descending=True).take(np.arange(5))
vandyck
lot_sale_year | auction_house | title | artist_name | nationality | object_type | pounds |
---|---|---|---|---|---|---|
1804 | Coxe (Peter) | King Charles I. his Queen and Family, from the Orleans' ... | Vandyck | Flemish | Painting | 1575 |
1804 | Coxe (Peter) | The Virgin in the Clouds with Angels | Vandyck | Flemish | Painting | 393 |
1804 | Coxe (Peter) | Portrait of Bishop Trieste | Vandyck | Flemish | Painting | 173 |
1805 | Phillips (Harry) | A capital whole length Portrait of Elizabeth, Countess o ... | Vandyck | Flemish | Painting | 105 |
1839 | Winstanley | A fine Portrait of Charles the First [Fine originals] | Vandyck | Flemish | Painting | 82 |
#or method chaining in one line
vandyck = art.where("artist_name", are.containing('Vandyck')).where('object_type', 'Painting').sort('pounds', descending=True).take(np.arange(5))
vandyck
lot_sale_year | auction_house | title | artist_name | nationality | object_type | pounds |
---|---|---|---|---|---|---|
1804 | Coxe (Peter) | King Charles I. his Queen and Family, from the Orleans' ... | Vandyck | Flemish | Painting | 1575 |
1804 | Coxe (Peter) | The Virgin in the Clouds with Angels | Vandyck | Flemish | Painting | 393 |
1804 | Coxe (Peter) | Portrait of Bishop Trieste | Vandyck | Flemish | Painting | 173 |
1805 | Phillips (Harry) | A capital whole length Portrait of Elizabeth, Countess o ... | Vandyck | Flemish | Painting | 105 |
1839 | Winstanley | A fine Portrait of Charles the First [Fine originals] | Vandyck | Flemish | Painting | 82 |
Array broadcasting practice¶
Given the table you just created, what are these prices paid in today’s US dollars?
Hints:
1 pound in the 1800s is the equivalent to 96 pounds today
1 pound today is the equivalent to 1.15 US dollars
price_vandyck = vandyck.column('pounds')
price_vandyck
array([1575, 393, 173, 105, 82])
price_today_usd_vandyck = price_vandyck*96*1.15
price_today_usd_vandyck
array([173880. , 43387.2, 19099.2, 11592. , 9052.8])
Takeaway: Array broadcasting saves you work! You do not have to apply the same conversion over and over and over.
Visualization practice¶
Make a bar chart of the the artist name and price in pounds of the top 15 objects with the earliest lot_sale_year.
art.sort('lot_sale_year', descending=False).take(np.arange(0, 15)).barh('artist_name', 'pounds')

Make a bar chart with the auction house and price in pounds of top 10 paintings with the latest lot_sale_year
.
art.sort('lot_sale_year', descending=True).take(np.arange(0, 10)).where('object_type', are.equal_to('Painting')).barh('auction_house', 'pounds')

Where method practice¶
Sometimes when working with data we only have partial information about an item. For example, in this dataset, titles of works of art and artist names were sometimes recorded different in the 1800s compared to their official titles and artist names today in the 21st century.
For example, suppose I want the painting we showed above, “Portrait of Antoine Triest.”
# try searching with that exact string
art.where('title', are.containing('Portrait of Antoine Triest, Bishop of Ghent')) # should see nothing returned
lot_sale_year | auction_house | title | artist_name | nationality | object_type | pounds |
---|
# try just a substring of the title
art.where('title', are.containing('Triest')) #returns any row that contains within it 'Triest'
lot_sale_year | auction_house | title | artist_name | nationality | object_type | pounds |
---|---|---|---|---|---|---|
1804 | Coxe (Peter) | Portrait of Bishop Trieste | Vandyck | Flemish | Painting | 173 |
# Let's answer the question:
# How much did someone pay (in today’s US dollars) for the "Portrait of Antoine Triest, Bishop of Ghent" at an auction house in 1804?
pounds_1800s = art.where('title', are.containing('Triest')).column('pounds').item(0)
pounds_1800s
173
usd_2020s = pounds_1800s * 96 * 1.15
usd_2020s
19099.199999999997
# This now sits in the Museum of Fine Arts, Houston... probably would sell for a bit more than $19K USD
We can search for rows that match many values in a column with are.contained_in()
#artist names are messy, many possible ways 1800s auctioneers recorded the artists names
possible_artist_names = make_array('Vandyck', 'Dyck', 'V. Dyck', 'Van Dyck')
art.where('artist_name', are.contained_in(possible_artist_names)).show()
lot_sale_year | auction_house | title | artist_name | nationality | object_type | pounds |
---|---|---|---|---|---|---|
1838 | Christie's | A dance of cupids; admirably coloured | V. Dyck | Flemish | Painting | 12 |
1805 | Christie's | A Head | Vandyck | Flemish | Painting | 0 |
1805 | Phillips (Harry) | A capital whole length Portrait of Elizabeth, Countess o ... | Vandyck | Flemish | Painting | 105 |
1837 | Christie's | A small portrait | V. Dyck | Flemish | Painting | 0 |
1836 | Foster (Edward) | A Portrait of the Countess of Derby | Vandyck | Flemish | Painting | 4 |
1839 | Winstanley | A fine Portrait of Charles the First [Fine originals] | Vandyck | Flemish | Painting | 82 |
1804 | Coxe (Peter) | King Charles I. his Queen and Family, from the Orleans' ... | Vandyck | Flemish | Painting | 1575 |
1804 | Coxe (Peter) | Portrait of Bishop Trieste | Vandyck | Flemish | Painting | 173 |
1804 | Coxe (Peter) | The Virgin in the Clouds with Angels | Vandyck | Flemish | Painting | 393 |
1804 | Coxe (Peter) | His Own Portrait | Vandyck | Flemish | Painting | 73 |
1840 | Christie's | The Magdalen supporting the head of the dead Christ | Van Dyck | Flemish | Painting | 0 |
1836 | Christie's | Sampson and Dalilah; a splendid sketch, full of spirit. ... | Van Dyck | Flemish | Painting | 16 |
1805 | Christie's | St. John and the Virgin, after Rubens, G. Head; and a Po ... | Vandyck | Flemish | Painting | 0 |
1848 | Christie's | Van Dyck as Icarus, Daedalus tying on his Wings: -- thi ... | Van Dyck | Flemish | Painting | 66 |
1839 | Christie's | Charles I. on horseback, with the Marquis of Hamilton; a ... | Van Dyck | Flemish | Painting | 37 |
1801 | Christie's | A Girl's Head, and a Head of Charles the Ist | Vandyck | Flemish | Painting | 0 |
1805 | Abbott (William) | A Man in Armour | Vandyck | Flemish | Painting | 0 |
1845 | Christie's | The Portrait of Paul Pontius, the celebrated engraver, s ... | Van Dyck | Flemish | Painting | 16 |
1837 | Christie's | A fragment of a picture; well coloured | Van Dyck | Flemish | Painting | 0 |
1840 | Christie's | Portrait of Charles Lewis, Elector Palatine, in armour, ... | V. Dyck | Flemish | Painting | 24 |
2. Numpy methods¶
We will use numpy methods throughout this course to help us understand trends in data.
# In this class, we will always import numpy the same way
import numpy as np
#Q: how much did auction bids increase by each time someone new made a bid?
auction_bids_inorder = make_array(2, 4, 25, 50, 75)
np.diff(auction_bids_inorder)
array([ 2, 21, 25, 25])
Q: Consider two major auction houses: Christie’s and Sotheby’s. For each auction house, what is the average price (in pounds) paid for the art they auctioned off?
auction_house = "Christie's"
price = art.where('auction_house', auction_house).column('pounds')
print(auction_house, 'mean price', np.mean(price))
Christie's mean price 37.79392824287029
# When we create variables we can save ourselves work
auction_house = "Sotheby's" #this is the only link we change
price = art.where('auction_house', auction_house).column('pounds')
print(auction_house, 'mean price', np.mean(price))
Sotheby's mean price 0.4375
We always want to think about what results we calculate mean.
At the time, which auction house was selling more expensive items? A: Christie’s
3. Creating a Table from Scratch¶
Premise: Suppose you find some really interesting facts online, for example, the list of the most expensive paintings ever sold on Wikipedia. Sometimes you may want to manually take the data you’re viewing and put it into your Python code.
Let’s make a table from scratch (rather than a .csv
file)!
Create a table from an array and the .with_columns()
method.
titles = make_array('Salvator Mundi', 'Interchange', 'The Card Players')
expensive_paintings = Table().with_columns('Titles', titles)
expensive_paintings
Titles |
---|
Salvator Mundi |
Interchange |
The Card Players |
You can extend existing Tables with new arrays.
adjusted_price = make_array(497.8,343,301)
expensive_paintings = expensive_paintings.with_columns('Adjusted Price (Millions USD)',adjusted_price)
expensive_paintings
Titles | Adjusted Price (Millions USD) |
---|---|
Salvator Mundi | 497.8 |
Interchange | 343 |
The Card Players | 301 |
We can also create Tables with multiple arrays at the same time.
new_expensive_paintings = Table().with_columns('Titles', titles,
'Adjusted Price (Millions USD)',adjusted_price)
new_expensive_paintings
Titles | Adjusted Price (Millions USD) |
---|---|
Salvator Mundi | 497.8 |
Interchange | 343 |
The Card Players | 301 |
Left off in class 2022-09-19. Explore the questions below on your own for more practice.
Table info¶
expensive_paintings.labels
('Titles', 'Adjusted Price (Millions USD)')
expensive_paintings.num_rows
3
expensive_paintings.num_columns
2
#try our big table
art.labels
('lot_sale_year',
'auction_house',
'title',
'artist_name',
'nationality',
'object_type',
'pounds')
art.num_rows
1822
art.num_columns
7
Relabeling columns¶
expensive_paintings.relabeled('Adjusted Price (Millions USD)', 'Price USD Millions')
Titles | Price USD Millions |
---|---|
Salvator Mundi | 497.8 |
Interchange | 343 |
The Card Players | 301 |
expensive_paintings # Remember if we want the results of a method to persist we have to reassign the variable!
Titles | Adjusted Price (Millions USD) |
---|---|
Salvator Mundi | 497.8 |
Interchange | 343 |
The Card Players | 301 |
expensive_paintings = expensive_paintings.relabeled('Adjusted Price (Millions USD)', 'Price USD Millions')
expensive_paintings
Titles | Price USD Millions |
---|---|
Salvator Mundi | 497.8 |
Interchange | 343 |
The Card Players | 301 |
4. More array and column practice¶
art.show(2)
lot_sale_year | auction_house | title | artist_name | nationality | object_type | pounds |
---|---|---|---|---|---|---|
1839 | Christie's | A rich equipage halting on the bank of a river, where fi ... | K. du Jardin | Dutch | Painting | 14 |
1839 | Christie's | A breeze, with men-of-war and boats; a clear and beautif ... | Backhuysen | Dutch | Painting | 13 |
... (1820 rows omitted)
# Access the column by the column label
art.column('lot_sale_year')
array([1839, 1839, 1837, ..., 1785, 1836, 1803])
# Access the column by the *index* of the column
art.column(0)
array([1839, 1839, 1837, ..., 1785, 1836, 1803])
# Show: What is the latest sale year in the dataset
latest = art.sort("lot_sale_year", descending=True).column("lot_sale_year").item(0)
latest
1859
years_since_1800 = art.column("lot_sale_year") - 1800
years_since_1800
array([ 39, 39, 37, ..., -15, 36, 3])
# Q: What is the price of the most expensive work of art? Return a single *int* variable
most_expensive = art.sort("pounds", descending=True).column("pounds").item(0)
most_expensive
1575
# Q: What is the mean price as the most expensive work of art
mean_price = np.mean(art.column('pounds'))
mean_price
32.36278814489572
# Q: Create an array that is each work of art's price (in pounds) as a proportion of the mean price.
proportion_of_mean = art.column("pounds") / mean_price
proportion_of_mean
array([0.43259561, 0.40169592, 0. , ..., 0.92699059, 0. ,
1.23598745])
Extend a table with a new column¶
art= art.with_columns('Price proportion of mean', proportion_of_mean)
art.show(3)
lot_sale_year | auction_house | title | artist_name | nationality | object_type | pounds | Price proportion of mean |
---|---|---|---|---|---|---|---|
1839 | Christie's | A rich equipage halting on the bank of a river, where fi ... | K. du Jardin | Dutch | Painting | 14 | 0.432596 |
1839 | Christie's | A breeze, with men-of-war and boats; a clear and beautif ... | Backhuysen | Dutch | Painting | 13 | 0.401696 |
1837 | Enoch & Redfern | Portrait of a Gentleman, in black slashed dress, crimson ... | [Anonymous] | Unknown | Painting | 0 | 0 |
... (1819 rows omitted)