Data Types
Data Types#
from datascience import *
from cs104 import *
import numpy as np
%matplotlib inline
1. Table Review: Art sales in the UK#
This data comes from the Getty Provenance Index, which currently contains more than 2.3 million records taken from source material such as archival inventories, auction catalogs, and dealer stock books.

Sir Anthony van Dyck - Portrait of Antoine Triest
Recall: you can open the raw .csv files within Jupyter’s file system. From inside Jupyter, locate the CSV file in the File Browser on the left-hand side of the window. Double-click to view it as a formatted table, or right-click and select “Open With -> Editor” to view it as an editable text file.
art = Table().read_table('data/UK_art_subset.csv')
art.show(5)
lot_sale_year | auction_house | title | artist_name | object_type | pounds |
---|---|---|---|---|---|
1839 | Christie's | A rich equipage halting on the bank of a river, where fi ... | K. du Jardin | Painting | 14 |
1839 | Christie's | A breeze, with men-of-war and boats; a clear and beautif ... | Backhuysen | Painting | 13 |
1837 | Enoch & Redfern | A Mythological Subject, representing Apollo playing on a ... | Rubens | Painting | 1 |
1803 | Edwards (Edward) | This picture represents an Alchemist with other Figures; ... | Teniers | Painting | 18 |
1838 | Christie's | A dance of cupids; admirably coloured | V. Dyck | Painting | 12 |
... (1098 rows omitted)
Review: Remove unhelpful columns (e.g. 'auction_house') with the drop method, and save the resulting table in the no_house variable.
no_house = art.drop('auction_house')
no_house.show(5)
lot_sale_year | title | artist_name | object_type | pounds |
---|---|---|---|---|
1839 | A rich equipage halting on the bank of a river, where fi ... | K. du Jardin | Painting | 14 |
1839 | A breeze, with men-of-war and boats; a clear and beautif ... | Backhuysen | Painting | 13 |
1837 | A Mythological Subject, representing Apollo playing on a ... | Rubens | Painting | 1 |
1803 | This picture represents an Alchemist with other Figures; ... | Teniers | Painting | 18 |
1838 | A dance of cupids; admirably coloured | V. Dyck | Painting | 12 |
... (1098 rows omitted)
art.sort('artist_name', descending=True).show(4)
lot_sale_year | auction_house | title | artist_name | object_type | pounds |
---|---|---|---|---|---|
1839 | Rainy (Alexander) | The late Admiral Lord Exmouth, going into Action at Algi ... | [Sir William Beechey] | Painting | 37 |
1839 | Christie's | The Infant placing a wreath of flowers on the head of th ... | Zurbaran | Painting | 39 |
1805 | Coxe (Peter) | Ditto [Landscape and Figures], the Companion | Zuccarelli | Painting | 14 |
1836 | Christie's | A landscape, with a waterfall and figures | Zuccarelli | Painting | 17 |
... (1099 rows omitted)
Find non-painting objects using Table.where(...) and the predicate are.not_equal_to().
not_a_painting = art.where('object_type', are.not_equal_to('Painting'))
not_a_painting
lot_sale_year | auction_house | title | artist_name | object_type | pounds |
---|---|---|---|---|---|
1859 | Phillips (Harry) | A set of three finely modelled bronzes of the Venus de M ... | G. Zoffoli | Sculpture | 16 |
1859 | Phillips (Harry) | Another [finely modelled Old Florentine Bronze] of A Fem ... | Florentine | Sculpture | 22 |
1859 | Phillips (Harry) | Milton dictating his Paradise Lost. | Nash | Drawing | 51 |
1836 | Foster (Edward) | Portrait of himself, in chalks, glazed, capital. | La Tour | Drawing | 56 |
Now, a quick recap of where and the predicates that test which rows to select.
art.where("auction_house", are.equal_to("Christie's"))
lot_sale_year | auction_house | title | artist_name | object_type | pounds |
---|---|---|---|---|---|
1839 | Christie's | A rich equipage halting on the bank of a river, where fi ... | K. du Jardin | Painting | 14 |
1839 | Christie's | A breeze, with men-of-war and boats; a clear and beautif ... | Backhuysen | Painting | 13 |
1838 | Christie's | A dance of cupids; admirably coloured | V. Dyck | Painting | 12 |
1838 | Christie's | Susannah & the elders | Dietrich | Painting | 13 |
1838 | Christie's | A harbour-scene, with a royal yacht, vessels, boats, and ... | Backhuyzen | Painting | 35 |
1838 | Christie's | Interior of a Flemish cathedral, with figures by Francks | P. Neefs | Painting | 25 |
1838 | Christie's | A river-scene, with buildings and figures; a brilliant s ... | V. der Neer | Painting | 23 |
1838 | Christie's | A Dutch river near a town, with vessels and figures -- b ... | V. der Neer | Painting | 53 |
1838 | Christie's | A mountainous landscape, with classical figures; circular | G. Poussin | Painting | 13 |
1838 | Christie's | A cavalier seated smoking, in conversation with a female ... | Mieris | Painting | 28 |
... (776 rows omitted)
art.where("pounds", are.above(900))
lot_sale_year | auction_house | title | artist_name | object_type | pounds |
---|---|---|---|---|---|
1804 | Coxe (Peter) | King Charles I. his Queen and Family, from the Orleans' ... | Vandyck | Painting | 1575 |
1859 | Phillips (Harry) | A Landscape, with full length Portraits of Pierre Both, ... | Albert Cuyp | Painting | 966 |
1840 | Christie's | The Holy Family; a composition of four figures, as large ... | Rubens | Painting | 945 |
art.where("lot_sale_year", are.between(1815, 1835))
lot_sale_year | auction_house | title | artist_name | object_type | pounds |
---|---|---|---|---|---|
1823 | Christie's | The Four Seasons, exemplified in Four beautiful small Ca ... | David Teniers | Painting | 189 |
1819 | Christie's | A pair Historical, Edward and Eleonora, and companion, v ... | A. Kauffman | Painting | 48 |
art.where("title", are.containing('river'))
lot_sale_year | auction_house | title | artist_name | object_type | pounds |
---|---|---|---|---|---|
1839 | Christie's | A rich equipage halting on the bank of a river, where fi ... | K. du Jardin | Painting | 14 |
1838 | Christie's | A river-scene, with buildings and figures; a brilliant s ... | V. der Neer | Painting | 23 |
1838 | Christie's | A Dutch river near a town, with vessels and figures -- b ... | V. der Neer | Painting | 53 |
1838 | Christie's | A Grand Italian Landscape, with buildings near a river; ... | Berghem | Painting | 164 |
1840 | Christie's | The manege; numerous figures near a stable, and distant ... | J. Ostade | Painting | 26 |
1849 | Christie's | A grand landscape, with a river, with a cavalier and gyp ... | Van Uden | Painting | 18 |
1836 | Christie's | A brown horse standing near a river, in a landscape, wit ... | Cuyp | Painting | 28 |
1838 | Christie's | A Flemish town on fire on the bank of the river, moonlight | V. der Neer | Painting | 45 |
1838 | Christie's | A windmill and cottages on the bank of a river, with boa ... | Ostade | Painting | 22 |
1836 | Christie's | Landscape, with a river; on the perforated rocky bank of ... | Claude | Painting | 98 |
... (21 rows omitted)
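One way to think about a predicate such as are.above(900) is as a small test applied to each value in a column, with where keeping only the rows that pass. The sketch below is a plain-Python analogy under that assumption, not how the datascience library is actually implemented. The helper names above and where are hypothetical stand-ins, and the rows are copied (year, object type, pounds) from the not_a_painting table shown earlier.

```python
# Plain-Python analogy for Table.where; NOT the datascience implementation.
# Rows copied from the not_a_painting table: (lot_sale_year, object_type, pounds).
rows = [
    (1859, "Sculpture", 16),
    (1859, "Sculpture", 22),
    (1859, "Drawing", 51),
    (1836, "Drawing", 56),
]

def above(threshold):
    """Hypothetical stand-in for are.above: build a test for a single value."""
    return lambda value: value > threshold

def where(rows, column_index, predicate):
    """Keep only the rows whose value in the chosen column passes the test."""
    return [row for row in rows if predicate(row[column_index])]

# Column index 2 holds the price in pounds.
expensive = where(rows, 2, above(50))
print(expensive)
```

Only the two drawings pass the test, because their prices (51 and 56) are the only ones above 50.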
What else can we learn from these data sets?#
Think-pair-share: How can we display the most expensive items sold after 1850?
Recall: method chaining lets us combine multiple steps into a single line.
art.where('lot_sale_year', are.above(1850)).sort('pounds', descending=True).show(5)
lot_sale_year | auction_house | title | artist_name | object_type | pounds |
---|---|---|---|---|---|
1859 | Phillips (Harry) | A Landscape, with full length Portraits of Pierre Both, ... | Albert Cuyp | Painting | 966 |
1859 | Phillips (Harry) | The Disgrace of Clarendon. | E.M. Ward, R.A. | Painting | 845 |
1859 | Phillips (Harry) | View of Windsor Castle. The celebrated picture pain ... | Patrick Nasmyth | Painting | 588 |
1859 | Phillips (Harry) | The Wood. Nymph chanting her Hymn to the Rising Sun. | J. Danby, R.A. | Painting | 378 |
1859 | Phillips (Harry) | L'Umana Fragilita. A strange but wonderfully imagina ... | Salvator Rosa | Painting | 346 |
... (39 rows omitted)
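Method chaining is not specific to tables: any method that returns a value can have another method called on that value right away. As a rough illustration, the sketch below chains two built-in string methods instead of Table methods. The title string is a made-up example for illustration, not a row from the data.

```python
# Illustrative only: chaining built-in string methods, not Table methods.
title = "  A Dance Of Cupids  "  # hypothetical example string

# Step by step, naming each intermediate result.
stripped = title.strip()      # remove surrounding spaces
lowered = stripped.lower()    # convert to lowercase

# Chained: each method is called on the value returned by the previous one.
chained = title.strip().lower()

print(chained)
print(lowered == chained)
```

Both versions produce the same string; chaining simply skips naming the intermediate results, just as the where-sort-show line above never names the filtered or sorted tables.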
How much is £966 in 1859 in today’s USD?
The pound had an average inflation of 3.15% per year, meaning it is around £155,000 today.
pounds_2024 = 155000
dollars_2024 = pounds_2024 * 1.3
dollars_2024
201500.0
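The inflation estimate above follows the usual compound-growth formula: multiply the original amount by (1 + rate) once for every year that passes. The sketch below reproduces that reasoning under stated assumptions: a constant 3.15% annual rate (the rounded figure quoted above), 2024 as the end year, and the same 1.3 dollars-per-pound rate. Because the rate is rounded, the result lands near, but not exactly on, the £155,000 figure.

```python
# Assumptions: constant 3.15% annual inflation (a rounded figure),
# compounding yearly from 1859 to 2024, and 1.3 dollars per pound.
pounds_1859 = 966
annual_rate = 0.0315
years = 2024 - 1859  # 165 years

# Compound growth: apply (1 + rate) once per year.
growth_factor = (1 + annual_rate) ** years
estimated_pounds_2024 = pounds_1859 * growth_factor
estimated_dollars_2024 = estimated_pounds_2024 * 1.3

print(round(estimated_pounds_2024))
print(round(estimated_dollars_2024))
```

With these inputs the estimate comes out at roughly £160,000, the same order of magnitude as the figure used above. Small changes in the assumed rate shift the answer noticeably over 165 years.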
2. Data Types#
Type#
We can ask for the type of a value or variable with the built-in Python function type.
type(3)
int
temperature = 98.6
type(temperature)
float
prof_name = "Steve"
type(prof_name)
str
this_class_is_fun = True
type(this_class_is_fun)
bool
Floats#
Python makes some decisions about types for us. What type of value is produced by multiplying a float by an int?
answer = 0.75 * 2
answer
type(answer)
float
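A few more cases may help show the pattern. The sketch below checks the type produced by several arithmetic operations; the variable names are only for illustration.

```python
int_product = 3 * 2        # int * int stays an int
mixed_product = 0.75 * 2   # float * int gives a float
quotient = 6 / 2           # true division gives a float, even for 6 / 2
floor_quotient = 7 // 2    # floor division of two ints gives an int

for value in (int_product, mixed_product, quotient, floor_quotient):
    print(value, type(value))
```

Note that 6 / 2 evaluates to 3.0 rather than 3: the / operator produces a float even when the division is exact.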
A computer cannot represent every real number exactly. That would require infinite memory because some numbers have an infinite number of digits.
1 / 3
0.3333333333333333
What happens when we run the next cell?
# 2 / 0
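If the cell above is uncommented, Python stops with a ZeroDivisionError instead of producing a number. The sketch below shows the same thing without halting the notebook by catching the error; try/except is used here only for demonstration.

```python
try:
    result = 2 / 0
except ZeroDivisionError as error:
    # Division by zero has no numeric answer, so Python raises an error.
    result = None
    message = str(error)
    print("ZeroDivisionError:", message)
```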
Scientific Notation#
Python displays very large and very small numbers in the form \(b \times 10^e\).
Examples:
1.23e5 is \(1.23 \times 10^5\).
6.667e-07 is \(6.667 \times 10^{-7}\).
2 / 3000
0.0006666666666666666
2 / 3000000
6.666666666666667e-07
0.000000000000000123456789
1.23456789e-16
0.000000000000000000000000000000000000000000000000000000000000000000000123456789
1.23456789e-70
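Scientific notation is only a way of writing a float; the value itself is an ordinary number. The short sketch below checks this and uses Python's built-in format specifier .3e to ask for scientific notation with three digits after the decimal point.

```python
small = 2 / 3000000

same_value = (1.23e5 == 123000.0)   # two spellings of the same float
formatted = f"{small:.3e}"           # scientific notation, 3 decimal digits
parsed = float("6.667e-07")          # strings in this notation convert too

print(same_value)
print(formatted)
print(parsed)
```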
Rounding Errors#
Since numbers aren’t always represented exactly, small errors may creep in when we operate on floats. These errors are too small for us to worry about in this class.
0.6666666666666666 - 0.6666666666666666123456789 # a little less than 0
0.0
2 ** 0.5
1.4142135623730951
2 ** 0.5 * 2 ** 0.5 # should be 2.0
2.0000000000000004
2 ** 0.5 * 2 ** 0.5 - 2 # should be 0
4.440892098500626e-16
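One practical consequence is that comparing floats with == can give a surprising answer. A common workaround, sketched below, is to check whether two values are close rather than exactly equal, using math.isclose from Python's standard library with its default tolerance.

```python
import math

product = 2 ** 0.5 * 2 ** 0.5   # 2.0000000000000004, as shown above

exactly_equal = (product == 2.0)            # False: off by a tiny rounding error
close_enough = math.isclose(product, 2.0)   # True: within the default tolerance

print(exactly_equal)
print(close_enough)
```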
Strings#
String values capture text data (sequences of characters). Use single quotes or double quotes around strings.
'Painting'
'Painting'
"Painting"
'Painting'
Variables vs Strings#
print("painting") # String value
painting = 4 # variable named painting
print(painting)
painting
4
Why both single and double quotes?
'Don't always use single quotes'
Cell In[31], line 1
'Don't always use single quotes'
^
SyntaxError: invalid syntax
"Don't always use single quotes"
"Don't always use single quotes"
'cs' + '104' # concatenation
'cs104'
'cs' + ' ' + '104' # spaces aren't added for you
'cs 104'
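Returning to the quoting question above: switching to double quotes is the simplest fix, but Python also lets us escape an apostrophe with a backslash inside a single-quoted string. The sketch below shows that both spellings produce the same string value.

```python
# A backslash tells Python the apostrophe is part of the text,
# not the end of the string.
escaped = 'Don\'t always use single quotes'
double_quoted = "Don't always use single quotes"

print(escaped)
print(escaped == double_quoted)
```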
Conversions#
We can only concatenate strings with other strings.
number = 104
'cs' + number
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[35], line 2
1 number = 104
----> 2 'cs' + number
TypeError: can only concatenate str (not "int") to str
Convert numbers to strings when you want to use them to build larger strings.
'cs' + str(number)
'cs104'
We can convert strings back to numbers as well.
int('3')
3
float('3.0')
3.0
int(str(number)) * 2
208
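Conversions have a few edge cases worth knowing. The sketch below shows that int accepts only strings that look like whole numbers, and that converting a float to an int drops the fractional part rather than rounding.

```python
whole = int(float('3.0'))   # '3.0' -> 3.0 -> 3
truncated = int(3.9)        # drops the fractional part: 3, not 4

try:
    int('3.0')              # the string contains a decimal point
    failed = False
except ValueError:
    failed = True           # int() rejects strings that are not whole numbers

print(whole, truncated, failed)
```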
3. Arrays#
Array: sequence of values, all the same type, “boxed up”
Table operation: column
not_a_painting.column('pounds')
array([16, 22, 51, 56])
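The "all the same type" rule can be seen directly. The sketch below assumes that the arrays returned by column are NumPy arrays (the array(...) output above suggests this) and builds two small arrays with np.array. When the values have mixed types, NumPy converts them to one common type.

```python
import numpy as np

# Assumption: column(...) returns a NumPy array like these.
ints = np.array([16, 22, 51, 56])
mixed = np.array([16, 22.5])     # an int and a float

print(ints.dtype)    # an integer type; the exact name depends on the platform
print(mixed.dtype)   # float64: the int 16 was converted to a float
print(mixed)
```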
Arithmetic operations are broadcast on arrays.
What’s the price in dollars for each of these items?
price_in_pounds = not_a_painting.column('pounds')
price_in_dollars = price_in_pounds * 1.3
price_in_dollars
array([20.8, 28.6, 66.3, 72.8])
Suppose the art auction house adds 5 pounds to each item’s price.
price_in_pounds + 5
array([21, 27, 56, 61])
fives = make_array(5,10,15,20)
fives
array([ 5, 10, 15, 20])
price_in_pounds + fives
array([21, 32, 66, 76])
We can call other built-in Python functions on these arrays as well.
len(price_in_pounds)
4
max(price_in_pounds)
56
min(price_in_pounds)
16
sum(price_in_pounds)
145
np.mean(price_in_pounds)
36.25
price_in_pounds + make_array(1,2) #Error because not the same shapes
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[50], line 1
----> 1 price_in_pounds + make_array(1,2) #Error because not the same shapes
ValueError: operands could not be broadcast together with shapes (4,) (2,)
price_in_pounds + make_array(1,2, 3, 4)
array([17, 24, 54, 60])
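The rule illustrated above is that a single number is applied to every element, while two arrays are combined element by element and so need matching lengths. The sketch below repeats those cases with plain NumPy arrays, assuming np.array as a stand-in for make_array, and catches the error so the cell keeps running.

```python
import numpy as np

price_in_pounds = np.array([16, 22, 51, 56])

with_fee = price_in_pounds + 5                      # 5 is added to every element
paired = price_in_pounds + np.array([1, 2, 3, 4])   # added position by position

try:
    price_in_pounds + np.array([1, 2])   # 4 elements vs 2: lengths do not match
    mismatch_failed = False
except ValueError:
    mismatch_failed = True

print(with_fee)
print(paired)
print(mismatch_failed)
```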
Index into an array to retrieve items. Indices start at 0.
price_in_pounds.item(0)
16
price_in_pounds.item(1)
22
price_in_pounds.item(3)
56
Think of item(n) as asking for the item that has n items before it.
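Because indices start at 0, the last item of an array with four elements sits at index 3, and asking for index 4 is an error. The sketch below checks both, again assuming a plain NumPy array in place of the table column.

```python
import numpy as np

price_in_pounds = np.array([16, 22, 51, 56])

# The last valid index is always one less than the length.
last_index = len(price_in_pounds) - 1
last_price = price_in_pounds.item(last_index)

try:
    price_in_pounds.item(4)    # no item has 4 items before it
    out_of_range = False
except IndexError:
    out_of_range = True

print(last_index, last_price, out_of_range)
```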
The price of the most expensive piece of art sold:
top_price = art.sort('pounds', descending=True).column('pounds').item(0)
top_price
1575