Data Types¶
# Some code to set up our notebook for data science!
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)
warnings.simplefilter(action="ignore", category=np.VisibleDeprecationWarning)
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
1. Table Review: Dataset of Art Sales in the UK in the 1800s¶
This data comes from the Getty Provenance Index, which currently contains more than 2.3 million records taken from source material such as archival inventories, auction catalogs, and dealer stock books.

Sir Anthony van Dyck - Portrait of Antoine Triest
Recall: you can open the raw .csv files within Jupyter’s file system. From inside Jupyter, locate the CSV file in the File Browser on the left-hand side of the window. Double-click to view it as a formatted table, or right-click and select “Open With -> Editor” to view it as an editable text file.
art = Table.read_table('data/UK_art_sales.csv')
art.show(5)
sale_code | lot_sale_year | lot_sale_month | lot_sale_day | auction_house | title | artist_name | nationality | object_type | price_amount | country_auth | pounds | shillings |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1839/02/15LOCH | 1839 | 2 | 16 | Christie's | A rich equipage halting on the bank of a river, where fi ... | K. du Jardin | Dutch | Painting | 14-0 | England, UK | 14 | 0 |
1839/02/15LOCH | 1839 | 2 | 16 | Christie's | A breeze, with men-of-war and boats; a clear and beautif ... | Backhuysen | Dutch | Painting | 13-2 | England, UK | 13 | 2 |
1837/07/04BBENR | 1837 | 7 | 12 | Enoch & Redfern | Portrait of a Gentleman, in black slashed dress, crimson ... | [Anonymous] | Unknown | Painting | 0-7 | England, UK | 0 | 7 |
1837/07/04BBENR | 1837 | 7 | 12 | Enoch & Redfern | Portrait (3/4 length) of a Young Lady in blue silk dress ... | [Anonymous] | Unknown | Painting | 0-13 | England, UK | 0 | 13 |
1837/07/04BBENR | 1837 | 7 | 12 | Enoch & Redfern | Portraits of Two Gentlemen and a Lady | [Anonymous] | Unknown | Painting | 0-3 | England, UK | 0 | 3 |
... (1817 rows omitted)
Remove the unhelpful columns (e.g. 'country_auth' and 'sale_code') with the drop method, and save the resulting table in the art variable.
art = art.drop('country_auth', 'sale_code', 'lot_sale_month', 'lot_sale_day', 'price_amount', 'shillings')
art.show(5)
lot_sale_year | auction_house | title | artist_name | nationality | object_type | pounds |
---|---|---|---|---|---|---|
1839 | Christie's | A rich equipage halting on the bank of a river, where fi ... | K. du Jardin | Dutch | Painting | 14 |
1839 | Christie's | A breeze, with men-of-war and boats; a clear and beautif ... | Backhuysen | Dutch | Painting | 13 |
1837 | Enoch & Redfern | Portrait of a Gentleman, in black slashed dress, crimson ... | [Anonymous] | Unknown | Painting | 0 |
1837 | Enoch & Redfern | Portrait (3/4 length) of a Young Lady in blue silk dress ... | [Anonymous] | Unknown | Painting | 0 |
1837 | Enoch & Redfern | Portraits of Two Gentlemen and a Lady | [Anonymous] | Unknown | Painting | 0 |
... (1817 rows omitted)
Recall: method chaining lets us combine multiple steps into a single line.
art.sort('artist_name').show(4)
lot_sale_year | auction_house | title | artist_name | nationality | object_type | pounds |
---|---|---|---|---|---|---|
1836 | Foster (Edward) | Landscape and Figures | A. Both | Dutch | Painting | 25 |
1840 | Sotheby's | Portrait of a Lady and Child, framed and glazed | A. Buck | British | Painting | 0 |
1805 | Christie's | The Madona and Child -- very fine | A. Caracci | Italian | Painting | 16 |
1805 | Christie's | Dead Christ and the Three Marys, a cabinet gem | A. Carracci | Italian | Painting | 39 |
... (1818 rows omitted)
Find non-painting objects using Table.where(...) and the predicate are.not_equal_to().
not_a_painting = art.where('object_type', are.not_equal_to('Painting'))
not_a_painting
lot_sale_year | auction_house | title | artist_name | nationality | object_type | pounds |
---|---|---|---|---|---|---|
1859 | Phillips (Harry) | A set of three finely modelled bronzes of the Venus de M ... | G. Zoffoli | Italian | Sculpture | 16 |
1859 | Phillips (Harry) | Another [finely modelled Old Florentine Bronze] of A Fem ... | Florentine | Italian | Sculpture | 22 |
1859 | Phillips (Harry) | Milton dictating his Paradise Lost. | Nash | British | Drawing | 51 |
1848 | Christie's | Gibraltar -- a print -- coloured; and an interior, after ... | Teniers | Flemish | Drawing | 0 |
1837 | Foster (Edward) | Water colour drawing, Falls of Tivoli. | Crome | British | Watercolor | 0 |
1836 | Foster (Edward) | Portrait of himself, in chalks, glazed, capital. | La Tour | French | Drawing | 56 |
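As a rough plain-Python analogy for what where does with a predicate (this is only a sketch, not how the datascience library is implemented), we can keep the rows whose 'object_type' is not 'Painting'. The list-of-dictionaries layout and the variable names are assumptions made for illustration; the values are copied from the tables shown earlier.

```python
# A handful of rows from the art table, written as plain dictionaries.
rows = [
    {'artist_name': 'G. Zoffoli', 'object_type': 'Sculpture', 'pounds': 16},
    {'artist_name': 'K. du Jardin', 'object_type': 'Painting', 'pounds': 14},
    {'artist_name': 'Nash', 'object_type': 'Drawing', 'pounds': 51},
    {'artist_name': 'Backhuysen', 'object_type': 'Painting', 'pounds': 13},
]

# Keep only the rows that pass the test "object_type is not 'Painting'".
not_paintings = [row for row in rows if row['object_type'] != 'Painting']

for row in not_paintings:
    print(row['artist_name'], row['object_type'])
# G. Zoffoli Sculpture
# Nash Drawing
```

The table version does the same kind of row-by-row test, but returns a new Table rather than a list.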
Find items in which the artist’s nationality is Irish.
irish_artists = art.where('nationality', are.equal_to('Irish'))
irish_artists
lot_sale_year | auction_house | title | artist_name | nationality | object_type | pounds |
---|---|---|---|---|---|---|
1836 | Foster (Edward) | Two Landscapes | O'Connor | Irish | Painting | 0 |
1836 | Foster (Edward) | View of Hastings and a Landscape | J.A. O'Connor | Irish | Painting | 0 |
1805 | Christie's | Lear with the Body of Cordelia -- grand and capital | Barry | Irish | Painting | 31 |
1805 | Abbott (William) | Portraits of Dr. Barron and Dr. Bentley, from Bishop New ... | Jarvis | Irish | Painting | 0 |

James Barry - Lear with the Body of Cordelia
2. Data Types¶
Type¶
We can ask for the type of a value or variable with the built-in Python function type.
type(3)
int
temperature = 98.6
type(temperature)
float
prof_name = "Katie"
type(prof_name)
str
this_class_is_fun = True
type(this_class_is_fun)
bool
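As a small extra illustration (the variable names here are made up for the example), the "same" number written three ways has three different types. Note that print shows a type as <class 'int'>, while a notebook cell that ends in type(...) displays just int.

```python
# The "same" number written three ways has three different types.
as_int = 3
as_float = 3.0
as_str = "3"

print(type(as_int))    # <class 'int'>
print(type(as_float))  # <class 'float'>
print(type(as_str))    # <class 'str'>

# The int and the float are equal as numbers; the string is not a number.
print(as_int == as_float)  # True
print(as_int == as_str)    # False
```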
Floats¶
Python makes some decisions for us. What type of value is produced by multiplying a float by an int?
answer = 0.75 * 2
answer
type(answer)
float
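A few more cases, sketched here as a supplement, show the pattern: arithmetic that mixes an int with a float gives a float, and true division with / gives a float even when both inputs are ints. The variable names are chosen only for this example.

```python
mixed = 0.75 * 2      # float times int
whole = 3 * 2         # int times int
true_div = 4 / 2      # true division
floor_div = 4 // 2    # floor division of two ints

print(type(mixed))      # <class 'float'>
print(type(whole))      # <class 'int'>
print(true_div)         # 2.0  (a float, even though the result is whole)
print(floor_div)        # 2    (an int)
```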
A computer cannot represent every real number exactly. That would require infinite memory because some numbers have an infinite number of digits.
1 / 3
0.3333333333333333
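One way to see that the stored value is only an approximation is to compare it with an exact fraction. This sketch uses Python's standard fractions module, which the notebook itself does not use; the variable names are assumptions for the example.

```python
from fractions import Fraction

stored = Fraction(1 / 3)   # the exact value of the float that Python stores
exact = Fraction(1, 3)     # the true rational number one third

print(stored == exact)                             # False: the float is not exactly 1/3
print(abs(stored - exact) < Fraction(1, 10**16))   # True: but it is extremely close
```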
What happens when we run the next cell?
2 / 0
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_40240/2685369145.py in <module>
----> 1 2 / 0
ZeroDivisionError: division by zero
Scientific Notation¶
Python represents some numbers as \(b \times 10^e\).
Examples:
1.23e5 is \(1.23 \times 10^5\).
6.667e-07 is \(6.667 \times 10^{-7}\).
2 / 3000
0.0006666666666666666
2 / 3000000
6.666666666666667e-07
0.000000000000000123456789
1.23456789e-16
0.000000000000000000000000000000000000000000000000000000000000000000000123456789
1.23456789e-70
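To connect the notation to ordinary numbers, here is a short sketch. Scientific-notation literals are regular floats, and the built-in format function with the '.3e' format code displays any float in this style with three digits after the decimal point. The name small is chosen only for this example.

```python
# A literal written in scientific notation is an ordinary float.
print(1.23e5)            # 123000.0
print(1.23e5 == 123000)  # True

# Ask for scientific notation explicitly, with 3 digits after the point.
small = 2 / 3000000
print(format(small, '.3e'))  # 6.667e-07
```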
Rounding Errors¶
Since numbers aren’t always represented exactly, small errors may creep in when we operate on floats. They are too small for us to worry about in this class.
0.6666666666666666 - 0.6666666666666666123456789 # a little less than 0
0.0
2 ** 0.5
1.4142135623730951
2 ** 0.5 * 2 ** 0.5 # should be 2.0
2.0000000000000004
2 ** 0.5 * 2 ** 0.5 - 2 # should be 0
4.440892098500626e-16
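One practical consequence is that comparing floats with == can give surprising answers. A common workaround, sketched here with the standard library's math.isclose (which this notebook does not otherwise use), is to test whether two values are equal within a small tolerance.

```python
import math

product = 2 ** 0.5 * 2 ** 0.5    # mathematically 2, but slightly off as a float

print(product == 2)              # False: the tiny rounding error breaks exact equality
print(math.isclose(product, 2))  # True: equal within isclose's default tolerance
```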
Strings¶
String values capture text data (sequences of characters). Use single quotes or double quotes around strings.
'Painting'
'Painting'
"Painting"
'Painting'
Variables vs Strings¶
print("painting") # String value
painting = 4 # variable named painting
print(painting)
painting
4
Why both single and double quotes?
'Don't always use single quotes'
File "/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_40240/2648065634.py", line 1
'Don't always use single quotes'
^
SyntaxError: invalid syntax
"Don't always use single quotes"
"Don't always use single quotes"
'cs' + '104' # concatenation
'cs104'
'cs' + ' ' + '104' # spaces aren't added for you
'cs 104'
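Returning to the quoting problem above: besides switching to double quotes, Python also lets us escape an apostrophe inside a single-quoted string with a backslash. This is a small supplementary sketch; both spellings produce the same string value, and the variable names are made up for the example.

```python
escaped = 'Don\'t always use single quotes'       # backslash escapes the apostrophe
double_quoted = "Don't always use single quotes"  # double quotes avoid the problem

print(escaped)                    # Don't always use single quotes
print(escaped == double_quoted)   # True
```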
Conversions¶
Only strings can be concatenated with other strings.
number = 104
'cs' + number
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_40240/1095410427.py in <module>
1 number = 104
----> 2 'cs' + number
TypeError: can only concatenate str (not "int") to str
Convert numbers to strings when you want to use them to build larger strings.
'cs' + str(number)
'cs104'
We can convert from strings back to numbers as well.
int('3')
3
float('3.0')
3.0
int(str(number))
104
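The conversions above can be combined in a few ways. The sketch below also shows an f-string, a Python feature not covered in this notebook, as an alternative to calling str yourself. Note that int only accepts text that looks like a whole number, so text such as '3.0' has to go through float first.

```python
number = 104

label = 'cs' + str(number)   # convert explicitly, then concatenate
print(label)                 # cs104

f_label = f'cs{number}'      # an f-string converts the value for us
print(f_label)               # cs104

# int('3.0') would raise a ValueError, so convert to a float first.
whole = int(float('3.0'))
print(whole)                 # 3
```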
3. Arrays¶
Array: sequence of values, all the same type, “boxed up”
Table operation: column
not_a_painting.column('pounds')
array([16, 22, 51, 0, 0, 56])
Arithmetic operations are broadcast
# Suppose the art auction house doubles the price for non-paintings
price_in_pounds = not_a_painting.column('pounds')
price_in_pounds * 2
array([ 32, 44, 102, 0, 0, 112])
price_in_pounds + 5
array([21, 27, 56, 5, 5, 61])
fives = make_array(5,10,15,20,25,30)
fives
array([ 5, 10, 15, 20, 25, 30])
price_in_pounds + fives
array([21, 32, 66, 20, 25, 86])
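To make "broadcast" concrete, here is what the three array operations above compute, spelled out element by element with plain Python lists. This is only an illustration of the results, not how NumPy performs the work internally; the list names are assumptions for the example.

```python
prices = [16, 22, 51, 0, 0, 56]   # the 'pounds' values from not_a_painting
fives = [5, 10, 15, 20, 25, 30]

doubled = [p * 2 for p in prices]                   # like price_in_pounds * 2
plus_five = [p + 5 for p in prices]                 # like price_in_pounds + 5
pairwise = [p + f for p, f in zip(prices, fives)]   # like price_in_pounds + fives

print(doubled)    # [32, 44, 102, 0, 0, 112]
print(plus_five)  # [21, 27, 56, 5, 5, 61]
print(pairwise)   # [21, 32, 66, 20, 25, 86]
```

With arrays, a single arithmetic expression replaces each of these loops.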
We can call built-in Python functions, as well as NumPy functions such as np.mean, on these arrays.
len(price_in_pounds)
6
max(price_in_pounds)
56
min(price_in_pounds)
0
sum(price_in_pounds)
145
np.mean(price_in_pounds)
24.166666666666668
price_in_pounds + make_array(1,2) #Error because not the same shapes
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_40240/505079200.py in <module>
----> 1 price_in_pounds + make_array(1,2) #Error because not the same shapes
ValueError: operands could not be broadcast together with shapes (6,) (2,)
price_in_pounds + make_array(1,2, 3, 4, 5, 6)
array([17, 24, 54, 4, 5, 62])
Index into array to retrieve items. Indices start at 0.
price_in_pounds.item(0)
16
price_in_pounds.item(1)
22
price_in_pounds.item(3)
0
Think of item(n) as asking for the item that has n items before it.
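Plain Python lists follow the same zero-based convention, using square brackets instead of item. The sketch below uses a list holding the same six prices to show that the last valid index is one less than the length; the name prices is chosen for this example.

```python
prices = [16, 22, 51, 0, 0, 56]

print(prices[0])                 # 16: no items come before it
print(prices[3])                 # 0:  three items come before it
print(len(prices))               # 6
print(prices[len(prices) - 1])   # 56: the last item sits at index 5, not 6
```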
Table operation: take¶
pricey_art = art.sort('pounds', descending=True)
highest_price = pricey_art.take(0)
highest_price
lot_sale_year | auction_house | title | artist_name | nationality | object_type | pounds |
---|---|---|---|---|---|---|
1804 | Coxe (Peter) | King Charles I. his Queen and Family, from the Orleans' ... | Vandyck | Flemish | Painting | 1575 |
second_highest_price = pricey_art.take(1)
second_highest_price
lot_sale_year | auction_house | title | artist_name | nationality | object_type | pounds |
---|---|---|---|---|---|---|
1859 | Phillips (Harry) | A Landscape, with full length Portraits of Pierre Both, ... | Albert Cuyp | Dutch | Painting | 966 |
top_five = pricey_art.take(make_array(0, 1, 2, 3, 4))
top_five
lot_sale_year | auction_house | title | artist_name | nationality | object_type | pounds |
---|---|---|---|---|---|---|
1804 | Coxe (Peter) | King Charles I. his Queen and Family, from the Orleans' ... | Vandyck | Flemish | Painting | 1575 |
1859 | Phillips (Harry) | A Landscape, with full length Portraits of Pierre Both, ... | Albert Cuyp | Dutch | Painting | 966 |
1840 | Christie's | The Holy Family; a composition of four figures, as large ... | Rubens | Flemish | Painting | 945 |
1837 | Christie's | The Virgin, covering the sleeping Infant Jesus with a ve ... | Sebastian Del Piombo | Italian | Painting | 850 |
1859 | Phillips (Harry) | The Disgrace of Clarendon. | E.M. Ward, R.A. | British | Painting | 845 |
top_five.barh("artist_name", "pounds")

Creating ranges¶
What if I wanted the top 50? make_array(0,1,2,...,49)? Ugh.
We can make an array for a range of numbers with np.arange(low,high), which gives us the integers in the range [low,high): low is included and high is excluded.
np.arange(0, 10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
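Python's built-in range follows the same half-open convention, so it offers a quick way to check the rule without NumPy. In this sketch (variable names are assumptions for the example), the result contains high - low values and never contains high itself.

```python
low, high = 0, 10
indices = list(range(low, high))

print(indices)           # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(len(indices))      # 10, which is high - low
print(high in indices)   # False: the upper end is excluded
```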
top_ten = pricey_art.take(np.arange(0, 10))
top_ten
lot_sale_year | auction_house | title | artist_name | nationality | object_type | pounds |
---|---|---|---|---|---|---|
1804 | Coxe (Peter) | King Charles I. his Queen and Family, from the Orleans' ... | Vandyck | Flemish | Painting | 1575 |
1859 | Phillips (Harry) | A Landscape, with full length Portraits of Pierre Both, ... | Albert Cuyp | Dutch | Painting | 966 |
1840 | Christie's | The Holy Family; a composition of four figures, as large ... | Rubens | Flemish | Painting | 945 |
1837 | Christie's | The Virgin, covering the sleeping Infant Jesus with a ve ... | Sebastian Del Piombo | Italian | Painting | 850 |
1859 | Phillips (Harry) | The Disgrace of Clarendon. | E.M. Ward, R.A. | British | Painting | 845 |
1804 | Christie's | The Assumption of the Virgin. A Female (probably a Port ... | Palma il Gioven | Italian | Painting | 829 |
1840 | Christie's | Under the shade of some noble trees peasants are passing ... | Adrian Van De Velde | Dutch | Painting | 798 |
1845 | Christie's | St. John in the Desert. | Murillo | Spanish | Painting | 760 |
1840 | Christie's | The Magdalen in Contemplation; she is clad in a red, yel ... | Domenichino | Italian | Painting | 698 |
1838 | Christie's | The Assumption of the Virgin, the beautiful figure of th ... | Murillo | Spanish | Painting | 693 |
top_ten = pricey_art.take(np.arange(10))
top_ten
lot_sale_year | auction_house | title | artist_name | nationality | object_type | pounds |
---|---|---|---|---|---|---|
1804 | Coxe (Peter) | King Charles I. his Queen and Family, from the Orleans' ... | Vandyck | Flemish | Painting | 1575 |
1859 | Phillips (Harry) | A Landscape, with full length Portraits of Pierre Both, ... | Albert Cuyp | Dutch | Painting | 966 |
1840 | Christie's | The Holy Family; a composition of four figures, as large ... | Rubens | Flemish | Painting | 945 |
1837 | Christie's | The Virgin, covering the sleeping Infant Jesus with a ve ... | Sebastian Del Piombo | Italian | Painting | 850 |
1859 | Phillips (Harry) | The Disgrace of Clarendon. | E.M. Ward, R.A. | British | Painting | 845 |
1804 | Christie's | The Assumption of the Virgin. A Female (probably a Port ... | Palma il Gioven | Italian | Painting | 829 |
1840 | Christie's | Under the shade of some noble trees peasants are passing ... | Adrian Van De Velde | Dutch | Painting | 798 |
1845 | Christie's | St. John in the Desert. | Murillo | Spanish | Painting | 760 |
1840 | Christie's | The Magdalen in Contemplation; she is clad in a red, yel ... | Domenichino | Italian | Painting | 698 |
1838 | Christie's | The Assumption of the Virgin, the beautiful figure of th ... | Murillo | Spanish | Painting | 693 |
See other forms of ranges in the book.
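As a brief sketch of two such forms, np.arange also accepts a single argument (start at 0) or a third argument giving the step between values. The variable names below are chosen for this example.

```python
import numpy as np

first_five = np.arange(5)       # one argument: start at 0, stop before 5
evens = np.arange(0, 10, 2)     # three arguments: step by 2, stop before 10

print(first_five)   # [0 1 2 3 4]
print(evens)        # [0 2 4 6 8]
```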