Data Types#

from datascience import *
from cs104 import *
import numpy as np

%matplotlib inline

1. Table Review: Art sales in the UK#

This data comes from the Getty Provenance Index, which currently contains more than 2.3 million records taken from source material such as archival inventories, auction catalogs, and dealer stock books.

https://upload.wikimedia.org/wikipedia/commons/d/d8/Sir_Anthony_van_Dyck_-_Portrait_of_Antoine_Triest%2C_Bishop_of_Ghent_%281576%E2%80%931655%29_-_BF.1977.2_-_Hermitage_Museum.jpg

Sir Anthony van Dyck - Portrait of Antoine Triest

Recall: you can open the raw .csv files within Jupyter’s file system. From inside Jupyter, locate the CSV file in the File Browser on the left-hand side of the window. Double-click to view it as a formatted table, or right-click and select “Open With -> Editor” to view it as an editable text file.

art = Table.read_table('data/UK_art_subset.csv')
art.show(5)
lot_sale_year auction_house title artist_name object_type pounds
1839 Christie's A rich equipage halting on the bank of a river, where fi ... K. du Jardin Painting 14
1839 Christie's A breeze, with men-of-war and boats; a clear and beautif ... Backhuysen Painting 13
1837 Enoch & Redfern A Mythological Subject, representing Apollo playing on a ... Rubens Painting 1
1803 Edwards (Edward) This picture represents an Alchemist with other Figures; ... Teniers Painting 18
1838 Christie's A dance of cupids; admirably coloured V. Dyck Painting 12

... (1098 rows omitted)

Review: Remove unhelpful columns (e.g. 'auction_house') with the drop method, and save the resulting table in the variable no_house.

no_house = art.drop('auction_house')
no_house.show(5)
lot_sale_year title artist_name object_type pounds
1839 A rich equipage halting on the bank of a river, where fi ... K. du Jardin Painting 14
1839 A breeze, with men-of-war and boats; a clear and beautif ... Backhuysen Painting 13
1837 A Mythological Subject, representing Apollo playing on a ... Rubens Painting 1
1803 This picture represents an Alchemist with other Figures; ... Teniers Painting 18
1838 A dance of cupids; admirably coloured V. Dyck Painting 12

... (1098 rows omitted)

art.sort('artist_name', descending=True).show(4)
lot_sale_year auction_house title artist_name object_type pounds
1839 Rainy (Alexander) The late Admiral Lord Exmouth, going into Action at Algi ... [Sir William Beechey] Painting 37
1839 Christie's The Infant placing a wreath of flowers on the head of th ... Zurbaran Painting 39
1805 Coxe (Peter) Ditto [Landscape and Figures], the Companion Zuccarelli Painting 14
1836 Christie's A landscape, with a waterfall and figures Zuccarelli Painting 17

... (1099 rows omitted)

Find non-painting objects using Table.where(...) and the predicate are.not_equal_to().

not_a_painting = art.where('object_type', are.not_equal_to('Painting'))
not_a_painting
lot_sale_year auction_house title artist_name object_type pounds
1859 Phillips (Harry) A set of three finely modelled bronzes of the Venus de M ... G. Zoffoli Sculpture 16
1859 Phillips (Harry) Another [finely modelled Old Florentine Bronze] of A Fem ... Florentine Sculpture 22
1859 Phillips (Harry) Milton dictating his Paradise Lost. Nash Drawing 51
1836 Foster (Edward) Portrait of himself, in chalks, glazed, capital. La Tour Drawing 56

Now, a quick recap of where and the predicates it uses to select rows.

art.where("auction_house", are.equal_to("Christie's"))
lot_sale_year auction_house title artist_name object_type pounds
1839 Christie's A rich equipage halting on the bank of a river, where fi ... K. du Jardin Painting 14
1839 Christie's A breeze, with men-of-war and boats; a clear and beautif ... Backhuysen Painting 13
1838 Christie's A dance of cupids; admirably coloured V. Dyck Painting 12
1838 Christie's Susannah & the elders Dietrich Painting 13
1838 Christie's A harbour-scene, with a royal yacht, vessels, boats, and ... Backhuyzen Painting 35
1838 Christie's Interior of a Flemish cathedral, with figures by Francks P. Neefs Painting 25
1838 Christie's A river-scene, with buildings and figures; a brilliant s ... V. der Neer Painting 23
1838 Christie's A Dutch river near a town, with vessels and figures -- b ... V. der Neer Painting 53
1838 Christie's A mountainous landscape, with classical figures; circular G. Poussin Painting 13
1838 Christie's A cavalier seated smoking, in conversation with a female ... Mieris Painting 28

... (776 rows omitted)

art.where("pounds", are.above(900))
lot_sale_year auction_house title artist_name object_type pounds
1804 Coxe (Peter) King Charles I. his Queen and Family, from the Orleans' ... Vandyck Painting 1575
1859 Phillips (Harry) A Landscape, with full length Portraits of Pierre Both, ... Albert Cuyp Painting 966
1840 Christie's The Holy Family; a composition of four figures, as large ... Rubens Painting 945
art.where("lot_sale_year", are.between(1815, 1835))
lot_sale_year auction_house title artist_name object_type pounds
1823 Christie's The Four Seasons, exemplified in Four beautiful small Ca ... David Teniers Painting 189
1819 Christie's A pair Historical, Edward and Eleonora, and companion, v ... A. Kauffman Painting 48
art.where("title", are.containing('river'))
lot_sale_year auction_house title artist_name object_type pounds
1839 Christie's A rich equipage halting on the bank of a river, where fi ... K. du Jardin Painting 14
1838 Christie's A river-scene, with buildings and figures; a brilliant s ... V. der Neer Painting 23
1838 Christie's A Dutch river near a town, with vessels and figures -- b ... V. der Neer Painting 53
1838 Christie's A Grand Italian Landscape, with buildings near a river; ... Berghem Painting 164
1840 Christie's The manege; numerous figures near a stable, and distant ... J. Ostade Painting 26
1849 Christie's A grand landscape, with a river, with a cavalier and gyp ... Van Uden Painting 18
1836 Christie's A brown horse standing near a river, in a landscape, wit ... Cuyp Painting 28
1838 Christie's A Flemish town on fire on the bank of the river, moonlight V. der Neer Painting 45
1838 Christie's A windmill and cottages on the bank of a river, with boa ... Ostade Painting 22
1836 Christie's Landscape, with a river; on the perforated rocky bank of ... Claude Painting 98

... (21 rows omitted)
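Each predicate above can be read as a simple test that where applies to one column, row by row. The sketch below mimics two of them in plain Python on a few (lot_sale_year, pounds) pairs copied from the outputs above. The helpers `above` and `between` are hypothetical stand-ins, not the library's code, and the assumption that are.between includes its lower bound but excludes its upper bound reflects our reading of the datascience documentation.

```python
# A few (lot_sale_year, pounds) pairs copied from the tables above.
sales = [(1839, 14), (1823, 189), (1819, 48), (1804, 1575), (1859, 966), (1840, 945)]

def above(threshold):
    # Hypothetical stand-in for are.above: strictly greater than threshold.
    return lambda value: value > threshold

def between(low, high):
    # Hypothetical stand-in for are.between: assumed to include low and exclude high.
    return lambda value: low <= value < high

# Keep only the rows whose column value passes the test, as where does.
over_900_pounds = [row for row in sales if above(900)(row[1])]
sold_1815_to_1834 = [row for row in sales if between(1815, 1835)(row[0])]

print(over_900_pounds)
print(sold_1815_to_1834)
```

The row order is preserved, which matches how the where outputs above keep the original table order.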

What else can we learn from these data sets?#

Think-pair-share: How would you display the most expensive items sold after 1850?

Recall: method chaining lets us combine multiple steps into a single line.

art.where('lot_sale_year', are.above(1850)).sort('pounds', descending=True).show(5)
lot_sale_year auction_house title artist_name object_type pounds
1859 Phillips (Harry) A Landscape, with full length Portraits of Pierre Both, ... Albert Cuyp Painting 966
1859 Phillips (Harry) The Disgrace of Clarendon. E.M. Ward, R.A. Painting 845
1859 Phillips (Harry) View of Windsor Castle. The celebrated picture pain ... Patrick Nasmyth Painting 588
1859 Phillips (Harry) The Wood. Nymph chanting her Hymn to the Rising Sun. J. Danby, R.A. Painting 378
1859 Phillips (Harry) L'Umana Fragilita. A strange but wonderfully imagina ... Salvator Rosa Painting 346

... (39 rows omitted)

How much is £966 from 1859 worth in today’s USD?

The pound has averaged about 3.15% inflation per year since then, so £966 in 1859 is worth around £155,000 today.

pounds_2023 = 155000
dollars_2023 = pounds_2023 * 1.25
dollars_2023
193750.0
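As a rough check on the figure above, we can compound the stated inflation rate year by year. This is only a sketch: it assumes “today” means 2023 (taken from the variable name pounds_2023) and reuses the 3.15% rate and the 1.25 dollars-per-pound rate given above. The names `estimated_pounds_2023` and `estimated_dollars_2023` are introduced here for illustration.

```python
# Assumptions: 3.15% average annual inflation (from the text), "today" taken
# as 2023 (from the variable name pounds_2023), 1.25 dollars per pound.
price_1859 = 966
annual_inflation = 0.0315
years = 2023 - 1859

# Compound the price once per year.
estimated_pounds_2023 = price_1859 * (1 + annual_inflation) ** years
estimated_dollars_2023 = estimated_pounds_2023 * 1.25

print(round(estimated_pounds_2023))
print(round(estimated_dollars_2023))
```

The compounded estimate lands close to the rounded £155,000 used above; small changes in the assumed rate shift the result noticeably because the growth is compounded over 164 years.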

2. Data Types#

Type#

You can ask for the type of a value or variable with the built-in Python function type.

type(3)
int
temperature = 98.6
type(temperature)
float
prof_name = "Katie"
type(prof_name)
str
this_class_is_fun = True
type(this_class_is_fun)
bool

Floats#

Python makes some decisions about types for us. What type of value is produced by multiplying a float by an int?

answer = 0.75 * 2
answer
type(answer)
float
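A few more cases of the same rule, as a small sketch: arithmetic on two ints stays an int, mixing in a float gives a float, and division with / gives a float even when the answer is a whole number. The variable names are chosen here for illustration.

```python
int_product = 2 * 3        # int times int stays an int
mixed_product = 2 * 0.75   # int times float becomes a float
quotient = 4 / 2           # / produces a float, even for a whole-number result

print(type(int_product))
print(type(mixed_product))
print(type(quotient))
print(quotient)
```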

A computer cannot represent every real number exactly. That would require infinite memory because some numbers have an infinite number of digits.

1 / 3
0.3333333333333333

What happens when we run the next cell?

# 2 / 0
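If you would rather not crash the notebook while experimenting, one option is to wrap the division in try/except. This sketch catches the error Python raises when dividing by zero and records its name; `error_name` is a name introduced here for illustration.

```python
try:
    result = 2 / 0
except ZeroDivisionError as error:
    # Python refuses to divide by zero and raises an error instead of returning a value.
    error_name = type(error).__name__
    print(error_name)
```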

Scientific Notation#

Python displays very large or very small floats in scientific notation, \(b \times 10^e\).

Examples:

  • 1.23e5 is \(1.23 \times 10^5\).

  • 6.667e-07 is \(6.667 \times 10^{-7}\).

2 / 3000
0.0006666666666666666
2 / 3000000
6.666666666666667e-07
0.000000000000000123456789
1.23456789e-16
0.000000000000000000000000000000000000000000000000000000000000000000000123456789
1.23456789e-70
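Scientific notation also works in the other direction: we can type it as a literal, and we can ask Python to format a float that way. A short sketch, with variable names chosen for illustration:

```python
# Scientific-notation literals are ordinary floats.
large = 1.23e5
small = 2 / 3000000

print(large == 123000.0)

# The '.3e' format keeps three digits after the decimal point.
formatted = format(small, '.3e')
print(formatted)
```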

Rounding Errors#

Since numbers aren’t always represented exactly, small errors may creep in when we operate on floats. These errors are too small for us to worry about in this class.

0.6666666666666666 - 0.6666666666666666123456789 # a little less than 0 
0.0
2 ** 0.5
1.4142135623730951
2 ** 0.5 * 2 ** 0.5 # should be 2.0 
2.0000000000000004
2 ** 0.5 * 2 ** 0.5 - 2 # should be 0 
4.440892098500626e-16
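One practical consequence is that testing floats for exact equality can surprise you. A common workaround, sketched here with the standard library's math.isclose, is to compare within a small tolerance instead.

```python
import math

product = 2 ** 0.5 * 2 ** 0.5

# Exact comparison fails because of the tiny rounding error.
exactly_two = (product == 2)
# math.isclose compares within a small relative tolerance instead.
close_to_two = math.isclose(product, 2)

print(exactly_two, close_to_two)
```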

Strings#

String values capture text data (sequences of characters). Use single quotes or double quotes around strings.

'Painting'
'Painting'
"Painting"
'Painting'

Variables vs Strings#

print("painting") # String value

painting = 4      # variable named painting
print(painting)
painting
4

Why both single and double quotes?

'Don't always use single quotes'
  Cell In[30], line 1
    'Don't always use single quotes'
         ^
SyntaxError: invalid syntax
"Don't always use single quotes"
"Don't always use single quotes"
'cs' + '104' # concatenation
'cs104'
'cs' + ' ' +  '104' # spaces aren't added for you
'cs 104'
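Switching to double quotes is one way to fix the apostrophe error above. Another, sketched here, is to keep the single quotes and escape the apostrophe with a backslash; both spellings produce the same string.

```python
# A backslash tells Python the apostrophe is part of the string, not its end.
escaped = 'Don\'t always use single quotes'
double_quoted = "Don't always use single quotes"

print(escaped == double_quoted)
```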

Conversions#

You can only concatenate a string with other strings.

number = 104
'cs' + number
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[34], line 2
      1 number = 104
----> 2 'cs' + number

TypeError: can only concatenate str (not "int") to str

Convert numbers to strings when you want to use them to build larger strings.

'cs' + str(number)
'cs104'

You can convert strings back to numbers as well.

int('3')
3
float('3.0')
3.0
int(str(number)) * 2
208
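Two related details, shown as a sketch rather than as part of the lecture's own examples: an f-string is another way to build text from a number, and a conversion fails when the string is not in the format the target type expects. The names `course` and `conversion_error` are introduced here for illustration.

```python
number = 104

# An f-string converts the value to text for us inside the braces.
course = f'cs{number}'
print(course)

# int() rejects text that looks like a float.
try:
    int('3.0')
except ValueError as error:
    conversion_error = type(error).__name__
    print(conversion_error)

# Going through float first works; int then drops the fractional part.
whole = int(float('3.0'))
print(whole)
```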

3. Arrays#

Array: sequence of values, all the same type, “boxed up”

Table operation: column

not_a_painting.column('pounds')
array([16, 22, 51, 56])

Arithmetic operations are broadcast on arrays: the operation is applied to every element.

What’s the price in dollars for each of these items?

price_in_pounds = not_a_painting.column('pounds')
price_in_dollars = price_in_pounds * 1.25
price_in_dollars
array([20.  , 27.5 , 63.75, 70.  ])

Suppose the art auction house adds 5 pounds to each item’s price.

price_in_pounds + 5
array([21, 27, 56, 61])
fives = make_array(5,10,15,20)
fives
array([ 5, 10, 15, 20])
price_in_pounds + fives
array([21, 32, 66, 76])
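To see what the array arithmetic above is doing, here is a sketch that rebuilds the same prices with NumPy directly and compares the results with an explicit loop. It assumes np.array behaves like the array returned by column, and it hard-codes the four prices shown above as a stand-in for the table.

```python
import numpy as np

# Stand-in for not_a_painting.column('pounds'), using the four prices shown above.
price_in_pounds = np.array([16, 22, 51, 56])

# A single number is combined with every element.
with_fee = price_in_pounds + 5
# Two arrays of the same length are combined position by position.
with_varying_fee = price_in_pounds + np.array([5, 10, 15, 20])

# The same flat fee, written out as a loop over the elements.
by_hand = [int(price) + 5 for price in price_in_pounds]

print(with_fee)
print(with_varying_fee)
print(by_hand)
```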

We can call built-in Python functions, and NumPy functions such as np.mean, on these arrays as well.

len(price_in_pounds)
4
max(price_in_pounds)
56
min(price_in_pounds)
16
sum(price_in_pounds)
145
np.mean(price_in_pounds)
36.25
price_in_pounds + make_array(1,2) #Error because not the same shapes 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[49], line 1
----> 1 price_in_pounds + make_array(1,2) #Error because not the same shapes 

ValueError: operands could not be broadcast together with shapes (4,) (2,) 
price_in_pounds + make_array(1,2, 3, 4)
array([17, 24, 54, 60])

Index into an array to retrieve items. Indices start at 0.

price_in_pounds.item(0)
16
price_in_pounds.item(1)
22
price_in_pounds.item(3)
56

Think of item(n) as asking for the item that has n items before it.
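Because counting starts at 0, the last valid index is one less than the length of the array. The sketch below uses a NumPy array with the four prices shown above as a stand-in for the column, and shows what happens when an index runs past the end.

```python
import numpy as np

# Stand-in for not_a_painting.column('pounds').
price_in_pounds = np.array([16, 22, 51, 56])

first = price_in_pounds.item(0)                          # no items come before it
last = price_in_pounds.item(len(price_in_pounds) - 1)    # last valid index is length - 1

print(first, last)

# Asking for an index past the end raises an IndexError.
try:
    price_in_pounds.item(4)
except IndexError as error:
    out_of_range = type(error).__name__
    print(out_of_range)
```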

The price of the most expensive piece of art sold:

top_price = art.sort('pounds', descending=True).column('pounds').item(0)
top_price
1575