Data Types

# Some code to set up our notebook for data science!

from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)
warnings.simplefilter(action="ignore", category=np.VisibleDeprecationWarning)

from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

1. Table Review: Dataset of Art Sales in the UK in the 1800s

This data comes from the Getty Provenance Index, which currently contains more than 2.3 million records taken from source material such as archival inventories, auction catalogs, and dealer stock books.

https://upload.wikimedia.org/wikipedia/commons/d/d8/Sir_Anthony_van_Dyck_-_Portrait_of_Antoine_Triest%2C_Bishop_of_Ghent_%281576%E2%80%931655%29_-_BF.1977.2_-_Hermitage_Museum.jpg

Sir Anthony van Dyck - Portrait of Antoine Triest

Recall: you can open the raw .csv files within Jupyter’s file system. From inside Jupyter, locate the CSV file in the File Browser on the left-hand side of the window. Double-click to view it as a formatted table, or right-click and select “Open With -> Editor” to view it as an editable text file.

art = Table.read_table('data/UK_art_sales.csv')
art.show(5)
sale_code lot_sale_year lot_sale_month lot_sale_day auction_house title artist_name nationality object_type price_amount country_auth pounds shillings
1839/02/15LOCH 1839 2 16 Christie's A rich equipage halting on the bank of a river, where fi ... K. du Jardin Dutch Painting 14-0 England, UK 14 0
1839/02/15LOCH 1839 2 16 Christie's A breeze, with men-of-war and boats; a clear and beautif ... Backhuysen Dutch Painting 13-2 England, UK 13 2
1837/07/04BBENR 1837 7 12 Enoch & Redfern Portrait of a Gentleman, in black slashed dress, crimson ... [Anonymous] Unknown Painting 0-7 England, UK 0 7
1837/07/04BBENR 1837 7 12 Enoch & Redfern Portrait (3/4 length) of a Young Lady in blue silk dress ... [Anonymous] Unknown Painting 0-13 England, UK 0 13
1837/07/04BBENR 1837 7 12 Enoch & Redfern Portraits of Two Gentlemen and a Lady [Anonymous] Unknown Painting 0-3 England, UK 0 3

... (1817 rows omitted)

Remove the unhelpful columns (e.g. 'country_auth' and 'sale_code') with the drop method, and save the resulting table back in the art variable.

art = art.drop('country_auth', 'sale_code', 'lot_sale_month', 'lot_sale_day', 'price_amount', 'shillings')
art.show(5)
lot_sale_year auction_house title artist_name nationality object_type pounds
1839 Christie's A rich equipage halting on the bank of a river, where fi ... K. du Jardin Dutch Painting 14
1839 Christie's A breeze, with men-of-war and boats; a clear and beautif ... Backhuysen Dutch Painting 13
1837 Enoch & Redfern Portrait of a Gentleman, in black slashed dress, crimson ... [Anonymous] Unknown Painting 0
1837 Enoch & Redfern Portrait (3/4 length) of a Young Lady in blue silk dress ... [Anonymous] Unknown Painting 0
1837 Enoch & Redfern Portraits of Two Gentlemen and a Lady [Anonymous] Unknown Painting 0

... (1817 rows omitted)

Recall: method chaining lets us combine multiple steps into a single line.

art.sort('artist_name').show(4)
lot_sale_year auction_house title artist_name nationality object_type pounds
1836 Foster (Edward) Landscape and Figures A. Both Dutch Painting 25
1840 Sotheby's Portrait of a Lady and Child, framed and glazed A. Buck British Painting 0
1805 Christie's The Madona and Child -- very fine A. Caracci Italian Painting 16
1805 Christie's Dead Christ and the Three Marys, a cabinet gem A. Carracci Italian Painting 39

... (1818 rows omitted)
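
Method chaining is not specific to tables: whenever a method returns a value, the next method can be called directly on that result. A minimal sketch with plain Python strings (the padded title string below is made up for illustration and is not read from the dataset):

```python
# Illustrative string; the extra spaces are added on purpose.
title = "  Landscape and Figures  "

# Step by step, saving each intermediate result in its own variable.
stripped = title.strip()                  # remove surrounding spaces
lowered = stripped.lower()                # convert to lowercase
step_by_step = lowered.replace(" ", "_")  # swap spaces for underscores

# The same three steps chained into a single expression.
chained = title.strip().lower().replace(" ", "_")

print(step_by_step)
print(chained)
```

Both versions produce the same string; chaining just skips the intermediate variable names.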

Find non-painting objects using Table.where(...) and the predicate are.not_equal_to().

not_a_painting = art.where('object_type', are.not_equal_to('Painting'))
not_a_painting
lot_sale_year auction_house title artist_name nationality object_type pounds
1859 Phillips (Harry) A set of three finely modelled bronzes of the Venus de M ... G. Zoffoli Italian Sculpture 16
1859 Phillips (Harry) Another [finely modelled Old Florentine Bronze] of A Fem ... Florentine Italian Sculpture 22
1859 Phillips (Harry) Milton dictating his Paradise Lost. Nash British Drawing 51
1848 Christie's Gibraltar -- a print -- coloured; and an interior, after ... Teniers Flemish Drawing 0
1837 Foster (Edward) Water colour drawing, Falls of Tivoli. Crome British Watercolor 0
1836 Foster (Edward) Portrait of himself, in chalks, glazed, capital. La Tour French Drawing 56

Find items in which the artist’s nationality is Irish.

irish_artists = art.where('nationality', are.equal_to('Irish'))
irish_artists
lot_sale_year auction_house title artist_name nationality object_type pounds
1836 Foster (Edward) Two Landscapes O'Connor Irish Painting 0
1836 Foster (Edward) View of Hastings and a Landscape J.A. O'Connor Irish Painting 0
1805 Christie's Lear with the Body of Cordelia -- grand and capital Barry Irish Painting 31
1805 Abbott (William) Portraits of Dr. Barron and Dr. Bentley, from Bishop New ... Jarvis Irish Painting 0
https://media.tate.org.uk/art/images/work/T/T00/T00556_10.jpg

James Barry - Lear with the Body of Cordelia

2. Data Types

Type

We can ask for the type of a value or variable with the built-in Python function type.

type(3)
int
temperature = 98.6
type(temperature)
float
prof_name = "Katie"
type(prof_name)
str
this_class_is_fun = True
type(this_class_is_fun)
bool
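
As a small sketch of why types matter, values that look alike on screen can still have different types. The variable names here are illustrative only:

```python
# Three values that all "look like" three, each with a different type.
as_int = 3
as_float = 3.0
as_str = "3"

# type(...).__name__ gives the short name of the type as a string.
type_names = [type(v).__name__ for v in (as_int, as_float, as_str)]
print(type_names)

# The int and the float compare equal as numbers; the string does not.
int_equals_float = as_int == as_float
int_equals_str = as_int == as_str
print(int_equals_float, int_equals_str)
```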

Floats

Python makes some decisions about types for us. What type of value is produced by multiplying a float by an int?

answer = 0.75 * 2
answer
type(answer)
float
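
The same question can be asked of the other arithmetic operators. A short sketch in standard Python 3 that records the result type of a few expressions:

```python
# Each entry pairs an expression (as text) with its computed value.
results = {
    "0.75 * 2": 0.75 * 2,  # float times int gives a float
    "3 + 4": 3 + 4,        # int plus int stays an int
    "3 + 4.0": 3 + 4.0,    # mixing in one float gives a float
    "6 / 2": 6 / 2,        # division with / always gives a float
    "7 // 2": 7 // 2,      # floor division of two ints gives an int
}

for expression, value in results.items():
    print(expression, "=", value, "has type", type(value).__name__)
```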

A computer cannot represent every real number exactly. That would require infinite memory because some numbers have an infinite number of digits.

1 / 3
0.3333333333333333
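
To see that the stored value is only close to one third, the standard-library fractions and decimal modules can reveal exactly what is held in memory. This is a sketch for the curious and is not needed for the rest of the lecture:

```python
from decimal import Decimal
from fractions import Fraction

stored = 1 / 3                 # the float Python actually keeps
exact_third = Fraction(1, 3)   # the true value, as an exact fraction

# Fraction(stored) recovers the exact value of the float, with no rounding.
error = exact_third - Fraction(stored)

print(Decimal(stored))  # every digit of the stored value
print(float(error))     # how far the stored value is from 1/3
```

The error is tiny but not zero, which is what "cannot represent every real number exactly" means in practice.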

What happens when we run the next cell?

2 / 0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_40240/2685369145.py in <module>
----> 1 2 / 0

ZeroDivisionError: division by zero
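
Dividing by zero raises a ZeroDivisionError and stops the cell. One possible way to guard against it is sketched below; safe_divide is a hypothetical helper written for this example, not something provided by Python or the datascience library:

```python
def safe_divide(numerator, denominator):
    """Hypothetical helper: return None instead of raising on division by zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return None

print(safe_divide(2, 4))
print(safe_divide(2, 0))
```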

Scientific Notation

Represent some numbers as \(b \times 10^e\).

Examples:

  • 1.23e5 is \(1.23 \times 10^5\).

  • 6.667e-07 is \(6.667 \times 10^{-7}\).

2 / 3000
0.0006666666666666666
2 / 3000000
6.666666666666667e-07
0.000000000000000123456789
1.23456789e-16
0.000000000000000000000000000000000000000000000000000000000000000000000123456789
1.23456789e-70
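
Scientific notation works in both directions: we can type it as a literal, and we can ask Python to display a float that way. A brief sketch using standard format specifiers:

```python
big = 1.23e5                # typed in scientific notation
small = float("6.667e-07")  # the same notation read from a string

# The 'e' format code forces scientific notation; .2 and .3 set the digits
# shown after the decimal point.
big_text = f"{big:.2e}"
quotient_text = f"{2 / 3000000:.3e}"

print(big)
print(big_text)
print(quotient_text)
```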

Rounding Errors

Since numbers aren’t always represented exactly, small errors may creep in when we operate on floats. They are too small for us to worry about in this class.

0.6666666666666666 - 0.6666666666666666123456789 # a little less than 0 
0.0
2 ** 0.5
1.4142135623730951
2 ** 0.5 * 2 ** 0.5 # should be 2.0 
2.0000000000000004
2 ** 0.5 * 2 ** 0.5 - 2 # should be 0 
4.440892098500626e-16
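
One practical consequence is that testing floats for exact equality can give surprising answers. A minimal sketch using math.isclose from the standard library, which compares within a small tolerance instead:

```python
import math

product = 2 ** 0.5 * 2 ** 0.5   # mathematically 2, but slightly off

exactly_equal = product == 2                    # exact comparison
approximately_equal = math.isclose(product, 2)  # comparison with a tolerance

print(exactly_equal)
print(approximately_equal)
```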

Strings

String values capture text data (sequences of characters). Use single quotes or double quotes around strings.

'Painting'
'Painting'
"Painting"
'Painting'

Variables vs Strings

print("painting") # String value

painting = 4      # variable named painting
print(painting)
painting
4

Why both single and double quotes?

'Don't always use single quotes'
  File "/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_40240/2648065634.py", line 1
    'Don't always use single quotes'
         ^
SyntaxError: invalid syntax
"Don't always use single quotes"
"Don't always use single quotes"
'cs' + '104' # concatenation
'cs104'
'cs' + ' ' +  '104' # spaces aren't added for you
'cs 104'
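
Switching to double quotes is the simplest fix for a string containing an apostrophe. As a sketch of an alternative, a backslash can escape the quote character so that it is treated as text rather than as the end of the string:

```python
# The backslash tells Python that this apostrophe is part of the text.
escaped = 'Don\'t always use single quotes'
double_quoted = "Don't always use single quotes"

# Both spellings produce the same string value.
same_text = escaped == double_quoted
print(same_text)

# Single quotes are handy when the text itself contains double quotes.
print('The lot was described as "capital".')
```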

Conversions

We can only concatenate strings with other strings.

number = 104
'cs' + number
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_40240/1095410427.py in <module>
      1 number = 104
----> 2 'cs' + number

TypeError: can only concatenate str (not "int") to str

Convert numbers to strings when you want to use them to build larger strings.

'cs' + str(number)
'cs104'

We can convert from strings back to numbers as well.

int('3')
3
float('3.0')
3.0
int(str(number))
104
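
Two further details about conversions, sketched below with standard Python only. First, an f-string is a common alternative to str() plus concatenation. Second, int() rejects text that contains a decimal point, so such text has to pass through float() first:

```python
number = 104

with_str = 'cs' + str(number)  # explicit conversion, then concatenation
with_fstring = f"cs{number}"   # an f-string converts the number for us

# int('3.0') raises a ValueError, so convert to a float first.
whole = int(float('3.0'))

try:
    int('3.0')
    conversion_failed = False
except ValueError:
    conversion_failed = True

print(with_str, with_fstring, whole, conversion_failed)
```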

3. Arrays

Array: sequence of values, all the same type, “boxed up”

Table operation: column

not_a_painting.column('pounds')
array([16, 22, 51,  0,  0, 56])

Arithmetic operations are broadcast

# Suppose the art auction house doubles the price for non-paintings 
price_in_pounds = not_a_painting.column('pounds')
price_in_pounds * 2
array([ 32,  44, 102,   0,   0, 112])
price_in_pounds + 5
array([21, 27, 56,  5,  5, 61])
fives = make_array(5,10,15,20,25,30)
fives
array([ 5, 10, 15, 20, 25, 30])
price_in_pounds + fives
array([21, 32, 66, 20, 25, 86])
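
To make "broadcast" concrete, the sketch below compares array arithmetic with the equivalent element-by-element loop, and with what a plain Python list does instead. It assumes NumPy is installed and uses np.array in place of make_array so that it needs only NumPy; the prices are copied from the output above:

```python
import numpy as np

price_in_pounds = np.array([16, 22, 51, 0, 0, 56])

# Array arithmetic applies the operation to every element at once.
doubled = price_in_pounds * 2

# The same computation written out one element at a time.
doubled_by_hand = [price * 2 for price in price_in_pounds.tolist()]

# Contrast: multiplying a plain list repeats it instead of doubling values.
repeated = [16, 22] * 2

print(doubled)
print(doubled_by_hand)
print(repeated)
```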

We can call other built-in Python functions on these arrays as well.

len(price_in_pounds)
6
max(price_in_pounds)
56
min(price_in_pounds)
0
sum(price_in_pounds)
145
np.mean(price_in_pounds)
24.166666666666668
price_in_pounds + make_array(1,2) #Error because not the same shapes 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_40240/505079200.py in <module>
----> 1 price_in_pounds + make_array(1,2) #Error because not the same shapes

ValueError: operands could not be broadcast together with shapes (6,) (2,) 
price_in_pounds + make_array(1,2, 3, 4, 5, 6)
array([17, 24, 54,  4,  5, 62])
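
The shapes named in the error message can be inspected before doing the arithmetic. A short sketch, again assuming NumPy and using np.array in place of make_array:

```python
import numpy as np

prices = np.array([16, 22, 51, 0, 0, 56])
too_short = np.array([1, 2])

# .shape reports the length along each dimension: (6,) versus (2,).
shapes_match = prices.shape == too_short.shape

try:
    prices + too_short
    error_message = None
except ValueError as err:
    error_message = str(err)

# A length-1 array is the exception: it is stretched to fit, like a number.
stretched = prices + np.array([5])

print(shapes_match)
print(error_message)
print(stretched)
```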

Index into array to retrieve items. Indices start at 0.

price_in_pounds.item(0)
16
price_in_pounds.item(1)
22
price_in_pounds.item(3)
0

Think of item(n) as asking for the item that has n items before it.
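
Because counting starts at 0, the last valid index is one less than the length of the array. A minimal sketch, assuming NumPy and using np.array in place of make_array:

```python
import numpy as np

prices = np.array([16, 22, 51, 0, 0, 56])

first = prices.item(0)               # no items come before it
last = prices.item(len(prices) - 1)  # five items come before it

# Asking for item(len(prices)) goes one past the end and raises an error.
try:
    prices.item(len(prices))
    past_the_end_failed = False
except IndexError:
    past_the_end_failed = True

print(first, last, past_the_end_failed)
```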

Table operation: take

pricey_art = art.sort('pounds', descending=True)
highest_price = pricey_art.take(0)
highest_price
lot_sale_year auction_house title artist_name nationality object_type pounds
1804 Coxe (Peter) King Charles I. his Queen and Family, from the Orleans' ... Vandyck Flemish Painting 1575
second_highest_price = pricey_art.take(1)
second_highest_price
lot_sale_year auction_house title artist_name nationality object_type pounds
1859 Phillips (Harry) A Landscape, with full length Portraits of Pierre Both, ... Albert Cuyp Dutch Painting 966
top_five = pricey_art.take(make_array(0, 1, 2, 3, 4))
top_five
lot_sale_year auction_house title artist_name nationality object_type pounds
1804 Coxe (Peter) King Charles I. his Queen and Family, from the Orleans' ... Vandyck Flemish Painting 1575
1859 Phillips (Harry) A Landscape, with full length Portraits of Pierre Both, ... Albert Cuyp Dutch Painting 966
1840 Christie's The Holy Family; a composition of four figures, as large ... Rubens Flemish Painting 945
1837 Christie's The Virgin, covering the sleeping Infant Jesus with a ve ... Sebastian Del Piombo Italian Painting 850
1859 Phillips (Harry) The Disgrace of Clarendon. E.M. Ward, R.A. British Painting 845
top_five.barh("artist_name", "pounds")
[Figure: horizontal bar chart of pounds for each artist_name in top_five]

Creating ranges

What if I wanted the top 50? make_array(0,1,2,...,49)? Ugh. We can make an array for a range of numbers with np.arange(low,high), which gives us the integers in the range [low,high).

np.arange(0, 10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
top_ten = pricey_art.take(np.arange(0, 10))
top_ten
lot_sale_year auction_house title artist_name nationality object_type pounds
1804 Coxe (Peter) King Charles I. his Queen and Family, from the Orleans' ... Vandyck Flemish Painting 1575
1859 Phillips (Harry) A Landscape, with full length Portraits of Pierre Both, ... Albert Cuyp Dutch Painting 966
1840 Christie's The Holy Family; a composition of four figures, as large ... Rubens Flemish Painting 945
1837 Christie's The Virgin, covering the sleeping Infant Jesus with a ve ... Sebastian Del Piombo Italian Painting 850
1859 Phillips (Harry) The Disgrace of Clarendon. E.M. Ward, R.A. British Painting 845
1804 Christie's The Assumption of the Virgin. A Female (probably a Port ... Palma il Gioven Italian Painting 829
1840 Christie's Under the shade of some noble trees peasants are passing ... Adrian Van De Velde Dutch Painting 798
1845 Christie's St. John in the Desert. Murillo Spanish Painting 760
1840 Christie's The Magdalen in Contemplation; she is clad in a red, yel ... Domenichino Italian Painting 698
1838 Christie's The Assumption of the Virgin, the beautiful figure of th ... Murillo Spanish Painting 693
top_ten = pricey_art.take(np.arange(10))
top_ten
lot_sale_year auction_house title artist_name nationality object_type pounds
1804 Coxe (Peter) King Charles I. his Queen and Family, from the Orleans' ... Vandyck Flemish Painting 1575
1859 Phillips (Harry) A Landscape, with full length Portraits of Pierre Both, ... Albert Cuyp Dutch Painting 966
1840 Christie's The Holy Family; a composition of four figures, as large ... Rubens Flemish Painting 945
1837 Christie's The Virgin, covering the sleeping Infant Jesus with a ve ... Sebastian Del Piombo Italian Painting 850
1859 Phillips (Harry) The Disgrace of Clarendon. E.M. Ward, R.A. British Painting 845
1804 Christie's The Assumption of the Virgin. A Female (probably a Port ... Palma il Gioven Italian Painting 829
1840 Christie's Under the shade of some noble trees peasants are passing ... Adrian Van De Velde Dutch Painting 798
1845 Christie's St. John in the Desert. Murillo Spanish Painting 760
1840 Christie's The Magdalen in Contemplation; she is clad in a red, yel ... Domenichino Italian Painting 698
1838 Christie's The Assumption of the Virgin, the beautiful figure of th ... Murillo Spanish Painting 693

See the book for other forms of ranges.
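
Returning to the top-50 question, here is a brief sketch of the index array it needs, plus one other form of range: np.arange also accepts an optional third argument giving the step between values. It assumes only that NumPy is installed:

```python
import numpy as np

# Indices 0 through 49: exactly what take() needs for the top 50 rows.
top_fifty_indices = np.arange(50)

# With a step of 10, the range still stops before the upper bound.
every_tenth = np.arange(0, 50, 10)

print(len(top_fifty_indices))
print(top_fifty_indices.item(0), top_fifty_indices.item(49))
print(every_tenth)
```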