Dictionaries & Sets
Today, we will:
Discuss dictionaries in more detail
Learn about sets (another unordered collection)
Dictionary Comprehensions
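Dictionary comprehensions are similar to list comprehensions! Below, we build a new dictionary containing only the months whose names start with 'J'.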
calendar = {'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 30,
'May': 31, 'Jun': 30, 'Jul': 31, 'Aug': 31,
'Sep': 30, 'Oct': 31, 'Nov': 30, 'Dec': 31}
jMonths = {month: calendar[month] for month in calendar if month[0] == 'J'}
jMonths
{'Jan': 31, 'Jun': 30, 'Jul': 31}
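The comprehension above filters on the keys (month names). To filter on the values instead, for example to keep only the months that have 30 days, one option is to loop over calendar.items() so that each iteration receives both the key and the value. This is a small sketch; the name thirtyDayMonths is our own.

```python
calendar = {'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 30,
            'May': 31, 'Jun': 30, 'Jul': 31, 'Aug': 31,
            'Sep': 30, 'Oct': 31, 'Nov': 30, 'Dec': 31}

# Unpack each (key, value) pair and keep only entries whose value is 30
thirtyDayMonths = {month: days for month, days in calendar.items() if days == 30}
print(thirtyDayMonths)  # {'Apr': 30, 'Jun': 30, 'Sep': 30, 'Nov': 30}
```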
Advantages of Storing Unordered Data as a Dictionary
So what’s the big deal about dictionaries? Let’s examine the benefit of using a dictionary to store Scrabble scores, as opposed to using a list of tuples or two separate lists. Suppose we have a dictionary of key-value pairs mapping each letter to its score in the board game Scrabble.
scrabbleScore = {'a':1, 'b':3, 'c':3, 'd':2, 'e':1,
'f':4, 'g':2, 'h':4, 'i':1, 'j':8,
'k':5, 'l':1, 'm':3, 'n':1, 'o':1,
'p':3, 'q':10, 'r':1, 's':1, 't':1,
'u':1, 'v':8, 'w':4, 'x':8, 'y':4, 'z':10}
# a fixed list of six letters, repeated to make many queries
from time import time
randomLetters = ['a', 'l', 'q', 's', 'y', 'z']*1000000
print("Number of queries", len(randomLetters))
Number of queries 6000000
# generate list of letters and scores
letters = list(scrabbleScore.keys())
scores = list(scrabbleScore.values())
print(scores)
[1, 3, 3, 2, 1, 4, 2, 4, 1, 8, 5, 1, 3, 1, 1, 3, 10, 1, 1, 1, 1, 8, 4, 8, 4, 10]
# time using list operations to compute total score
startTime = time()
totalScore = 0
for query in randomLetters:
    # letters.index scans the list from the front until it finds query
    index = letters.index(query)
    totalScore += scores[index]
endTime = time()
timeList = endTime - startTime
print("Time taken using a list", round(timeList, 3), "seconds")
Time taken using a list 1.652 seconds
# time using dictionaries to compute total score
startTime = time()
totalScore = 0
for query in randomLetters:
    # a dictionary lookup uses the key's hash to find the entry
    totalScore += scrabbleScore[query]
endTime = time()
timeDict = endTime - startTime
print("Time taken using a dictionary", round(timeDict, 3), "seconds")
Time taken using a dictionary 0.519 seconds
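Even in this simple example, the dictionary is roughly 3x faster (0.519 seconds versus 1.652 seconds on this run; exact timings vary from machine to machine). The difference comes from how each lookup works: letters.index(query) checks the list elements one by one from the front, so its cost grows with the length of the list, while scrabbleScore[query] uses the hash of query to go (on average) directly to the matching entry. With more keys than the 26 letters here, the gap would grow.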
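The text above mentions three ways to store the scores: a dictionary, two parallel lists, and a list of tuples. Below is a minimal sketch of a lookup for each representation; the helper function names are hypothetical. All three give the same answer, but only the dictionary avoids scanning a sequence from the front.

```python
import string

# Same scores as the scrabbleScore dictionary above, paired with 'a'..'z'
scrabbleScore = dict(zip(string.ascii_lowercase,
                         [1, 3, 3, 2, 1, 4, 2, 4, 1, 8, 5, 1, 3,
                          1, 1, 3, 10, 1, 1, 1, 1, 8, 4, 8, 4, 10]))
letters = list(scrabbleScore.keys())
scores = list(scrabbleScore.values())
letterScorePairs = list(scrabbleScore.items())

def scoreFromDict(letter):
    # Hash-based lookup: average cost does not grow with the number of keys
    return scrabbleScore[letter]

def scoreFromLists(letter):
    # list.index scans from the front until it finds the letter
    return scores[letters.index(letter)]

def scoreFromPairs(letter):
    # Manual linear scan over (letter, score) tuples
    for candidate, score in letterScorePairs:
        if candidate == letter:
            return score
    raise KeyError(letter)

word = "quizzical"
totals = [sum(lookup(ch) for ch in word)
          for lookup in (scoreFromDict, scoreFromLists, scoreFromPairs)]
print(totals)  # [38, 38, 38]
```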
Sorting Operations with Dictionaries
Let’s return to our dictionary of Scrabble letter scores:
scrabbleScore = {'a':1, 'b':3, 'c':3, 'd':2, 'e':1,
'f':4, 'g':2, 'h':4, 'i':1, 'j':8,
'k':5, 'l':1, 'm':3, 'n':1, 'o':1,
'p':3, 'q':10, 'r':1, 's':1, 't':1,
'u':1, 'v':8, 'w':4, 'x':8, 'y':4, 'z':10}
By default, calling the sorted function on a dictionary returns a sorted list of its keys. Passing reverse=True sorts them in descending order.
print(sorted(scrabbleScore, reverse=True))
['z', 'y', 'x', 'w', 'v', 'u', 't', 's', 'r', 'q', 'p', 'o', 'n', 'm', 'l', 'k', 'j', 'i', 'h', 'g', 'f', 'e', 'd', 'c', 'b', 'a']
But the above sorting behavior isn’t super interesting in this Scrabble example. Maybe we’d like to get an ordering based on the values (scores) of the letters instead. We can use ideas we’ve learned regarding key functions and tuples to help us!
def getScrabbleScore(letterScoreTuple):
    """
    Takes a tuple corresponding to (letter, score) and returns the score
    """
    return letterScoreTuple[1]
# first use the items method to get a list of (key, value) tuples
# and then sort using a key function
scrabbleItems = list(scrabbleScore.items())
print(scrabbleItems)
[('a', 1), ('b', 3), ('c', 3), ('d', 2), ('e', 1), ('f', 4), ('g', 2), ('h', 4), ('i', 1), ('j', 8), ('k', 5), ('l', 1), ('m', 3), ('n', 1), ('o', 1), ('p', 3), ('q', 10), ('r', 1), ('s', 1), ('t', 1), ('u', 1), ('v', 8), ('w', 4), ('x', 8), ('y', 4), ('z', 10)]
sortedScrabbleItems = sorted(scrabbleItems, key=getScrabbleScore, reverse=True)
print(sortedScrabbleItems)
#print(sortedScrabbleItems[0:3], '...', sortedScrabbleItems[-3:])
[('q', 10), ('z', 10), ('j', 8), ('v', 8), ('x', 8), ('k', 5), ('f', 4), ('h', 4), ('w', 4), ('y', 4), ('b', 3), ('c', 3), ('m', 3), ('p', 3), ('d', 2), ('g', 2), ('a', 1), ('e', 1), ('i', 1), ('l', 1), ('n', 1), ('o', 1), ('r', 1), ('s', 1), ('t', 1), ('u', 1)]
We can further use a list comprehension to just get the letters from these tuples. Exercise: What would that look like?
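Besides sorting the list of (letter, score) tuples, one alternative (not from the original notes) is to sort the dictionary's keys directly, passing the dictionary's own get method as the key function so that each letter is compared by its score. A tuple-valued key function can also break ties explicitly: in the second half of the sketch, letters are ordered by descending score and then alphabetically. The name scoreThenLetter is hypothetical.

```python
import string

scrabbleScore = dict(zip(string.ascii_lowercase,
                         [1, 3, 3, 2, 1, 4, 2, 4, 1, 8, 5, 1, 3,
                          1, 1, 3, 10, 1, 1, 1, 1, 8, 4, 8, 4, 10]))

# scrabbleScore.get(letter) returns the letter's score, so keys sort by value
lettersByScore = sorted(scrabbleScore, key=scrabbleScore.get, reverse=True)
print(lettersByScore[:5])  # ['q', 'z', 'j', 'v', 'x']

def scoreThenLetter(item):
    # Negating the score puts high scores first; the letter breaks ties a to z
    letter, score = item
    return (-score, letter)

ranked = sorted(scrabbleScore.items(), key=scoreThenLetter)
print(ranked[:6])  # [('q', 10), ('z', 10), ('j', 8), ('v', 8), ('x', 8), ('k', 5)]
```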
New Mutable Collection: Sets
In Python, a set is a mutable, unordered collection of unique, immutable objects.
Syntax. Nonempty sets can be written directly as comma-separated elements delimited by curly braces. The empty set is written set() rather than {}, because {} means an empty dictionary in Python.
nums = {42, 17, 8, 57, 23}
flowers = {'tulips', 'daffodils', 'asters', 'daisies'}
peanuts = {('Charlie', 'Brown'), ('Lucy', 'Van Pelt'), ('Peppermint', 'Patty')}
emptySet = set() # empty set
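A quick check of the {} versus set() distinction described above. The set() constructor also accepts any iterable, such as a string or a list; vowels is just an illustrative name.

```python
emptyDict = {}
emptySet = set()
print(type(emptyDict))  # <class 'dict'>
print(type(emptySet))   # <class 'set'>

# set() builds a set from any iterable; here, the characters of a string
vowels = set("aeiou")
print(len(vowels))  # 5
```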
Removing duplicates. Like the keys of a dictionary, the elements of a set must be unique, which is why converting a sequence to a set is a handy way to remove duplicates.
firstChoice = ['a', 'b', 'a', 'a', 'b', 'c']
uniques = set(firstChoice)
list(uniques)
['b', 'c', 'a']
list(set("aabrakadabra"))
['r', 'a', 'k', 'd', 'b']
Question. What is a potential downside of this approach, compared to the uniques() and candidates() helper functions we used in Labs 3 and 4?
We lose the original order of the elements.
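If the original order matters, one option is dict.fromkeys: since Python 3.7, dictionaries remember insertion order and collapse duplicate keys, so the keys come back in order of first appearance. An equivalent loop that uses a set only to remember what has already been seen is shown too; uniquesInOrder is a hypothetical name, not the lab's uniques() helper.

```python
firstChoice = ['a', 'b', 'a', 'a', 'b', 'c']

# Dictionary keys are unique and keep insertion order
print(list(dict.fromkeys(firstChoice)))  # ['a', 'b', 'c']

def uniquesInOrder(seq):
    """Return the distinct items of seq in order of first appearance."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:  # fast membership test on a set
            seen.add(item)
            result.append(item)
    return result

print(uniquesInOrder("aabrakadabra"))  # ['a', 'b', 'r', 'k', 'd']
```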
Checking membership. We can use the in operator to test membership in sets, similar to lists, dictionaries, and tuples.
nums = {42, 17, 8, 57, 23}
flowers = {'tulips', 'daffodils', 'asters', 'daisies'}
16 in nums
False
'asters' in flowers
True
len(flowers)
4
# sets are iterable
for f in flowers:
    print(f, end=" ")
asters daffodils daisies tulips
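Membership tests on a set take constant time on average, while in on a list checks the elements one at a time. Here is a rough timing sketch using the standard timeit module; the size and repeat count are arbitrary choices and the exact numbers depend on the machine, but finding the last element forces the list check to scan everything while the set check does not.

```python
import timeit

n = 100_000
numsList = list(range(n))
numsSet = set(numsList)
target = n - 1  # the last element: the worst case for a list scan

# Each call runs the membership test 200 times and returns total seconds
listTime = timeit.timeit(lambda: target in numsList, number=200)
setTime = timeit.timeit(lambda: target in numsSet, number=200)
print(f"list: {listTime:.4f} s   set: {setTime:.6f} s")
```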
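Note. When a set is the last expression in a Jupyter cell, Jupyter displays it in sorted order, but sets do not inherently have any order. print() and for loops use the set's internal order, which is unpredictable and can change from one Python session to the next.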
print(flowers)
{'asters', 'daffodils', 'daisies', 'tulips'}
len(nums)
5
type(flowers)
set
Sets are mutable. We can use the .add() and .remove() set methods to add and remove values from sets.
# add items
flowers.add('carnations')
flowers
{'asters', 'carnations', 'daffodils', 'daisies', 'tulips'}
flowers.remove('tulips')
flowers
{'asters', 'carnations', 'daffodils', 'daisies'}
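The .remove() method raises a KeyError if the item is not in the set. The related .discard() method removes an item if it is present and silently does nothing otherwise, which is handy when you are not sure the item is there.

```python
flowers = {'asters', 'carnations', 'daffodils', 'daisies'}

flowers.discard('tulips')  # 'tulips' is already gone: no error, no change
print(len(flowers))  # 4

try:
    flowers.remove('tulips')  # remove() requires the item to be present
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'tulips'
```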
Sets are unordered. Because sets are unordered, we cannot index into them, and they do not support concatenation with +.
# will this work?
flowers[1]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [27], in <cell line: 2>()
1 # will this work?
----> 2 flowers[1]
TypeError: 'set' object is not subscriptable
# will this work?
flowers + {'lilies'}
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [28], in <cell line: 2>()
1 # will this work?
----> 2 flowers + {'lilies'}
TypeError: unsupported operand type(s) for +: 'set' and 'set'
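Although + does not work on sets, Python provides operators for combining them: | (union), & (intersection), and - (difference). Each returns a new set and leaves the originals unchanged. The name springFlowers is illustrative.

```python
flowers = {'asters', 'carnations', 'daffodils', 'daisies'}
springFlowers = {'daffodils', 'tulips', 'lilies'}

print(flowers | {'lilies'})     # union: everything in either set
print(flowers & springFlowers)  # intersection: only 'daffodils' is in both
print(flowers - springFlowers)  # difference: in flowers but not springFlowers
```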
Immutable objects. The items of a set must be immutable. Since lists, dictionaries, and sets are themselves mutable, they cannot be items of a set.
{[3, 2], [1, 5, 4]}
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [29], in <cell line: 1>()
----> 1 {[3, 2], [1, 5, 4]}
TypeError: unhashable type: 'list'
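Immutable containers are fine as set items: tuples of immutable values work (as in the peanuts set above), and Python's built-in frozenset is an immutable version of a set that can itself go inside a set or serve as a dictionary key.

```python
pairs = {(3, 2), (1, 5, 4)}  # tuples are immutable, so this works
print((3, 2) in pairs)  # True

# frozenset is an immutable set, so a set of frozensets is allowed
groups = {frozenset({2, 3}), frozenset({3, 4})}
print(frozenset({3, 2}) in groups)  # True: element order does not matter
```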
Dictionary of Dictionaries
Sometimes we need more complex data structures, such as a dictionary of dictionaries. Below, each character's unix ID maps to a dictionary of that character's information.
Peanuts Roster
# each line of peanuts.csv: unix,name,role,age,icecream
peanutsDict = {}
with open("peanuts.csv") as roster:
    for character in roster:
        # split one comma-separated line into its five fields
        unix, name, role, age, icecream = character.strip().split(',')
        peanutsDict[unix] = {"name": name, "role": role,
                             "age": age, "icecream": icecream}
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Input In [30], in <cell line: 3>()
1 # fname, lname, Yr Status,Username,Anonymous ID,Lecture
2 peanutsDict = {}
----> 3 with open("peanuts.csv") as roster:
4 for character in roster:
5 unix, name, role, age, icecream = character.strip().split(',')
FileNotFoundError: [Errno 2] No such file or directory: 'peanuts.csv'
peanutsDict
{}
peanutsDict["ss32"]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Input In [32], in <cell line: 1>()
----> 1 peanutsDict["ss32"]
KeyError: 'ss32'
sorted(peanutsDict)[:10]
[]
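The cells above need peanuts.csv on disk. Here is a self-contained sketch of the same dictionary-of-dictionaries idea that reads from an in-memory string instead, using the standard csv and io modules. The two rows of data and the IDs cb01 and lv02 are made up for illustration; they follow the unix,name,role,age,icecream layout that the loop above unpacks. Unlike the original, this sketch converts age to an int, and it uses .get() so that a missing ID returns None rather than raising KeyError.

```python
import csv
import io

# Hypothetical sample data in the layout unix,name,role,age,icecream
sampleCsv = """cb01,Charlie Brown,pitcher,8,vanilla
lv02,Lucy Van Pelt,psychiatrist,8,chocolate
"""

def loadRoster(lines):
    """Map each unix ID to a dictionary of that character's fields."""
    roster = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        unix, name, role, age, icecream = row
        roster[unix] = {"name": name, "role": role,
                        "age": int(age), "icecream": icecream}
    return roster

peanutsDict = loadRoster(io.StringIO(sampleCsv))
print(peanutsDict["lv02"]["role"])  # psychiatrist
print(peanutsDict.get("ss32"))      # None: missing key, no KeyError
print(sorted(peanutsDict))          # ['cb01', 'lv02']
```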
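An Aside: Dictionaries, Sets, and Hashes
Python finds the keys of a dictionary and the items of a set using their hash values, which are generated by the built-in hash() function. Dictionaries are implemented as hash tables, which is why the equivalent structure is often called a hashtable (or hash map) in other programming languages. Among the built-in types, only immutable objects, such as numbers, strings, and tuples containing only such values, have hash values; mutable objects like lists, dictionaries, and sets are unhashable. You can use hash() to explore these values. String hashes are randomized each time Python starts, so any value involving a string will differ from the ones shown below.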
myDict = {[3, 4]: "abcd"}
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [34], in <cell line: 1>()
----> 1 myDict = {[3, 4]: "abcd"}
TypeError: unhashable type: 'list'
mySet = {{2, 3}, {3, 4}}
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [35], in <cell line: 1>()
----> 1 mySet = {{2, 3}, {3, 4}}
TypeError: unhashable type: 'set'
hash("Williams")
1316459169320445476
hash(("Ephelia", 25))
7027448810232994889
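A few properties of hash() can be confirmed with a short experiment: equal hashable objects always have equal hashes (which is what lets a dictionary find a key again), and a tuple is hashable only if everything inside it is. The small-integer case shown is a CPython implementation detail.

```python
# Equal objects must produce equal hashes
print(hash(("Ephelia", 25)) == hash(("Ephelia", 25)))  # True

# In CPython, small integers hash to themselves
print(hash(42))  # 42

# A tuple containing a list cannot be hashed, so it cannot be a key
try:
    hash(("Ephelia", [25]))
except TypeError as err:
    print("TypeError:", err)
```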