Dictionaries & Sets

Today, we will:

  • Discuss dictionaries in more detail

  • Learn about sets (another unordered collection)

Dictionary Comprehensions

Similar to list comprehensions!

calendar = {'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 30,
            'May': 31, 'Jun': 30, 'Jul': 31, 'Aug': 31,
            'Sep': 30, 'Oct': 31, 'Nov': 30, 'Dec': 31} 

# keep only the months whose name starts with 'J'
jMonths = {month: calendar[month] for month in calendar if month[0] == 'J'}
jMonths
{'Jan': 31, 'Jun': 30, 'Jul': 31}
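
A comprehension can also filter on the value instead of the key. The sketch below iterates over calendar.items() so that each month and its day count are unpacked together; the name thirtyDayMonths is introduced here only for illustration.

```python
calendar = {'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 30,
            'May': 31, 'Jun': 30, 'Jul': 31, 'Aug': 31,
            'Sep': 30, 'Oct': 31, 'Nov': 30, 'Dec': 31}

# unpack each (month, days) pair and keep the months with exactly 30 days
thirtyDayMonths = {month: days for month, days in calendar.items() if days == 30}
print(thirtyDayMonths)  # {'Apr': 30, 'Jun': 30, 'Sep': 30, 'Nov': 30}
```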

Sorting Operations with Dictionaries

Let’s say we have a dictionary that maps each letter to its score in the board game Scrabble.

scrabbleScore = {'a':1 , 'b':3, 'c':3, 'd':2, 'e':1, 
                 'f':4, 'g':2, 'h':4, 'i':1, 'j':8, 
                 'k':5, 'l':1, 'm':3, 'n':1, 'o':1, 
                 'p':3, 'q':10, 'r':1, 's':1, 't':1, 
                 'u':1, 'v':8, 'w':4, 'x':8, 'y':4, 'z': 10} 

By default, calling the sorted function on a dictionary will return a sorted list of keys.

print(sorted(scrabbleScore))
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

But the above sorting behavior isn’t super interesting in this Scrabble example. Maybe we’d like to get an ordering based on the values (scores) of the letters instead. We can use ideas we’ve learned regarding key functions and tuples to help us!

def getScrabbleScore(letterScoreTuple):
    """
    Takes a tuple corresponding to (letter, score) and returns the score
    """
    return letterScoreTuple[1]


# first use the items method to get a view of (key, value) tuples
# and then sort using a key function
scrabbleItems = scrabbleScore.items()
scrabbleItems
dict_items([('a', 1), ('b', 3), ('c', 3), ('d', 2), ('e', 1), ('f', 4), ('g', 2), ('h', 4), ('i', 1), ('j', 8), ('k', 5), ('l', 1), ('m', 3), ('n', 1), ('o', 1), ('p', 3), ('q', 10), ('r', 1), ('s', 1), ('t', 1), ('u', 1), ('v', 8), ('w', 4), ('x', 8), ('y', 4), ('z', 10)])
sortedScrabbleItems = sorted(scrabbleItems, key=getScrabbleScore, reverse=True)
print(sortedScrabbleItems)
#print(sortedScrabbleItems[0:3], '...', sortedScrabbleItems[-3:])
[('q', 10), ('z', 10), ('j', 8), ('v', 8), ('x', 8), ('k', 5), ('f', 4), ('h', 4), ('w', 4), ('y', 4), ('b', 3), ('c', 3), ('m', 3), ('p', 3), ('d', 2), ('g', 2), ('a', 1), ('e', 1), ('i', 1), ('l', 1), ('n', 1), ('o', 1), ('r', 1), ('s', 1), ('t', 1), ('u', 1)]

We can further use a list comprehension to just get the letters from these tuples. Exercise: What would that look like?
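
One possible answer, as a sketch on a small subset of the scores (the subset and the names sampleScores, sortedItems, and sortedLetters are chosen here for illustration). It also shows a shortcut that may be handy: sorting the keys directly with the dictionary's own get method as the key function.

```python
# a small, illustrative subset of the Scrabble scores
sampleScores = {'a': 1, 'q': 10, 'k': 5, 'z': 10, 'd': 2}

def getScrabbleScore(letterScoreTuple):
    """Takes a (letter, score) tuple and returns the score"""
    return letterScoreTuple[1]

sortedItems = sorted(sampleScores.items(), key=getScrabbleScore, reverse=True)

# keep only the letter from each (letter, score) tuple
sortedLetters = [letter for letter, score in sortedItems]
print(sortedLetters)  # ['q', 'z', 'k', 'd', 'a']

# alternative: sort the keys, using each key's value as the sort key
print(sorted(sampleScores, key=sampleScores.get, reverse=True))  # ['q', 'z', 'k', 'd', 'a']
```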

Advantages of Storing Unordered Data as a Dictionary

So what’s the big deal about dictionaries? Let’s examine the benefit of using the dictionary for storing Scrabble scores as opposed to using a list of tuples or two separate lists.

import time

# random letters to query several times
randomLetters = ['a', 'l', 'q', 's', 'y', 'z']*1000000
print("Number of queries", len(randomLetters))
Number of queries 6000000
# generate list of letters and scores
letters = list(scrabbleScore.keys())
scores = list(scrabbleScore.values())
scores
[1, 3, 3, 2, 1, 4, 2, 4, 1, 8, 5, 1, 3, 1, 1, 3, 10, 1, 1, 1, 1, 8, 4, 8, 4, 10]
# time using list operations to compute total score
startTime = time.time()
totalScore = 0

for query in randomLetters:
    index = letters.index(query)
    totalScore += scores[index]

endTime = time.time()
timeList = endTime - startTime
print("Time taken using a list", round(timeList, 3), "seconds")
Time taken using a list 2.305 seconds
# time using dictionaries to compute total score
startTime = time.time()
totalScore = 0

for query in randomLetters: 
    totalScore += scrabbleScore[query]

endTime = time.time()
timeDict = endTime - startTime
print("Time taken using a dictionary", round(timeDict, 3), "seconds")
Time taken using a dictionary 0.599 seconds

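Even in this simple example, the dictionary is roughly 4x faster (2.305 vs. 0.599 seconds). Each call to letters.index(query) scans the list from the front until it finds the letter, while a dictionary lookup goes straight to the key.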
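
A smaller version of this comparison can be sketched with the standard timeit module, which runs each function several times. The letter values here are placeholders (alphabet positions), not Scrabble scores, and the function names are made up for this sketch. Timings vary by machine, so none are shown.

```python
import string
import timeit

# placeholder data: each letter maps to its position in the alphabet
values = {letter: i + 1 for i, letter in enumerate(string.ascii_lowercase)}
letters = list(values.keys())
numbers = list(values.values())
queries = ['a', 'l', 'q', 's', 'y', 'z'] * 1000

def totalWithLists():
    """Looks up each query by scanning the list of letters"""
    total = 0
    for query in queries:
        total += numbers[letters.index(query)]
    return total

def totalWithDict():
    """Looks up each query directly in the dictionary"""
    total = 0
    for query in queries:
        total += values[query]
    return total

# run each version 20 times and report the total elapsed time
listTime = timeit.timeit(totalWithLists, number=20)
dictTime = timeit.timeit(totalWithDict, number=20)
print("Two lists: ", round(listTime, 3), "seconds")
print("Dictionary:", round(dictTime, 3), "seconds")
```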

New Mutable Collection: Sets

In Python, a set is a mutable, unordered collection of unique, immutable objects.

Syntax. Nonempty sets can be written directly as comma-separated elements delimited by curly braces. The empty set is written set() rather than {}, because {} means an empty dictionary in Python.

nums = {42, 17, 8, 57, 23}
flowers = {'tulips', 'daffodils', 'asters', 'daisies'}
potters = {('Ron', 'Weasley'), ('Luna', 'Lovegood'), ('Hermione', 'Granger')}
emptySet = set() # empty set
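
A quick check of the empty-braces rule, using type() to see what each expression creates (emptyDict is a name added here for contrast):

```python
emptyDict = {}      # empty braces create a dictionary
emptySet = set()    # set() creates an empty set

print(type(emptyDict))  # <class 'dict'>
print(type(emptySet))   # <class 'set'>
print(len(emptySet))    # 0
```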

Removing duplicates. Like the keys of a dictionary, the items of a set cannot be duplicated, which makes sets a handy way to remove duplicates from sequences.

firstChoice = ['a', 'b', 'a', 'a', 'b', 'c']
uniques = set(firstChoice)
list(uniques)
['b', 'a', 'c']
list(set("aabrakadabra"))
['k', 'r', 'd', 'a', 'b']

Question. What is a potential downside of this approach compared to the uniques() and candidates() helper functions we used in Labs 3 and 4?

We lose the original ordering of the items.
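
If the order of first appearance matters, one common alternative is dict.fromkeys(), sketched below. It relies on dictionaries keeping insertion order (Python 3.7 and later) and is not the lab helper itself; orderedUniques is a name made up for this example.

```python
firstChoice = ['a', 'b', 'a', 'a', 'b', 'c']

# dictionary keys are unique and remember the order in which they were added
orderedUniques = list(dict.fromkeys(firstChoice))
print(orderedUniques)  # ['a', 'b', 'c']
```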

Checking membership. We can use the in operator to test membership in sets, similar to lists, dictionaries, and tuples.

nums = {42, 17, 8, 57, 23}
flowers = {'tulips', 'daffodils', 'asters', 'daisies'}
16 in nums
False
'asters' in flowers
True
len(flowers)
4
# sets are iterable
for f in flowers:
    print(f, end=" ")
daffodils daisies asters tulips 

Note. Jupyter notebook displays sets in sorted order, but sets do not inherently have any order. Printing or iterating over them gives an order that should not be relied on.

print(flowers)
{'daffodils', 'daisies', 'asters', 'tulips'}
len(nums)
5
type(potters)
set

Sets are mutable.

We can use the .add() and .remove() set methods to add and remove values.

# add items
flowers.add('carnations')
flowers
{'asters', 'carnations', 'daffodils', 'daisies', 'tulips'}
# remove items
flowers.remove('tulips')
flowers
{'asters', 'carnations', 'daffodils', 'daisies'}
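
One detail worth knowing: .remove() raises a KeyError when the item is not in the set, while the related .discard() method silently does nothing. A short sketch:

```python
flowers = {'asters', 'carnations', 'daffodils', 'daisies'}

# remove() complains if the item is missing
try:
    flowers.remove('roses')
except KeyError:
    print("remove() raised a KeyError")

# discard() ignores a missing item
flowers.discard('roses')
print(len(flowers))  # 4
```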

Sets are unordered. Because sets are unordered, we cannot index into them or concatenate them with +.

# will this work?
flowers[1]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_42711/3244342521.py in <module>
      1 # will this work?
----> 2 flowers[1]

TypeError: 'set' object is not subscriptable
# will this work?
flowers + {'lilies'}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_42711/2974189839.py in <module>
      1 # will this work?
----> 2 flowers + {'lilies'}

TypeError: unsupported operand type(s) for +: 'set' and 'set'
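
Although + is not defined for sets, they can still be combined. A minimal sketch using the union operator | and the .update() method (moreFlowers is a name introduced for this example):

```python
flowers = {'asters', 'carnations', 'daffodils', 'daisies'}

# | builds a new set and leaves flowers unchanged
moreFlowers = flowers | {'lilies'}
print(sorted(moreFlowers))
# ['asters', 'carnations', 'daffodils', 'daisies', 'lilies']

# update() adds the items to the existing set in place; duplicates are ignored
flowers.update({'lilies', 'asters'})
print(len(flowers))  # 5
```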

Immutable objects. The items of a set must be immutable (more precisely, hashable). Since lists, dictionaries, and sets themselves are mutable, they cannot be items of a set.

{[3, 2], [1, 5, 4]}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_42711/3548805500.py in <module>
----> 1 {[3, 2], [1, 5, 4]}

TypeError: unhashable type: 'list'
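
When a set needs to hold groups of values, immutable stand-ins work: a tuple in place of a list, or a frozenset in place of a set. A brief sketch (pairs and groups are names made up here):

```python
# tuples are immutable, so they can be items of a set
pairs = {(3, 2), (1, 5, 4)}
print((3, 2) in pairs)  # True

# frozenset is an immutable version of set, so it can be an item too
groups = {frozenset([3, 2]), frozenset([1, 5, 4])}
print(frozenset([2, 3]) in groups)  # True, since order does not matter in a frozenset
```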

Dictionary of Dictionaries

Sometimes we need a more complex data structure, such as a dictionary of dictionaries.

Harry Potter Roster

# each line of the file: unix, name, role, house, patronus
hpDict = {}
with open("harrypotter.csv") as roster:
    for character in roster:
        unix, name, role, house, patronus = character.strip().split(',')
        hpDict[unix] = {"name" : name, "role" : role, 
                        "house" : house, "patronus": patronus}
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_42711/2869288825.py in <module>
      1 # each line of the file: unix, name, role, house, patronus
      2 hpDict = {}
----> 3 with open("harrypotter.csv") as roster:
      4     for character in roster:
      5         unix, name, role, house, patronus = character.strip().split(',')

FileNotFoundError: [Errno 2] No such file or directory: 'harrypotter.csv'
hpDict
{}
hpDict["ss32"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_42711/3099882066.py in <module>
----> 1 hpDict["ss32"]

KeyError: 'ss32'
sorted(hpDict)[:10]
[]

An Aside: Dictionaries, Sets, and Hashes

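Python locates the keys of a dictionary and the items of a set using hash values, generated by a hash function. This is why dictionaries are also known as hash tables, especially in other programming languages. Among the built-in types, only immutable ones (such as numbers, strings, and tuples of immutable items) have hash values; lists, dictionaries, and sets do not. You can use the built-in function hash() to explore these values. String hashes change from one Python session to the next, so your numbers will differ from the ones shown below.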

myDict = {[3, 4]: "abcd"}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_42711/2743469843.py in <module>
----> 1 myDict = {[3, 4]: "abcd"}

TypeError: unhashable type: 'list'
mySet = {{2, 3}, {3, 4}}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_42711/3169937002.py in <module>
----> 1 mySet = {{2, 3}, {3, 4}}

TypeError: unhashable type: 'set'
hash("Williams")
5685612724612356857
hash(("Ephelia", 25))
1219261176729661740
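
Two properties of hash() that matter for dictionaries and sets, sketched below: objects that compare equal have equal hashes within a session, and mutable built-ins such as lists refuse to be hashed at all.

```python
# equal objects always have equal hash values
print(hash((1, 2)) == hash((1, 2)))  # True
print(hash(1) == hash(1.0))          # True, because 1 == 1.0

# lists are mutable, so hash() raises a TypeError
try:
    hash([3, 4])
except TypeError as error:
    print(error)  # unhashable type: 'list'
```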