Sorting & Dictionaries¶
Today, we will discuss the following:
Wrap up our discussion on sorting with optional key function
Discuss a new mutable unordered data structure: a dictionary
Sorting using key function¶
courses = [('CS134', 74, 'Spring'), ('CS136', 60, 'Spring'),
           ('AFR206', 30, 'Spring'), ('ECON233', 30, 'Fall'),
           ('MUS112', 10, 'Fall'), ('STAT200', 50, 'Spring'),
           ('PSYC201', 50, 'Fall'), ('MATH110', 74, 'Spring')]
def capacity(courseTuple):
    '''Takes a sequence and returns item at index 1'''
    return courseTuple[1]
# can tell sorted() to sort by capacity instead of course name
sorted(courses, key=capacity)
[('MUS112', 10, 'Fall'),
('AFR206', 30, 'Spring'),
('ECON233', 30, 'Fall'),
('STAT200', 50, 'Spring'),
('PSYC201', 50, 'Fall'),
('CS136', 60, 'Spring'),
('CS134', 74, 'Spring'),
('MATH110', 74, 'Spring')]
# sort by capacity in reverse order
sorted(courses, key=capacity, reverse=True)
[('CS134', 74, 'Spring'),
('MATH110', 74, 'Spring'),
('CS136', 60, 'Spring'),
('STAT200', 50, 'Spring'),
('PSYC201', 50, 'Fall'),
('AFR206', 30, 'Spring'),
('ECON233', 30, 'Fall'),
('MUS112', 10, 'Fall')]
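One way to picture what key= does: sorted() calls the key function once on each element and orders the elements by the returned values rather than by the tuples themselves. The sketch below makes those intermediate values visible; the names capacities and byCapacity are introduced only for illustration.

```python
courses = [('CS134', 74, 'Spring'), ('CS136', 60, 'Spring'),
           ('AFR206', 30, 'Spring'), ('ECON233', 30, 'Fall'),
           ('MUS112', 10, 'Fall'), ('STAT200', 50, 'Spring'),
           ('PSYC201', 50, 'Fall'), ('MATH110', 74, 'Spring')]

def capacity(courseTuple):
    '''Takes a sequence and returns item at index 1'''
    return courseTuple[1]

# the values that sorted() compares when key=capacity
capacities = []
for course in courses:
    capacities.append(capacity(course))
print(capacities)      # [74, 60, 30, 30, 10, 50, 50, 74]

# sorted() returns a new list; the original list is left unchanged
byCapacity = sorted(courses, key=capacity)
print(byCapacity[0])   # ('MUS112', 10, 'Fall')
print(courses[0])      # ('CS134', 74, 'Spring')
```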
Stable Sorting¶
Python’s sorting functions are stable, which means that items that are equal according to the sorting key have the same relative order as in the original sequence. To see an example, let us sort the course tuples by the term they are offered by defining a new key function.
courses = [('CS134', 74, 'Spring'), ('CS136', 60, 'Spring'),
           ('AFR206', 30, 'Spring'), ('ECON233', 30, 'Fall'),
           ('MUS112', 10, 'Fall'), ('STAT200', 50, 'Spring'),
           ('PSYC201', 50, 'Fall'), ('MATH110', 74, 'Spring')]
def term(courseTuple):
    '''Takes a sequence and returns item at index 2'''
    return courseTuple[2]
# sort courses by term
# notice the impact of stable sorting on ties: courses in the same term keep their original order
sorted(courses, key=term)
[('ECON233', 30, 'Fall'),
('MUS112', 10, 'Fall'),
('PSYC201', 50, 'Fall'),
('CS134', 74, 'Spring'),
('CS136', 60, 'Spring'),
('AFR206', 30, 'Spring'),
('STAT200', 50, 'Spring'),
('MATH110', 74, 'Spring')]
# if you want to handle ties differently, can return a tuple in key function
def termAndCap(courseTuple):
    return courseTuple[2], courseTuple[1]
sorted(courses, key=termAndCap)
[('MUS112', 10, 'Fall'),
('ECON233', 30, 'Fall'),
('PSYC201', 50, 'Fall'),
('AFR206', 30, 'Spring'),
('STAT200', 50, 'Spring'),
('CS136', 60, 'Spring'),
('CS134', 74, 'Spring'),
('MATH110', 74, 'Spring')]
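Stability also means a multi-key sort can be built from successive single-key sorts: sort by the secondary key first, then by the primary key. The sketch below (variable names byCapacity and twoPass are illustrative) checks that two passes give the same result as the tuple-returning key function.

```python
courses = [('CS134', 74, 'Spring'), ('CS136', 60, 'Spring'),
           ('AFR206', 30, 'Spring'), ('ECON233', 30, 'Fall'),
           ('MUS112', 10, 'Fall'), ('STAT200', 50, 'Spring'),
           ('PSYC201', 50, 'Fall'), ('MATH110', 74, 'Spring')]

def capacity(courseTuple):
    return courseTuple[1]

def term(courseTuple):
    return courseTuple[2]

def termAndCap(courseTuple):
    return courseTuple[2], courseTuple[1]

# pass 1: order by the secondary key (capacity)
byCapacity = sorted(courses, key=capacity)
# pass 2: order by the primary key (term); because the sort is stable,
# courses within the same term stay in capacity order
twoPass = sorted(byCapacity, key=term)

print(twoPass == sorted(courses, key=termAndCap))   # True
```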
Sorting sequences based on custom specifications¶
Now suppose we want to sort a list of integers based on their magnitude, i.e., ignoring sign. How can we use the key function to achieve that?
def absoluteValue(num):
    """
    Takes a number and returns its absolute value
    """
    if num < 0:
        return -1*num
    return num
numbers = [-50, 50, -29, 27, 8]
print("Default sorting behavior", sorted(numbers))
print("Sorting on magnitude", sorted(numbers, key=absoluteValue))
Default sorting behavior [-50, -29, 8, 27, 50]
Sorting on magnitude [8, 27, -29, -50, 50]
What if we wanted to sort a list containing a mix of characters and numbers, e.g., [25, 'a', 50, 'b']? By default, the sorted function raises a TypeError when given such a list, because it finds str and int values to be incomparable with each other. But we can make them comparable by using the ord() function, so that the sorting uses the ASCII value (more generally, the Unicode code point) of each character.
In other words, the sorting behaviors we can describe for sequences (mixed or not) are really only limited by our imagination of what makes for a sensible comparison between elements!
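To see the default failure concretely, here is a minimal sketch that catches the error instead of letting it stop the program. The flag name failed is illustrative, and the exact wording of the error message may vary between Python versions.

```python
mixedList = [25, 'a', 50, 'b']

try:
    sorted(mixedList)       # no key function: compares int with str directly
    failed = False
except TypeError as error:
    failed = True
    print("sorted() could not compare the elements:", error)

print(failed)               # True
```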
ord('a')
97
chr(97)
'a'
def returnOrdValue(element):
    """
    Returns the ASCII value for an element if it is a character,
    otherwise assumes that the given element is a number and
    returns the number itself.
    """
    if type(element) == str:
        return ord(element)
    return element
mixedList = ['a', 'b', 24, 50, 125]
print("Sorting mixed list", sorted(mixedList, key=returnOrdValue))
Sorting mixed list [24, 50, 'a', 'b', 125]
New Mutable Collection: Dictionaries¶
Dictionaries are unordered collections that map keys to values.
The motivation behind dictionaries is efficient queries: to look for a value associated with a key, we do not need to look through all the keys. We can just access the dictionary using the key as the subscript, and the dictionary returns the corresponding value.
This makes queries a lot more efficient!
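As a rough sketch of the difference, compare looking up a zip code in a list of pairs with looking it up in a dictionary. The helper lookupInList and the list zipPairs are hypothetical names used only for this illustration. Python dictionaries are implemented as hash tables, so a lookup typically takes about the same time no matter how many keys are stored.

```python
zipPairs = [('01267', 'Williamstown'), ('60606', 'Chicago'),
            ('48202', 'Detroit'), ('97210', 'Portland')]

def lookupInList(pairs, key):
    """Scans the pairs one by one until the key is found (illustration only)"""
    for k, value in pairs:
        if k == key:
            return value
    return None

zipCodes = {'01267': 'Williamstown', '60606': 'Chicago',
            '48202': 'Detroit', '97210': 'Portland'}

# the list version checks every earlier pair before reaching '97210'
print(lookupInList(zipPairs, '97210'))   # Portland
# the dictionary version uses the key directly as the subscript
print(zipCodes['97210'])                 # Portland
```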
# sample dictionary
zipCodes = {'01267': 'Williamstown', '60606': 'Chicago',
'48202': 'Detroit', '97210': 'Portland'}
# what US city has this zip code?
zipCodes['60606']
'Chicago'
# what US city has this zip code?
zipCodes['48202']
'Detroit'
# if key does not exist
zipCodes['11777']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_42722/1948765118.py in <module>
1 # if key does not exist
----> 2 zipCodes['11777']
KeyError: '11777'
zipCodes['11777'] = 'Port Jefferson'
zipCodes
{'01267': 'Williamstown',
'60606': 'Chicago',
'48202': 'Detroit',
'97210': 'Portland',
'11777': 'Port Jefferson'}
len(zipCodes)
5
'90210' in zipCodes
False
'01267' in zipCodes
True
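Because keys are unique, assigning to a key that already exists replaces its value rather than adding a second entry. A small sketch of this behavior follows; the replacement value 'Port Jeff' is made up for the example.

```python
zipCodes = {'01267': 'Williamstown', '60606': 'Chicago',
            '48202': 'Detroit', '97210': 'Portland'}

zipCodes['11777'] = 'Port Jefferson'   # new key: an entry is added
print(len(zipCodes))                   # 5

zipCodes['11777'] = 'Port Jeff'        # existing key: the value is overwritten
print(len(zipCodes))                   # 5
print(zipCodes['11777'])               # Port Jeff
```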
Creating Dictionaries¶
Dictionaries can be created in many ways:
Direct assignment
Starting with an empty dictionary and accumulating key-value pairs
Using the dict() function
# direct assignment
scrabbleScore = {'a':1 , 'b':3, 'c':3, 'd':2, 'e':1,
'f':4, 'g':2, 'h':4, 'i':1, 'j':8,
'k':5, 'l':1, 'm':3, 'n':1, 'o':1,
'p':3, 'q':10, 'r':1, 's':1, 't':1,
'u':1, 'v':8, 'w':4, 'x':8, 'y':4, 'z': 10}
t = [["1", "2"], ["3","4"]]
dict(t)
{'1': '2', '3': '4'}
# accumulate in a dictionary
verse = "let it be,let it be,let it be,let it be,there will be an answer,let it be"
counts = {} # empty dictionary
for line in verse.split(','):
    if line not in counts:
        counts[line] = 1 # initialize count
    else:
        counts[line] += 1 # update count
counts
{'let it be': 5, 'there will be an answer': 1}
# use dict() function
dict([('a', 5), ('b', 7), ('c', 10)])
{'a': 5, 'b': 7, 'c': 10}
Important Note: Dictionaries are not sequences. Since Python 3.7 they remember the order in which keys were inserted (which is why Python displays them in that order), but elements have no position, e.g., we cannot access the element at a certain index.
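A short sketch of what "no access by index" means in practice: a subscript such as scores[0] is treated as a lookup for the key 0, not as a request for the first element. The dictionary scores and the flag raised are illustrative names; the insertion-order behavior shown assumes Python 3.7 or later.

```python
scores = {'b': 3, 'a': 1, 'c': 3}

# iterating over a dictionary yields its keys, in insertion order
print(list(scores))      # ['b', 'a', 'c']

# subscripting with 0 looks for a key equal to 0, which does not exist
try:
    scores[0]
    raised = False
except KeyError:
    raised = True
print(raised)            # True

# to get the keys in a particular order, sort them explicitly
print(sorted(scores))    # ['a', 'b', 'c']
```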
Example: frequency¶
Let's write a function frequency that takes as input a list of words wordList and returns a dictionary freqDict with the unique words in wordList as keys, and their number of occurrences in wordList as values.
def frequency(wordList):
    """Given a list of words, returns a dictionary of word frequencies"""
    freqDict = {} # initialize accumulator as empty dict
    for word in wordList:
        if word not in freqDict:
            freqDict[word] = 1 # add key with count 1
        else:
            freqDict[word] += 1 # update count
    return freqDict
frequency(['a', 'a', 'a', 'c', 'b', 'a', 'd'])
{'a': 4, 'c': 1, 'b': 1, 'd': 1}
verseWords = ['let','it','be','let','it','be','there','will','be','an','answer']
frequency(verseWords)
{'let': 2, 'it': 2, 'be': 3, 'there': 1, 'will': 1, 'an': 1, 'answer': 1}
# read in all words from pride and prejudice
bookWords = []
with open('prideandprejudice.txt') as book:
    for line in book:
        bookWords.extend(line.strip().split())
bookDict = frequency(bookWords)
# num of unique words? what should we write here
len(bookDict)
6372
# num of times word 'pride' appears? what should we write?
bookDict['pride']
48
Important Dictionary Method: .get()¶
ids = {'ss32': 'Shikha', 'jra1': 'Jeannie',
'kas10': 'Kelly', 'lpd2': 'Lida'}
ids.get('kas10', 'Ephelia')
'Kelly'
ids.get('srm2', 'Ephelia')
'Ephelia'
ids # .get does not change the dictionary
{'ss32': 'Shikha', 'jra1': 'Jeannie', 'kas10': 'Kelly', 'lpd2': 'Lida'}
print(ids.get('ksl23'))
None
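The examples above suggest how .get() behaves: it returns the value if the key is present, and otherwise returns the supplied default (or None when no default is given), without raising a KeyError. As a sketch, the hypothetical helper getOrDefault below spells out that logic with tools we have already seen; it is a stand-in for illustration, not how Python implements .get().

```python
ids = {'ss32': 'Shikha', 'jra1': 'Jeannie',
       'kas10': 'Kelly', 'lpd2': 'Lida'}

def getOrDefault(d, key, default=None):
    """Mirrors the result of d.get(key, default) (illustration only)"""
    if key in d:
        return d[key]
    return default

# each comparison prints True: the helper and .get() agree
print(getOrDefault(ids, 'kas10', 'Ephelia') == ids.get('kas10', 'Ephelia'))
print(getOrDefault(ids, 'srm2', 'Ephelia') == ids.get('srm2', 'Ephelia'))
print(getOrDefault(ids, 'ksl23') == ids.get('ksl23'))
```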
Rewrite frequency using get¶
def frequencyOld(wordList):
    """Given a list of words, returns a dictionary of word frequencies"""
    freqDict = {} # initialize accumulator as empty dict
    for word in wordList:
        if word not in freqDict:
            freqDict[word] = 1 # add key with count 1
        else:
            freqDict[word] += 1 # update count
    return freqDict

def frequency(wordList):
    """Given a list of words, returns a dictionary of word frequencies"""
    freqDict = {} # initialize accumulator as empty dict
    for word in wordList:
        # what should we write instead?
        freqDict[word] = freqDict.get(word, 0) + 1
    return freqDict
bookDict = frequency(bookWords)
#bookDict
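The two topics of this lecture combine naturally: since .get() maps a word to its count, it can serve as the key function for sorted(). The sketch below, run on the short verseWords list rather than the full book, lists words from most to least frequent; the variable names verseDict and byCount are illustrative. Ties keep the order in which the words were first seen, because the sort is stable and (in Python 3.7 or later) the dictionary iterates in insertion order.

```python
def frequency(wordList):
    """Given a list of words, returns a dictionary of word frequencies"""
    freqDict = {}
    for word in wordList:
        freqDict[word] = freqDict.get(word, 0) + 1
    return freqDict

verseWords = ['let','it','be','let','it','be','there','will','be','an','answer']
verseDict = frequency(verseWords)

# iterating over a dictionary yields its keys; key=verseDict.get
# tells sorted() to compare the words by their counts
byCount = sorted(verseDict, key=verseDict.get, reverse=True)
print(byCount)   # ['be', 'let', 'it', 'there', 'will', 'an', 'answer']
```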