Sorting & Dictionaries

Today, we will discuss the following:

  • Wrap up our discussion of sorting with an optional key function

  • Discuss a new mutable unordered data structure: a dictionary

Sorting using a key function

courses = [('CS134', 74, 'Spring'),  ('CS136', 60, 'Spring'),
           ('AFR206', 30, 'Spring'), ('ECON233', 30, 'Fall'),
           ('MUS112', 10, 'Fall'),   ('STAT200', 50, 'Spring'), 
           ('PSYC201', 50, 'Fall'),  ('MATH110', 74, 'Spring')]
def capacity(courseTuple):
    '''Takes a sequence and returns item at index 1'''
    return courseTuple[1]
# can tell sorted() to sort by capacity instead of course name
sorted(courses, key=capacity)
[('MUS112', 10, 'Fall'),
 ('AFR206', 30, 'Spring'),
 ('ECON233', 30, 'Fall'),
 ('STAT200', 50, 'Spring'),
 ('PSYC201', 50, 'Fall'),
 ('CS136', 60, 'Spring'),
 ('CS134', 74, 'Spring'),
 ('MATH110', 74, 'Spring')]
# sort by capacity in reverse order
sorted(courses, key=capacity, reverse=True)
[('CS134', 74, 'Spring'),
 ('MATH110', 74, 'Spring'),
 ('CS136', 60, 'Spring'),
 ('STAT200', 50, 'Spring'),
 ('PSYC201', 50, 'Fall'),
 ('AFR206', 30, 'Spring'),
 ('ECON233', 30, 'Fall'),
 ('MUS112', 10, 'Fall')]
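
The key function does not have to be a named function defined with def. As a minimal sketch that goes beyond the notes above, the same sort by capacity can be written with a lambda or with itemgetter from the standard operator module. The shortened course list and the names byCapLambda and byCapGetter are chosen here only for illustration.

```python
from operator import itemgetter

courses = [('CS134', 74, 'Spring'), ('CS136', 60, 'Spring'),
           ('AFR206', 30, 'Spring'), ('MUS112', 10, 'Fall')]

# a lambda is an unnamed function; this one returns the item at index 1
byCapLambda = sorted(courses, key=lambda courseTuple: courseTuple[1])

# itemgetter(1) builds an equivalent function for us
byCapGetter = sorted(courses, key=itemgetter(1))

print(byCapLambda == byCapGetter)  # both approaches give the same order
print(byCapLambda[0])              # the smallest course comes first
```

A named function such as capacity is often easier to read and reuse; the shorter forms are handy when the key is a one-off.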

Stable Sorting

Python’s sorting functions are stable, which means that items that are equal according to the sorting key have the same relative order as in the original sequence. To see an example, let us sort the course tuples by the term they are offered by defining a new key function.

courses = [('CS134', 74, 'Spring'),  ('CS136', 60, 'Spring'),
           ('AFR206', 30, 'Spring'), ('ECON233', 30, 'Fall'),
           ('MUS112', 10, 'Fall'),   ('STAT200', 50, 'Spring'), 
           ('PSYC201', 50, 'Fall'),  ('MATH110', 74, 'Spring')]
def term(courseTuple):
    '''Takes a sequence and returns item at index 2'''
    return courseTuple[2]
# sort courses by term
# notice the impact of stable sorting on ties
sorted(courses, key=term)
[('ECON233', 30, 'Fall'),
 ('MUS112', 10, 'Fall'),
 ('PSYC201', 50, 'Fall'),
 ('CS134', 74, 'Spring'),
 ('CS136', 60, 'Spring'),
 ('AFR206', 30, 'Spring'),
 ('STAT200', 50, 'Spring'),
 ('MATH110', 74, 'Spring')]
# if you want to handle ties differently, can return a tuple in key function
def termAndCap(courseTuple):
    return courseTuple[2], courseTuple[1]
sorted(courses, key=termAndCap)
[('MUS112', 10, 'Fall'),
 ('ECON233', 30, 'Fall'),
 ('PSYC201', 50, 'Fall'),
 ('AFR206', 30, 'Spring'),
 ('STAT200', 50, 'Spring'),
 ('CS136', 60, 'Spring'),
 ('CS134', 74, 'Spring'),
 ('MATH110', 74, 'Spring')]
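
Stability also lets us build a multi-criteria sort out of two simple sorts. The sketch below, which is an illustration and not part of the original notes, orders courses by term and, within a term, from largest to smallest capacity in two ways: once with a tuple key (negating the capacity, a trick that only works for numbers) and once by sorting on the secondary criterion first and the primary criterion second. The names termThenLargest, onePass, and twoPass are made up for this example.

```python
courses = [('CS134', 74, 'Spring'), ('CS136', 60, 'Spring'),
           ('AFR206', 30, 'Spring'), ('ECON233', 30, 'Fall'),
           ('MUS112', 10, 'Fall'), ('STAT200', 50, 'Spring'),
           ('PSYC201', 50, 'Fall'), ('MATH110', 74, 'Spring')]

def capacity(courseTuple):
    '''Takes a sequence and returns item at index 1'''
    return courseTuple[1]

def term(courseTuple):
    '''Takes a sequence and returns item at index 2'''
    return courseTuple[2]

def termThenLargest(courseTuple):
    '''Returns (term, negated capacity) so larger courses come first within a term'''
    return courseTuple[2], -courseTuple[1]

# approach 1: a single sort with a tuple key
onePass = sorted(courses, key=termThenLargest)

# approach 2: sort by the secondary criterion first, then by the primary one;
# stability keeps the capacity order intact within each term
twoPass = sorted(sorted(courses, key=capacity, reverse=True), key=term)

print(onePass == twoPass)
for course in onePass:
    print(course)
```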

Sorting sequences based on custom specifications

Now suppose we want to sort a list of integers based on their magnitude, i.e., ignoring sign. How can we use the key function to achieve that?

def absoluteValue(num):
    """
    Takes a number and returns its absolute value
    """
    if num < 0:
        return -1*num
    return num

numbers = [-50, 50, -29, 27, 8]

print("Default sorting behavior", sorted(numbers))
print("Sorting on magnitude", sorted(numbers, key=absoluteValue))
Default sorting behavior [-50, -29, 8, 27, 50]
Sorting on magnitude [8, 27, -29, -50, 50]
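
Writing absoluteValue ourselves is good practice, but Python also has a built-in abs function that can be passed directly as the key. A minimal sketch (the name byMagnitude is chosen here for illustration):

```python
numbers = [-50, 50, -29, 27, 8]

# any function that takes one argument can serve as a key, including built-ins
byMagnitude = sorted(numbers, key=abs)
print(byMagnitude)  # [8, 27, -29, -50, 50]
```

Note that -50 stays ahead of 50 because the two have equal keys and the sort is stable.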

What if we wanted to be able to sort a list containing a mix of characters and numbers? E.g., [25, 'a', 50, 'b'].

By default, the sorted function raises a TypeError when given such a list, because str and int values cannot be compared with each other. But we can make them comparable by using the ord() function, so that each single-character string is sorted by its Unicode code point (which is the same as its ASCII value for characters such as 'a').

That is, the sorting behaviors we can describe for sequences (mixed or not) are limited only by our imagination of what makes for a sensible comparison between elements!

ord('a')
97
chr(97)
'a'
def returnOrdValue(element):
    """
    Returns the ASCII value for an element if it is a character,
    otherwise assumes that the given element is a number and
    returns the number itself.
    """
    
    if type(element) == str:
        return ord(element)
    return element

mixedList = ['a', 'b', 24, 50, 125]
print("Sorting mixed list", sorted(mixedList, key=returnOrdValue))
Sorting mixed list [24, 50, 'a', 'b', 125]
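
To see why the key function is needed here, we can try the default sort on the same list. The sketch below catches the error with try/except (which may not have been covered yet) so that the program keeps running; the variable outcome is introduced only for this illustration.

```python
mixedList = ['a', 'b', 24, 50, 125]

try:
    sorted(mixedList)  # no key: Python must compare a str with an int
    outcome = 'sorted without error'
except TypeError as err:
    outcome = 'TypeError'
    print('Default sorting failed:', err)

print(outcome)
```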

New Mutable Collection: Dictionaries

Dictionaries are unordered collections that map keys to values.

The motivation behind dictionaries is efficient queries: to look for a value associated with a key, we do not need to look through all the keys. We can just access the dictionary using the key as the subscript, and the dictionary returns the corresponding value.

This makes queries a lot more efficient!

# sample dictionary
zipCodes = {'01267': 'Williamstown', '60606': 'Chicago', 
            '48202': 'Detroit', '97210': 'Portland'}
# what US city has this zip code?
zipCodes['60606'] 
'Chicago'
# what US city has this zip code?
zipCodes['48202']
'Detroit'
# if key does not exist
zipCodes['11777']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_42722/1948765118.py in <module>
      1 # if key does not exist
----> 2 zipCodes['11777']

KeyError: '11777'
# assigning to a new key adds a key-value pair
zipCodes['11777'] = 'Port Jefferson'
zipCodes
{'01267': 'Williamstown',
 '60606': 'Chicago',
 '48202': 'Detroit',
 '97210': 'Portland',
 '11777': 'Port Jefferson'}
# number of key-value pairs
len(zipCodes)
5
# the in operator checks whether a key is present
'90210' in zipCodes
False
'01267' in zipCodes
True
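
To make the efficiency argument concrete, here is a sketch of what the same query looks like when the data is stored as a list of tuples instead. The helper lookupInList and the list zipPairs are hypothetical names introduced for this comparison.

```python
zipPairs = [('01267', 'Williamstown'), ('60606', 'Chicago'),
            ('48202', 'Detroit'), ('97210', 'Portland')]

def lookupInList(pairs, zipCode):
    '''Scans the pairs one by one; returns the city, or None if not found'''
    for code, city in pairs:
        if code == zipCode:
            return city
    return None

# dict() turns the list of pairs into a dictionary
zipCodes = dict(zipPairs)

print(lookupInList(zipPairs, '97210'))  # checks every pair before finding the last one
print(zipCodes['97210'])                # goes to the value directly via the key
```

With only four entries the difference is invisible, but with many entries the list version may have to examine all of them, while the dictionary lookup does not scan the keys.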

Creating Dictionaries

Dictionaries can be created in many ways:

  • Direct assignment

  • Starting with an empty dictionary and accumulating key-value pairs

  • Using the dict() function

# direct assignment
scrabbleScore = {'a':1 , 'b':3, 'c':3, 'd':2, 'e':1, 
                 'f':4, 'g':2, 'h':4, 'i':1, 'j':8, 
                 'k':5, 'l':1, 'm':3, 'n':1, 'o':1, 
                 'p':3, 'q':10, 'r':1, 's':1, 't':1, 
                 'u':1, 'v':8, 'w':4, 'x':8, 'y':4, 'z': 10} 
# dict() accepts a list of two-element lists (key first, then value)
t = [["1", "2"], ["3","4"]]
dict(t)
{'1': '2', '3': '4'}
# accumulate in a dictionary
verse = "let it be,let it be,let it be,let it be,there will be an answer,let it be"
counts = {} # empty dictionary
for line in verse.split(','):
    if line not in counts:
        counts[line] = 1 # initialize count
    else:
        counts[line] += 1 # update count
counts
{'let it be': 5, 'there will be an answer': 1}
# use dict() function
dict([('a', 5), ('b', 7), ('c', 10)])
{'a': 5, 'b': 7, 'c': 10}

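Important Note: Dictionaries are unordered in the sense that their elements have no positions. Since Python 3.7, a dictionary remembers the order in which its keys were inserted, which is why Python displays them in that order, but there is still no index associated with an element, e.g., we cannot access the element at a certain index.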
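
Since positions are not available, we reach the contents of a dictionary through its keys. A minimal sketch of the usual ways to do that, using a shortened zipCodes dictionary (the variables cities and outcome are introduced only for this illustration):

```python
zipCodes = {'01267': 'Williamstown', '60606': 'Chicago', '48202': 'Detroit'}

# looping over a dictionary visits its keys
for code in zipCodes:
    print(code, '->', zipCodes[code])

# .items() yields (key, value) tuples, which we can unpack
cities = []
for code, city in zipCodes.items():
    cities.append(city)
print(cities)

# a subscript is always interpreted as a key, never as a position
try:
    zipCodes[0]
    outcome = 'found'
except KeyError:
    outcome = 'KeyError'
print(outcome)
```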

Example: frequency

Let's write a function frequency that takes as input a list of words wordList and returns a dictionary freqDict with the unique words in wordList as keys, and their number of occurrences in wordList as values.

def frequency(wordList):
    """Given a list of words, returns a dictionary of word frequencies"""
    freqDict = {} # initialize accumulator as empty dict
    for word in wordList:
        if word not in freqDict:
            freqDict[word] = 1 # add key with count 1
        else:
            freqDict[word] += 1 # update count
    return freqDict
frequency(['a', 'a', 'a', 'c', 'b', 'a', 'd'])
{'a': 4, 'c': 1, 'b': 1, 'd': 1}
verseWords = ['let','it','be','let','it','be','there','will','be','an','answer']
frequency(verseWords)
{'let': 2, 'it': 2, 'be': 3, 'there': 1, 'will': 1, 'an': 1, 'answer': 1}
# read in all words from pride and prejudice
bookWords = []
with open('prideandprejudice.txt') as book:
    for line in book:
        bookWords.extend(line.strip().split())
bookDict = frequency(bookWords)
# num of unique words? what should we write here
len(bookDict)
6372
# num of times word 'pride' appears?  what should we write?
bookDict['pride']
48
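
A frequency dictionary combines nicely with the key functions from the first half of these notes. As a sketch, we can sort the (word, count) pairs returned by .items() to find the most common words. The small dictionary below is the result for the verse words shown earlier, and the helper count is a name introduced for this example.

```python
freqDict = {'let': 2, 'it': 2, 'be': 3, 'there': 1,
            'will': 1, 'an': 1, 'answer': 1}

def count(pair):
    '''Takes a (word, count) tuple and returns the count'''
    return pair[1]

# .items() gives (key, value) tuples; sort them by count, largest first
byCount = sorted(freqDict.items(), key=count, reverse=True)
print(byCount[:3])  # [('be', 3), ('let', 2), ('it', 2)]
```

The same idea applies to bookDict, although the actual top words depend on the text file.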

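Important Dictionary Method: .get()

The .get(key, default) method returns the value associated with key if the key is in the dictionary, and default otherwise. If no default is given, it returns None. Unlike subscripting, it never raises a KeyError.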

ids = {'ss32': 'Shikha', 'jra1': 'Jeannie',
       'kas10': 'Kelly', 'lpd2': 'Lida'}
ids.get('kas10', 'Ephelia')
'Kelly'
ids.get('srm2', 'Ephelia')
'Ephelia'
ids # .get does not change the dictionary
{'ss32': 'Shikha', 'jra1': 'Jeannie', 'kas10': 'Kelly', 'lpd2': 'Lida'}
print(ids.get('ksl23'))
None
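
One way to understand .get is to write a function that behaves the same way using only the in operator and subscripting. The helper getOrDefault below is hypothetical and only meant as a sketch of the behavior; in practice we would call .get directly.

```python
def getOrDefault(d, key, default=None):
    '''Hypothetical helper that mimics d.get(key, default)'''
    if key in d:
        return d[key]
    return default

ids = {'ss32': 'Shikha', 'jra1': 'Jeannie',
       'kas10': 'Kelly', 'lpd2': 'Lida'}

print(getOrDefault(ids, 'kas10', 'Ephelia'))  # key exists: its value is returned
print(getOrDefault(ids, 'srm2', 'Ephelia'))   # key missing: the default is returned
print(getOrDefault(ids, 'ksl23'))             # key missing, no default: None
```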

Rewrite frequency using get

def frequencyOld(wordList):
    """Given a list of words, returns a dictionary of word frequencies"""
    freqDict = {} # initialize accumulator as empty dict
    for word in wordList:
        if word not in freqDict:
            freqDict[word] = 1 # add key with count 1
        else:
            freqDict[word] += 1 # update count
    return freqDict
def frequency(wordList):
    """Given a list of words, returns a dictionary of word frequencies"""
    freqDict = {} # initialize accumulator as empty dict
    for word in wordList:
        # what should we write instead?
        freqDict[word] = freqDict.get(word, 0) + 1
    return freqDict
bookDict = frequency(bookWords)
 #bookDict
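
Counting occurrences is common enough that Python's standard library offers a ready-made tool for it: Counter from the collections module. This goes beyond what these notes cover, so treat the following as an optional sketch. The variable name counts is chosen here for illustration.

```python
from collections import Counter

verseWords = ['let', 'it', 'be', 'let', 'it', 'be',
              'there', 'will', 'be', 'an', 'answer']

# a Counter is a dictionary specialized for counting
counts = Counter(verseWords)

print(counts['be'])           # 3
print(counts['pride'])        # 0, since a missing key counts as zero
print(counts.most_common(1))  # [('be', 3)]
```

Writing frequency by hand remains the better exercise for learning how the accumulation works.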