# Sorting & Dictionaries

Today, we will discuss the following:

• Wrap up our discussion of sorting with the optional `key` function

• Discuss a new mutable unordered data structure: a dictionary

## Sorting using `key` function

```python
courses = [('CS134',   90, 'Spring'), ('CS136',   60, 'Spring'),
           ('AFR206',  30, 'Spring'), ('ECON233', 30, 'Fall'),
           ('MUS112',  10, 'Fall'),   ('STAT200', 50, 'Spring'),
           ('PSYC201', 50, 'Fall'),   ('MATH110', 90, 'Spring')]
```
```python
def capacity(courseTuple):
    '''Takes a sequence and returns item at index 1'''
    return courseTuple[1]
```
```python
# can tell sorted() to sort by capacity instead of course name
sorted(courses, key=capacity)
```
```
[('MUS112', 10, 'Fall'),
 ('AFR206', 30, 'Spring'),
 ('ECON233', 30, 'Fall'),
 ('STAT200', 50, 'Spring'),
 ('PSYC201', 50, 'Fall'),
 ('CS136', 60, 'Spring'),
 ('CS134', 90, 'Spring'),
 ('MATH110', 90, 'Spring')]
```
```python
# sort by capacity in reverse order
sorted(courses, key=capacity, reverse=True)
```
```
[('CS134', 90, 'Spring'),
 ('MATH110', 90, 'Spring'),
 ('CS136', 60, 'Spring'),
 ('STAT200', 50, 'Spring'),
 ('PSYC201', 50, 'Fall'),
 ('AFR206', 30, 'Spring'),
 ('ECON233', 30, 'Fall'),
 ('MUS112', 10, 'Fall')]
```
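The key function does not have to be defined with `def`. As a minimal sketch, assuming the same tuple layout as above (name, capacity, term), the same sort can be written with an inline `lambda` or with `operator.itemgetter` from the standard library. The shortened `courses` list here is only for illustration.

```python
from operator import itemgetter

# A shortened, illustrative version of the course list: (name, capacity, term)
courses = [('CS134', 90, 'Spring'), ('CS136', 60, 'Spring'),
           ('AFR206', 30, 'Spring'), ('MUS112', 10, 'Fall')]

def capacity(courseTuple):
    '''Takes a sequence and returns item at index 1'''
    return courseTuple[1]

# Three equivalent ways to sort by the item at index 1
byFunction = sorted(courses, key=capacity)
byLambda = sorted(courses, key=lambda courseTuple: courseTuple[1])
byItemgetter = sorted(courses, key=itemgetter(1))

print(byFunction == byLambda == byItemgetter)  # True
print(byLambda[0])  # ('MUS112', 10, 'Fall')
```

A named function is easier to document and reuse, while a `lambda` keeps a one-off key next to the call that uses it.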

## Stable Sorting

Python’s sorting functions are stable, which means that items that are equal according to the sorting key keep the same relative order as in the original sequence. To see an example, let us sort the course tuples by the term in which they are offered, using a new key function.

```python
courses = [('CS134', 90, 'Spring'),  ('CS136', 60, 'Spring'),
           ('AFR206', 30, 'Spring'), ('ECON233', 30, 'Fall'),
           ('MUS112', 10, 'Fall'),   ('STAT200', 50, 'Spring'),
           ('PSYC201', 50, 'Fall'),  ('MATH110', 90, 'Spring')]
```
```python
def term(courseTuple):
    '''Takes a sequence and returns item at index 2'''
    return courseTuple[2]
```
```python
# sort courses by term
# notice the impact of stable sorting with respect to ties
sorted(courses, key=term)
```
```
[('ECON233', 30, 'Fall'),
 ('MUS112', 10, 'Fall'),
 ('PSYC201', 50, 'Fall'),
 ('CS134', 90, 'Spring'),
 ('CS136', 60, 'Spring'),
 ('AFR206', 30, 'Spring'),
 ('STAT200', 50, 'Spring'),
 ('MATH110', 90, 'Spring')]
```
```python
# if you want to handle ties differently, can return a tuple in key function
def termAndCap(courseTuple):
    '''Takes a sequence and returns the tuple (item at index 2, item at index 1)'''
    return courseTuple[2], courseTuple[1]
```
```python
sorted(courses, key=termAndCap)
```
```
[('MUS112', 10, 'Fall'),
 ('ECON233', 30, 'Fall'),
 ('PSYC201', 50, 'Fall'),
 ('AFR206', 30, 'Spring'),
 ('STAT200', 50, 'Spring'),
 ('CS136', 60, 'Spring'),
 ('CS134', 90, 'Spring'),
 ('MATH110', 90, 'Spring')]
```
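Stability also means that a multi-level sort can be built from successive single-key sorts: sort by the secondary key first, then by the primary key. The sketch below reuses the course data from above and checks that sorting by capacity and then by term gives the same result as a tuple-returning key function. The lambdas are inline stand-ins for key functions like `term` and `termAndCap`.

```python
courses = [('CS134', 90, 'Spring'),  ('CS136', 60, 'Spring'),
           ('AFR206', 30, 'Spring'), ('ECON233', 30, 'Fall'),
           ('MUS112', 10, 'Fall'),   ('STAT200', 50, 'Spring'),
           ('PSYC201', 50, 'Fall'),  ('MATH110', 90, 'Spring')]

# Pass 1: sort by the secondary key (capacity, index 1)
byCapacity = sorted(courses, key=lambda c: c[1])
# Pass 2: sort by the primary key (term, index 2); ties keep their capacity order
twoPass = sorted(byCapacity, key=lambda c: c[2])

# Single pass with a tuple key (term, capacity)
onePass = sorted(courses, key=lambda c: (c[2], c[1]))

print(twoPass == onePass)  # True
```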

## Sorting sequences based on custom specifications

Now suppose we want to sort a list of integers based on their magnitude, i.e., ignoring sign. How can we use the `key` function to achieve that?

```python
def absoluteValue(num):
    """
    Takes a number and returns its absolute value
    """
    if num < 0:
        return -1 * num
    return num

numbers = [-50, 50, -29, 27, 8]

print("Default sorting behavior", sorted(numbers))
print("Sorting on magnitude", sorted(numbers, key=absoluteValue))
```
```
Default sorting behavior [-50, -29, 8, 27, 50]
Sorting on magnitude [8, 27, -29, -50, 50]
```
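Any function that takes one argument can serve as the key, including built-ins. As a short sketch, Python's built-in `abs` gives the same ordering as the hand-written `absoluteValue` above:

```python
numbers = [-50, 50, -29, 27, 8]

# abs is a built-in function, so it can be passed as the key directly
byMagnitude = sorted(numbers, key=abs)
print(byMagnitude)  # [8, 27, -29, -50, 50]

# with reverse=True, tied items (-50 and 50) still keep their original order
byMagnitudeDesc = sorted(numbers, key=abs, reverse=True)
print(byMagnitudeDesc)  # [-50, 50, -29, 27, 8]
```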

What if we wanted to sort a list containing a mix of characters and numbers, e.g., `[25, 'a', 50, 'b']`?

By default, the `sorted` function raises a `TypeError` when given such a list, because `str` and `int` values cannot be compared with each other using `<`. But we can make them comparable by using the `ord()` function in a key function, so that each single-character string is sorted by its numeric character code (its Unicode code point, which matches the ASCII value for ASCII characters).

That is, the sorting behaviors we can describe for sequences (mixed or not) are really only limited by our imagination of what makes for a sensible comparison between elements!

```python
ord('a')
```
```
97
```
```python
chr(97)
```
```
'a'
```
```python
def returnOrdValue(element):
    """
    Returns the character code given by ord() (the ASCII value for
    ASCII characters) if the element is a single-character string,
    otherwise assumes that the given element is a number and
    returns the number itself.
    """
    if isinstance(element, str):
        return ord(element)
    return element

mixedList = ['a', 'b', 24, 50, 125]
print("Sorting mixed list", sorted(mixedList, key=returnOrdValue, reverse=True))
```
```
Sorting mixed list [125, 'b', 'a', 50, 24]
```
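To see why the key function is needed here, the sketch below first tries to sort the same mixed list without a key and catches the resulting `TypeError`, then sorts it with a key. The flag `failed` is just a name introduced for this illustration.

```python
mixedList = ['a', 'b', 24, 50, 125]

# Without a key, sorted() ends up comparing a str with an int
try:
    sorted(mixedList)
    failed = False
except TypeError as error:
    failed = True
    print("Cannot sort without a key:", error)

def returnOrdValue(element):
    """Returns ord(element) for a single-character string, else the element"""
    if isinstance(element, str):
        return ord(element)
    return element

# With the key, every element is compared through an integer
ascending = sorted(mixedList, key=returnOrdValue)
print(ascending)  # [24, 50, 'a', 'b', 125]
```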

## New Mutable Collection: Dictionaries

Dictionaries are unordered collections that map keys to values.

The motivation behind dictionaries is efficient queries: to look up the value associated with a key, we do not need to look through all the keys. We can just access the dictionary using the key as the subscript, and the dictionary returns the corresponding value.

This makes queries a lot more efficient than searching a list item by item: on average, the time for a dictionary lookup does not grow with the number of keys.

```python
# sample dictionary
zipCodes = {'01267': 'Williamstown', '60606': 'Chicago',
            '48202': 'Detroit', '97210': 'Portland'}
```
```python
# what US city has this zip code?
zipCodes['60606']
```
```
'Chicago'
```
```python
# what US city has this zip code?
zipCodes['48202']
```
```
'Detroit'
```
```python
# if key does not exist
zipCodes['11777']
```
```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [17], in <cell line: 2>()
      1 # if key does not exist
----> 2 zipCodes['11777']

KeyError: '11777'
```
```python
zipCodes['11777'] = 'Port Jefferson'
```
```python
zipCodes
```
```
{'01267': 'Williamstown',
 '60606': 'Chicago',
 '48202': 'Detroit',
 '97210': 'Portland',
 '11777': 'Port Jefferson'}
```
```python
len(zipCodes)
```
```
5
```
```python
'90210' in zipCodes
```
```
False
```
```python
'01267' in zipCodes
```
```
True
```
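To make the efficiency argument concrete, here is a minimal sketch that stores the same zip codes both as a list of pairs and as a dictionary. The helper `lookupInList` is hypothetical, written only for this comparison: it has to scan the pairs one by one, while the dictionary goes straight to the key.

```python
zipPairs = [('01267', 'Williamstown'), ('60606', 'Chicago'),
            ('48202', 'Detroit'), ('97210', 'Portland')]
zipCodes = dict(zipPairs)

def lookupInList(pairs, key):
    """Hypothetical helper: scans the pairs in order until the key is found"""
    for k, value in pairs:
        if k == key:
            return value
    return None  # key not found

# Both give the same answer, but the list version checks every earlier pair first
print(lookupInList(zipPairs, '97210'))  # Portland
print(zipCodes['97210'])                # Portland
```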

### Creating Dictionaries

Dictionaries can be created in many ways:

• Direct assignment

• Starting with an empty dictionary and accumulating key-value pairs

• Using the `dict()` function

```python
# direct assignment
scrabbleScore = {'a':1, 'b':3, 'c':3, 'd':2, 'e':1,
                 'f':4, 'g':2, 'h':4, 'i':1, 'j':8,
                 'k':5, 'l':1, 'm':3, 'n':1, 'o':1,
                 'p':3, 'q':10, 'r':1, 's':1, 't':1,
                 'u':1, 'v':8, 'w':4, 'x':8, 'y':4, 'z':10}
```
```python
t = [["1", "2"], ["3", "4"]]
```
```python
dict(t)
```
```
{'1': '2', '3': '4'}
```
```python
# accumulate in a dictionary
verse = "let it be,let it be,let it be,let it be,there will be an answer,let it be"
counts = {} # empty dictionary
for line in verse.split(','):
    if line not in counts:
        counts[line] = 1 # initialize count
    else:
        counts[line] += 1 # update count
print(counts)
```
```
{'let it be': 5, 'there will be an answer': 1}
```
```python
# use dict() function
dict([('a', 5), ('b', 7), ('c', 10)])
```
```
{'a': 5, 'b': 7, 'c': 10}
```

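Important Note: Dictionaries are unordered in the sense that they are not sequences: we cannot access the element at a certain index. Since Python 3.7, a dictionary does remember the order in which its keys were inserted, and it displays and iterates over them in that order, but items are still looked up by key rather than by position.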
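A short sketch of what this means in practice, assuming Python 3.7 or later: iterating over a dictionary visits keys in insertion order, yet a positional subscript such as `zipCodes[0]` is treated as a key lookup and fails. The flag `indexWorked` is a name introduced only for this illustration.

```python
zipCodes = {'01267': 'Williamstown', '60606': 'Chicago'}
zipCodes['48202'] = 'Detroit'  # added last, so it is listed last

keysInOrder = list(zipCodes)
print(keysInOrder)  # ['01267', '60606', '48202']

# Subscripts are keys, not positions: 0 is not a key in this dictionary
try:
    zipCodes[0]
    indexWorked = True
except KeyError:
    indexWorked = False
print(indexWorked)  # False
```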

## Example: `frequency`

Let's write a function `frequency` that takes as input a list of words `wordList` and returns a dictionary `freqDict` with the unique words in `wordList` as keys, and their number of occurrences in `wordList` as values.

```python
def frequency(wordList):
    """Given a list of words, returns a dictionary of word frequencies"""
    freqDict = {} # initialize accumulator as empty dict
    for word in wordList:
        if word not in freqDict:
            freqDict[word] = 1 # add key with count 1
        else:
            freqDict[word] += 1 # update count
    return freqDict
```
```python
frequency(['a', 'a', 'a', 'c', 'b', 'a', 'd'])
```
```
{'a': 4, 'c': 1, 'b': 1, 'd': 1}
```
```python
verseWords = ['let','it','be','let','it','be','there','will','be','an','answer']
frequency(verseWords)
```
```
{'let': 2, 'it': 2, 'be': 3, 'there': 1, 'will': 1, 'an': 1, 'answer': 1}
```
```python
# read in all words from pride and prejudice
bookWords = []
with open('prideandprejudice.txt') as book:
    for line in book:
        bookWords.extend(line.strip().split())
```
```python
bookDict = frequency(bookWords)
```
```python
# num of unique words? what should we write here
len(bookDict)
```
```
6372
```
```python
# num of times word 'pride' appears?  what should we write?
bookDict['pride']
```
```
48
```
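The two topics of this lecture combine naturally: once we have a frequency dictionary, we can sort its entries by count with a key function. The sketch below uses the small verse list rather than the book, and relies on the dictionary method `.items()`, which is not covered above; it yields `(key, value)` tuples. The helper `count` is a name introduced for this illustration.

```python
def frequency(wordList):
    """Given a list of words, returns a dictionary of word frequencies"""
    freqDict = {}
    for word in wordList:
        if word not in freqDict:
            freqDict[word] = 1
        else:
            freqDict[word] += 1
    return freqDict

def count(pair):
    '''Takes a (word, count) tuple and returns the count'''
    return pair[1]

verseWords = ['let','it','be','let','it','be','there','will','be','an','answer']
freqDict = frequency(verseWords)

# sort the (word, count) tuples from most to least frequent
mostCommon = sorted(freqDict.items(), key=count, reverse=True)
print(mostCommon[:3])  # [('be', 3), ('let', 2), ('it', 2)]
```

Because sorting is stable, words with equal counts stay in the order in which they first appeared in the list.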

## Important Dictionary Method: `.get()`

```python
ids = {'ikh1': 'Iris', 'jra1': 'Jeannie', 'lpd2': 'Lida'}
```
```python
ids.get('jra1', 'Ephelia')
```
```
'Jeannie'
```
```python
ids.get('xyz1', 'Ephelia')
```
```
'Ephelia'
```
```python
ids # .get does not change the dictionary
```
```
{'ikh1': 'Iris', 'jra1': 'Jeannie', 'lpd2': 'Lida'}
```
```python
print(ids.get('xyz1'))
```
```
None
```
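For contrast, the sketch below looks up the same missing key with a subscript and with `.get()`. The subscript raises a `KeyError` that has to be handled, while `.get()` simply returns the default. The string `'not found'` is an arbitrary default chosen for this example.

```python
ids = {'ikh1': 'Iris', 'jra1': 'Jeannie', 'lpd2': 'Lida'}

# Subscripting a missing key raises KeyError
try:
    name = ids['xyz1']
except KeyError:
    name = 'not found'
print(name)  # not found

# .get() returns the default instead (None if no default is given)
print(ids.get('xyz1', 'not found'))  # not found
print(ids.get('xyz1') is None)       # True
print('xyz1' in ids)                 # False: .get() did not add the key
```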

## Rewrite `frequency` using `.get()`

```python
def frequencyOld(wordList):
    """Given a list of words, returns a dictionary of word frequencies"""
    freqDict = {} # initialize accumulator as empty dict
    for word in wordList:
        if word not in freqDict:
            freqDict[word] = 1 # add key with count 1
        else:
            freqDict[word] += 1 # update count
    return freqDict
```

The `.get()` dictionary method returns the value associated with a key without requiring us to check for the key's existence beforehand. We can optionally specify a default value to return if the key does not exist; without one, `.get()` returns `None`. This gives a more concise way to write the `frequency` function above.

```python
def frequency(wordList):
    """Given a list of words, returns a dictionary of word frequencies"""
    freqDict = {} # initialize accumulator as empty dict
    for word in wordList:
        freqDict[word] = freqDict.get(word, 0) + 1
    return freqDict
```
```python
bookDict = frequency(bookWords)
```
```python
# bookDict
```
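As a quick sanity check, the sketch below runs both versions on the small verse list from earlier and confirms that they build the same dictionary. Comments are trimmed for brevity; the logic is otherwise the same as in the two definitions above.

```python
def frequencyOld(wordList):
    """Counts words with an explicit membership check"""
    freqDict = {}
    for word in wordList:
        if word not in freqDict:
            freqDict[word] = 1
        else:
            freqDict[word] += 1
    return freqDict

def frequency(wordList):
    """Counts words using .get() with a default of 0"""
    freqDict = {}
    for word in wordList:
        freqDict[word] = freqDict.get(word, 0) + 1
    return freqDict

verseWords = ['let','it','be','let','it','be','there','will','be','an','answer']

sameResult = frequencyOld(verseWords) == frequency(verseWords)
print(sameResult)                   # True
print(frequency(verseWords)['be'])  # 3
```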