# Sorting

Let’s quickly review `binarySearch` before moving on to sorting. Recall that binary search only works on a sorted list.

```python
def binarySearch(aList, item):
    """Assume aList is sorted. If item is
    in aList, return True; else return False."""

    n = len(aList)
    mid = n // 2

    # base case 1
    if n == 0:
        return False

    # base case 2
    elif item == aList[mid]:
        return True

    # recurse on left
    elif item < aList[mid]:
        return binarySearch(aList[:mid], item)

    # recurse on right
    else:
        return binarySearch(aList[mid + 1:], item)
```

Although the above approach works, it is not actually O(log n)! The problem is that list slicing is an O(n) operation: each slice such as `aList[:mid]` copies its elements into a new list. To write a truly logarithmic binary search, we have to recursively pass index values rather than creating list copies with slices.

```python
def binarySearchBetter(aList, item, indexStart, indexEnd):
    """Assume aList is sorted. If item is in
    aList[indexStart..indexEnd] (both ends inclusive),
    return True; else return False."""

    n = indexEnd - indexStart

    # base case 1: empty range (also covers an empty list)
    if n < 0:
        return False

    mid = (n // 2) + indexStart

    # base case 2
    if item == aList[mid]:
        return True

    # recurse on left (keep indexStart, exclude mid)
    elif item < aList[mid]:
        return binarySearchBetter(aList, item, indexStart, mid - 1)

    # recurse on right
    else:
        return binarySearchBetter(aList, item, mid + 1, indexEnd)
```
```python
# quick test to make sure it works
myList = ['a', 'e', 'i', 'o', 'u', 'z']
print(binarySearch(myList, 'z'))
```
```
True
```
```python
# quick test to make sure it works
myList = ['a', 'e', 'i', 'o', 'u', 'z']
print(binarySearchBetter(myList, 'z', 0, len(myList)-1))
```
```
True
```
```python
# let's make a big list
# we'll include each word twice just to make the list bigger
myList = []
with open("prideandprejudice.txt") as f:
    for line in f:
        myList.extend(line.strip().split())
        myList.extend(line.strip().split())
myList.sort()
print(len(myList))
```
```
244178
```
```python
import time
start_time = time.time()
print(binarySearch(myList, "cat"))
print((time.time() - start_time), "seconds")
```
```
False
0.0015799999237060547 seconds
```
```python
import time
start_time = time.time()
print(binarySearchBetter(myList, "cat", 0, len(myList)-1))
print((time.time() - start_time), "seconds")
```
```
False
9.298324584960938e-05 seconds
```
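
The index-passing idea does not require recursion. Below is a minimal sketch of a loop-based variant; the name `binarySearchIterative` is hypothetical and not part of the code above. It keeps two inclusive bounds and narrows them until the range is empty.

```python
def binarySearchIterative(aList, item):
    """Assume aList is sorted. Return True if item is in aList, else False.
    (Hypothetical helper for illustration; not from the original notes.)"""
    low, high = 0, len(aList) - 1   # inclusive bounds of the search range

    while low <= high:
        mid = (low + high) // 2
        if item == aList[mid]:
            return True
        elif item < aList[mid]:
            high = mid - 1          # discard the right half
        else:
            low = mid + 1           # discard the left half

    return False                    # range is empty: item is absent


vowels = ['a', 'e', 'i', 'o', 'u', 'z']
print(binarySearchIterative(vowels, 'z'))    # True
print(binarySearchIterative(vowels, 'b'))    # False
print(binarySearchIterative([], 'a'))        # False
```

Because no slices are created, each step does a constant amount of work, and the range halves every time around the loop.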

## Selection Sort

Binary search is more efficient than linear search, but it also requires that our list be sorted in advance. Sorting is a computationally expensive operation. Today we will explore a few sorting algorithms.

A possible approach to sort:

• Find the smallest element and move it to the first position

• Repeat: find the second-smallest element and move it to the second position, and so on.

This algorithm is called selection sort.

```python
def selectionSort(myList):
    """Selection sort of given list myList,
    mutates list and sorts using selection sort."""
    # find size
    n = len(myList)

    # traverse through all elements
    for i in range(n):

        # find min element in remaining unsorted list
        minIndex = i
        for j in range(i + 1, n):
            if myList[minIndex] > myList[j]:
                minIndex = j

        # swap min element with element at i
        myList[i], myList[minIndex] = myList[minIndex], myList[i]
```
```python
myList = [12, 2, 9, 4, 11, 3, 1, 7, 14, 5, 13]
selectionSort(myList)
print(myList)
```
```
[1, 2, 3, 4, 5, 7, 9, 11, 12, 13, 14]
```
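
To see why selection sort becomes expensive, it can help to count comparisons. The sketch below is an instrumented copy of the function above; the name `selectionSortCount` is hypothetical. For a list of length n, the inner loop runs (n-1) + (n-2) + ... + 0 times, which is n(n-1)/2 comparisons regardless of the input order.

```python
def selectionSortCount(myList):
    """Sort myList in place with selection sort and return the
    number of element comparisons made (hypothetical helper)."""
    n = len(myList)
    comparisons = 0
    for i in range(n):
        minIndex = i
        for j in range(i + 1, n):
            comparisons += 1                 # one comparison per inner step
            if myList[minIndex] > myList[j]:
                minIndex = j
        myList[i], myList[minIndex] = myList[minIndex], myList[i]
    return comparisons


nums = [12, 2, 9, 4, 11, 3, 1, 7, 14, 5, 13]
count = selectionSortCount(nums)
print(nums)
print(count, len(nums) * (len(nums) - 1) // 2)   # both are 55 for n = 11
```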

### Extra Slides Material: MergeSort

Merge sort is another way to sort that is more efficient, but also more complicated. To get started, let’s write a helper function, `merge`, that takes two sorted lists, iteratively merges them into a single sorted list, and returns it.

```python
def merge(a, b):
    """Merges two sorted lists a and b,
    and returns new merged list c"""
    # initialize variables
    i, j = 0, 0
    lenA, lenB = len(a), len(b)
    c = []

    # traverse and populate new list
    while i < lenA and j < lenB:

        if a[i] <= b[j]:
            c.append(a[i])
            i += 1
        else:
            c.append(b[j])
            j += 1

    # handle remaining values
    if i < lenA:
        c.extend(a[i:])

    elif j < lenB:
        c.extend(b[j:])

    return c
```
```python
merge([3, 12, 43], [])
```
```
[3, 12, 43]
```
```python
merge([], [0, 2, 12])
```
```
[0, 2, 12]
```
```python
merge(['a', 'd', 'f'], ['b', 'c', 'e'])
```
```
['a', 'b', 'c', 'd', 'e', 'f']
```
```python
evens = [i for i in range(20) if i % 2 == 0]
sqs = [i*i for i in range(1, 8)]
merge(evens, sqs)
```
```
[0, 1, 2, 4, 4, 6, 8, 9, 10, 12, 14, 16, 16, 18, 25, 36, 49]
```
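
As a cross-check, Python's standard library offers `heapq.merge`, which also combines already-sorted inputs into one sorted sequence. The sketch below is only a comparison point, not the approach used in these notes; the variable names are chosen for illustration.

```python
import heapq

a = ['a', 'd', 'f']
b = ['b', 'c', 'e']

# heapq.merge returns a lazy iterator over the merged, sorted values,
# so we wrap it in list() to materialize the result.
merged = list(heapq.merge(a, b))
print(merged)   # ['a', 'b', 'c', 'd', 'e', 'f']
```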

Using our helper function, we can implement the recursive `mergeSort` algorithm that uses `merge()` in the final merge step.

```python
def mergeSort(L):
    """Given a list L, returns
    a new list that is L sorted
    in ascending order."""
    n = len(L)

    # base case
    if n == 0 or n == 1:
        return L

    else:
        m = n//2 # middle

        # recurse on left & right half
        sortLt = mergeSort(L[:m])
        sortRt = mergeSort(L[m:])

        # return merged list
        return merge(sortLt, sortRt)
```
```python
mergeSort([12, 2, 9, 4, 11, 3, 1, 7, 14, 5, 13])
```
```
[1, 2, 3, 4, 5, 7, 9, 11, 12, 13, 14]
```
```python
mergeSort(['hello', 'world', 'aloha', 'earth'])
```
```
['aloha', 'earth', 'hello', 'world']
```
```python
mergeSort(['e', 'p', 'o', 'c', 'h'])
```
```
['c', 'e', 'h', 'o', 'p']
```
```python
mergeSort(list('We hate Covid-19'))
```
```
[' ',
 ' ',
 '-',
 '1',
 '9',
 'C',
 'W',
 'a',
 'd',
 'e',
 'e',
 'h',
 'i',
 'o',
 't',
 'v']
```

### Merge Sort vs Selection Sort

Why do we need a better sorting algorithm? As the list we are sorting grows large, the Big-O bound matters: selection sort takes O(n^2) time, while merge sort takes O(n log n). Let’s compare the runtime of both sorting algorithms on fairly large lists.

```python
wordList = []
with open('prideandprejudice.txt') as book:
    for line in book:
        line = line.strip().split()
        wordList.extend(line)
print(len(wordList))
```
```
122089
```
```python
miniList = wordList[:500]
medList = wordList[:7000]
```
```python
import time

def timedSorting(wordList):
    """Measures runtime for sorting wordList"""
    # selectionSort sorts in place and returns None, so time it on a copy;
    # otherwise mergeSort would be handed an already-sorted list
    selectionCopy = wordList[:]
    start = time.time()
    selectionSort(selectionCopy)
    end = time.time()
    print("Selection sort takes {} secs".format(end - start))
    start = time.time()
    sortedWordList = mergeSort(wordList)
    end = time.time()
    print("Merge sort takes {} secs".format(end - start))
```
```python
timedSorting(miniList)
```
```
Selection sort takes 0.005836009979248047 secs
Merge sort takes 0.0006148815155029297 secs
```
```python
#timedSorting(medList)
```
```python
#timedSorting(wordList)
```
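
Before running the larger inputs, a rough step count gives a feel for how far apart the two algorithms drift. The sketch below is a back-of-the-envelope estimate only: `estimateSteps` is a hypothetical helper, the selection sort figure is its comparison count n(n-1)/2, and the merge sort figure is simply n·log2(n) with constant factors ignored. The sizes are those of `miniList`, `medList`, and `wordList` above.

```python
import math

def estimateSteps(n):
    """Return (selection-sort comparisons, rough merge-sort estimate)
    for a list of length n. Hypothetical helper for illustration."""
    selectionSteps = n * (n - 1) // 2        # exact comparison count
    mergeSteps = round(n * math.log2(n))     # rough n log n estimate
    return selectionSteps, mergeSteps


# List sizes taken from the cells above: miniList, medList, wordList.
for n in [500, 7000, 122089]:
    selectionSteps, mergeSteps = estimateSteps(n)
    ratio = round(selectionSteps / mergeSteps)
    print(n, selectionSteps, mergeSteps, ratio)
```

The ratio in the last column grows with n, which suggests the gap measured on `miniList` should widen considerably on the two larger lists.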