Lab 10

Sorting and Searching

** Note: There is no Pre-Lab for this lab. **

** Note: This is Not a Partner Lab. Please submit your own lab to Gradescope.**

Objectives

In this lab you will gain experience with classic searching and sorting algorithms and compare how their execution times (as measured by the “wall-clock” running time) scale with growing input sizes. Note that this is different from the definition of efficiency we’ve focused on in class; we argued that efficiency should be measured by counting the number of operations that an algorithm takes as a function of its inputs, in the worst case. Hopefully, this lab will show us that our in-class definition of efficiency does actually capture something useful about algorithm running times!

Before you start this lab, review the binary search and selection sort algorithms from lecture.

Getting Started

Clone the lab resources from the GitLab repository, as usual:

cd cs134
git clone https://evolene.cs.williams.edu/cs134-labs/23xyz3/lab10.git

where your CS username replaces 23xyz3. The description of the files provided in the starter is available in README.md and discussed below.

We have provided you with four Python scripts: search.py, sort.py, compare.py and plot.py. These contain completed implementations of some algorithms (linear_search, binary_search_recursive and merge_sort), as well as functions to read data and plot running times of various sorting and searching algorithms. You should read through these instructions first and review this code to understand how it works.

Our two main tasks are to:

fix the broken/incomplete implementations of binary_search_iterative and selection_sort.
plot the wall-clock running times of each of the algorithms (both the provided “correct” algorithms and your newly fixed algorithms) and reflect.

Each of these tasks is described in detail below.

Collaboration

Each student must complete and submit their own lab solution by committing and pushing to their own individual Git repository and Gradescope. However, you are allowed to discuss your code, your debugging strategies, and your findings with a labmate during your scheduled lab period. Additional guidelines for collaboration:

You may actively work on the same lab problem together: that is, you may discuss at a high-level your process with a partner who is simultaneously working on the same question that you are. However, you may not seek help from any student who has already completed a question. In other words, you may work collaboratively on the solving of a question, but you may not use or benefit from another student’s completed work.
You may discuss the code for any of the instructor-provided functions that we have supplied, debugging strategies, Python syntax, and error messages without restriction.

Q1. Iterative Binary Search

In lecture, we discussed a recursive implementation of the binary search algorithm. This implementation is included in search.py. Your task is to translate this recursive implementation to an iterative one. That is, complete binary_search_iterative in search.py so that it uses a while loop to perform binary search. The outline of the iterative solution has been provided, with five numbered TODOs highlighted for you in comments.

After you complete the TODOs in binary_search_iterative,

test your implementation by running the provided testing code in the if __name__ == "__main__" block, and
add at least one additional, meaningful test to the if __name__ == "__main__" block that tests behavior that is not already exercised by the provided test.

Of course, Q1 is not complete until your implementation passes all the tests!
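When choosing an additional test, boundary cases tend to be the most informative: an empty list, a single-element list, the first and last positions, and an item that is absent. The sketch below shows the shape such tests might take. It assumes a search function that takes an item and a sorted list (in that order) and returns True or False; both the argument order and the return type are assumptions, so check the docstring in search.py before adapting it. The function stand_in_search is a hypothetical placeholder built on the standard bisect module so that the example runs on its own.

```python
from bisect import bisect_left


def stand_in_search(item, sorted_lst):
    """Hypothetical placeholder for binary_search_iterative.

    Assumes the real function returns True when item is in sorted_lst
    and False otherwise; check search.py for the actual contract.
    """
    index = bisect_left(sorted_lst, item)
    return index < len(sorted_lst) and sorted_lst[index] == item


def run_edge_case_tests(search_fn):
    """Run a handful of boundary tests against search_fn."""
    assert search_fn(5, []) is False            # empty list
    assert search_fn(5, [5]) is True            # single element, present
    assert search_fn(4, [5]) is False           # single element, absent
    assert search_fn(1, [1, 3, 5, 7]) is True   # first position
    assert search_fn(7, [1, 3, 5, 7]) is True   # last position
    assert search_fn(4, [1, 3, 5, 7]) is False  # falls between two elements
    assert search_fn(9, [1, 3, 5, 7]) is False  # larger than every element
    return True


if __name__ == "__main__":
    print(run_edge_case_tests(stand_in_search))
```

In your own file you would pass your completed function in place of the stand-in, adjusting the call to match its actual parameters.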

Q2. Debug Selection Sort

In the file sort.py we have included a buggy implementation of the selection sort algorithm discussed in lecture. The function selection_sort takes as input a list, my_lst, and tries to sort it in place by mutating the list. In other words, the value of my_lst itself is changed; the function does not return a new copy of my_lst with the elements in sorted order, as the built-in sorted function does. The function selection_sort is attempting to use the selection sort algorithm, which is (correctly) described below:

For each index i from 0 to n-1 (where n is the length of the list), do the following:

Find the smallest element between indices i and n-1 in the list (choosing exactly one of the smallest elements if there is a tie)
Place this min element in the ith position by swapping it with the current element at index i.

Thus, selection sort finds the smallest element in the entire list and swaps the item at index 0 with this item, then finds the second-smallest element (which is the smallest in the new list my_lst[1:]) and swaps it with the item at index 1 and so on.
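To make the passes concrete, here is a small trace of the procedure described above. It is a sketch written independently of sort.py: it deliberately uses Python's built-in min and list.index to locate the smallest remaining element rather than a hand-written helper, and the name trace_selection_passes is hypothetical. It works on a copy and records the list after each pass.

```python
def trace_selection_passes(values):
    """Return the state of the list after each selection-sort pass.

    Works on a copy, so the caller's list is left unchanged.
    """
    lst = list(values)
    states = []
    for i in range(len(lst)):
        # Smallest element among indices i .. n-1.
        smallest = min(lst[i:])
        # First position at or after i holding that element (breaks ties).
        min_index = lst.index(smallest, i)
        # Swap it into position i.
        lst[i], lst[min_index] = lst[min_index], lst[i]
        states.append(list(lst))
    return states


for state in trace_selection_passes([4, 3, 1, 2]):
    print(state)
# [1, 3, 4, 2]
# [1, 2, 4, 3]
# [1, 2, 3, 4]
# [1, 2, 3, 4]
```

After pass i, the first i+1 positions hold their final values, which is why the last pass never changes anything.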

The function selection_sort in sort.py does not correctly sort the input. You may verify this by running the file as a script and seeing that the provided function merge_sort sorts correctly, but selection sort does not. The bugs in the selection sort implementation are in the implementation of the helper function, get_min_index.

Your task is to fix the helper function and make sure selection_sort is working correctly.

Helpful hints. You may want to add some print statements to figure out what is going wrong. Unlike question 1, we have not highlighted the number of steps or position of bugs.
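One possible debugging aid, sketched under assumptions, is a small checker that runs an in-place sort on a copy of some data and compares the result with what the built-in sorted function produces, printing the details only when they differ. The names sorts_in_place_correctly and broken_sort are hypothetical; broken_sort is a deliberately wrong stand-in used only to show a failing case.

```python
def sorts_in_place_correctly(sort_fn, data):
    """Return True if sort_fn leaves a copy of data in sorted order."""
    expected = sorted(data)   # sorted() builds a new list; data is untouched
    working = list(data)      # copy, so the caller's list is not modified
    sort_fn(working)          # an in-place sort mutates working
    if working != expected:
        print("input:   ", data)
        print("got:     ", working)
        print("expected:", expected)
        return False
    return True


def broken_sort(lst):
    """Hypothetical buggy sort: it only reverses the list in place."""
    lst.reverse()


print(sorts_in_place_correctly(list.sort, [3, 1, 2]))    # True
print(sorts_in_place_correctly(broken_sort, [3, 1, 2]))  # prints details, then False
```

Small inputs that include duplicates or an already-sorted list are often enough to expose an off-by-one error in a helper.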

Q3. Plotting wall-clock runtimes

In class we discussed the Big-O complexity of the following search and sort algorithms:

linear search: O(n)
binary search: O(log n)
selection sort: O(n^2)
merge sort: O(n log n)

While the Big-O complexity is a pessimistic and approximate estimate of the total number of steps, it is a good predictor of the scaling of actual runtimes in practice. To see this, we will plot the wall-clock time taken by the above search and sort algorithms for varying values of n, and we will verify that the predicted Big-O trends hold true.
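As a rough guide to what the plots should look like, the sketch below tabulates the four growth functions for a few input sizes. These are idealized operation counts, not measured times; the helper name growth_row and the chosen values of n are arbitrary illustrations, and logarithms are taken base 2.

```python
import math


def growth_row(n):
    """Return the four growth functions evaluated at n."""
    return {
        "n": n,
        "log n": math.log2(n),
        "n log n": n * math.log2(n),
        "n^2": n ** 2,
    }


print(f"{'n':>6} {'log n':>8} {'n log n':>12} {'n^2':>12}")
for n in (1_000, 2_000, 4_000, 8_000):
    row = growth_row(n)
    print(f"{row['n']:>6} {row['log n']:>8.1f} {row['n log n']:>12.0f} {row['n^2']:>12}")
```

Each time n doubles, n doubles, log n grows by exactly 1, n log n slightly more than doubles, and n^2 quadruples. If the Big-O predictions hold, the measured curves should scale in roughly the same way.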

To keep track of the number of seconds these functions take to search/sort and return their output, we will use Python’s time module. The code that calls and plots the runtimes is already provided for you in plot.py.
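For reference, a minimal sketch of wall-clock timing with the time module is shown here. It is not the code in plot.py, which may use a different clock or repeat each measurement; the helper name time_call is hypothetical, and time.perf_counter is used because it is a high-resolution clock intended for measuring short durations.

```python
import time


def time_call(func, *args):
    """Return (result, elapsed_seconds) for a single call to func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


result, seconds = time_call(sorted, [5, 3, 1, 4, 2])
print(result, f"{seconds:.6f} s")
```

A single measurement like this is noisy, since other processes on the machine affect wall-clock time, so the trend across many input sizes matters more than any one value.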

Your task is to call the plotting functions in the if __name__ == "__main__": block in the file compare.py. In particular, you must do the following:

Call the plot_search_times function from the plot module, providing arguments num_lst and item. This will generate a plot of the runtime comparison of three search algorithms: linear_search, binary_search and Python’s built-in in operator.
Observe the generated plot in plot_search.png and whether the trends match the Big-O predictions. You may now answer TODO4 in compare.py: What does the plot indicate about the complexity of Python’s built-in in operator?
Call the read_data function in compare.py to read in all the text from the book prideandprejudice.txt. We will sort this list of strings using three different sorting algorithms: merge_sort, selection_sort and Python’s built-in .sort() method for lists. To compare their runtimes, call the plot_sort_times function on the word list returned by read_data.
Observe the generated plot in plot_sort.png and whether the observed runtimes match the Big-O complexity trends. You may now answer TODO5 in compare.py: What does the plot indicate about the complexity of Python’s built-in .sort() method for lists?

Do not introduce syntax errors when answering TODO4 and TODO5. Be sure your code still runs before submitting!

Submitting your work

Note: Ensure that the image in plot_search.png shows the plot comparing search algorithm efficiencies and that the image in plot_sort.png shows the plot for sorting algorithms, and commit both so that we may grade them. Both of these images should be generated when you successfully complete and run compare.py.

Do not modify function names or image file names or interpret parameters differently from what is specified! Make sure your functions follow the expected behavior in terms of type of input and output: if they return lists, their default return type must always be list. A function’s documentation serves, in some way, as a contract between you and your users. Deviating from this contract makes it hard for potential users to adopt your implementation!
Functionality and programming style are important, just as both the content and the writing style are important when writing an essay. Make sure your variables are named well, and your use of comments, white space, and line breaks promote readability. We expect to see code that makes your logic as clear and easy to follow as possible.
Do not forget to add, commit, and push your work as it progresses! Test your code often to simplify debugging.
Please edit the README.md file and enter the names of any appropriate students on the Collaboration line. Add, commit, and push this change.
Near the bottom of the README.md, there is a breakdown of the grading expectations that will form the basis of your lab’s evaluation. Please keep these in mind as you work through your lab!
Download a .zip archive of your work. Download your assignment files for submission by going to your lab repository on GitLab, selecting the Download source code icon (a down arrow), and selecting zip. Doing so should download all of your lab files into a single zip archive named lab10-main.zip and place it inside your Downloads folder (or whichever folder is set as your browser’s default download location).
Submit your work. Navigate to the CS134 course on Gradescope. On your Dashboard, select the appropriate Lab Assignment. Drag and Drop your downloaded zip archive of the Lab Assignment from the previous step, and select ‘Upload’.

Good luck as you head into finals week!