Lab 3: Building a Python Toolbox

Objectives

In this lab you will accomplish two tasks. First, you will construct a toolbox or module of tools for manipulating words and word lists. Then, when finished with your module, you will use your toolbox to answer some trivia questions. In doing this lab, you will gain experience with the following:

  • Using sequences in Python (lists and strings), and associated operators and methods;

  • Writing simple and nested loops;

  • Writing doctests to test your functions; and

  • Creating a module in Python.

Note: You may find it useful to refer to our Strings and Lists Cheat Sheet while working on this lab.

NPR Puzzles

Will Shortz is the puzzle master at National Public Radio. Each Sunday morning he challenges listeners with a puzzle to solve by the following Thursday. Typically these are challenges that test one’s vocabulary, but, as we’ll see, we can frequently compute their solutions.

Here are some interesting problems (in no particular order):

  • P1. (Proposed February 11, 2018.) Name part of the human body in six letters. Add an ‘r’ and rearrange the result to name a part of the body in seven letters. What is it?

  • P2. (Proposed August 16, 2020.) Think of a major city in France whose name is an anagram of a major city in Italy. What cities are they? (Note: An anagram is a word, phrase, or name formed by rearranging the letters of another.)

  • P3. (Proposed September 23, 2018 by Jim Levering of San Antonio) Think of a disease in five letters. Shift each letter three spaces later in the alphabet—for example, ‘a’ would become ‘d’, ‘b’ would become ‘e’, etc. The result will be a prominent name from the Bible. Who is it?
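
The anagram definition in P2 can be checked mechanically: two strings are anagrams when their letters, sorted, are identical. The sketch below illustrates only that idea. The name isAnagram is hypothetical and is not part of the starter code.

```python
def isAnagram(first, second):
    """Return True if first and second use exactly the same letters."""
    # Keep letters only, lowercase them, and sort; anagrams sort identically.
    lettersA = sorted(ch for ch in first.lower() if ch.isalpha())
    lettersB = sorted(ch for ch in second.lower() if ch.isalpha())
    return lettersA == lettersB

print(isAnagram('listen', 'silent'))         # True
print(isAnagram('Dormitory', 'dirty room'))  # True
print(isAnagram('cat', 'cart'))              # False
```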

Spelling Bee Puzzles

The Spelling Bee puzzle from the New York Times is also a source of interesting word problems. These words are spelled with an alphabet (called a “hive”) of at most seven letters.

Here are some more interesting problems (again, in no particular order):

  • B1. How many lowercase 7-letter isograms are in the word list 'words/dict.txt'? (Note: An isogram is a word without any repeated letters.)

  • B2. (September 22, 2020.) Suppose you have a seven-letter hive, 'mixcent'. How many 4-letter lowercase words in 'words/dict.txt' (1) include 'm' and (2) are spelled using only (possibly repeated) letters from the hive string?

Are you up for solving one or more of these challenges?

Getting Started

Before you begin, clone this week’s repository in the usual manner.

  1. Open the Terminal and cd into your cs134 directory:

    cd cs134
    
  2. Clone your repository from https://evolene.cs.williams.edu with the following command, where you should replace 22xyz3 with your CS username.

    git clone https://evolene.cs.williams.edu/cs134-labs/22xyz3/lab03.git
    
  3. Navigate to your newly created lab03 subdirectory in the Terminal:

    cd lab03
    
  4. Open Atom, choose Add Project Folder from the File menu, navigate to your lab03 directory, and click Open. The lab03 starter files will appear in the left pane of Atom.

Part 1: wordTools Module

The goal of this week is to build a module of utilities, called wordTools, for manipulating strings and lists of words. Our hope is to help people who wish to solve puzzles like those described above.

We have given you several functions in wordTools.py that may be helpful to you in writing more powerful functions and answering the puzzle questions. As you investigate these functions, think about how they might be used to solve more general problems. In the following steps, you will replace each line that reads pass # TODO:  replace with your code with your own implementation.

  1. Start by reviewing the wordTools.py script, paying careful attention to the docstrings and doctests in the functions. Let’s gain some experience with doctests before writing any code.

    • Notice that when you run wordTools.py as a script, the code below the if __name__ == '__main__' line calls the testmod() function from the doctest module. This function runs all of the interactive examples found in the docstrings of our functions (called doctests) and verifies that they produce the correct results. We can use doctests to make sure our functions perform as expected.

    • Currently, one doctest associated with the canon() function fails when you run wordTools.py as a script. The canon() function takes a string word as input and returns a “canonical” version of word which consists of just its letters (without punctuation marks or special characters), in lower case, in alphabetical order. For example, canon('Mama Mia!') is the string 'aaaimmm'.

    • Fix the doctest so that when the script wordTools.py is executed, the canon() function passes its tests. (You will still get errors about the uniques() and readWords() functions; ignore those for now.) Throughout this semester you will be required to use this testing process to demonstrate that the functions you write are implemented correctly. Note that you should not modify any code in the function body of canon() for this step; you are only modifying the doctest.

  2. Now, let’s extend the toolbox. First, complete the function uniques(word), which takes as input a string word and returns a string consisting of the unique characters in word, in the order they first appear. For example, uniques('abracadabra') should return 'abrcd'. Incorporate two new doctests into the docstring associated with uniques() that test interesting strings.

    (Hint: Use a loop that updates an accumulation variable in uniques().)

  3. Next, complete the function, isIsogram(word), that takes as input a string word and returns True if all of the characters in word are unique, and False otherwise. Case should be ignored. For example, the strings 'Rohit', 'Lida', and 'CS134' are isograms, but 'Jeannie' and 'StEve' are not. Incorporate at least two doctests into the docstring of isIsogram() that test other interesting strings.

    (Hint: Your implementation of isIsogram() should call uniques().)

  4. Your next job is to write a function, sized(n, wordList), that takes as input a word length, n, and a word list, wordList, and returns a list of the words in wordList that are exactly length n. For example:

    >>> sized(3, ['cat', 'dog', 'goat'])
    ['cat', 'dog']
    >>> sized(5, ['frog', 'duck', 'mouse'])
    ['mouse']
    

    Write two new doctests to help verify that your sized() function works as expected.

  5. Note that we have given you a function called readWords(filename) that takes as input the path to a file, filename, reads that file, and returns a list of the words found in it, one per line. A “word”, like 'New York', may include spaces internally, but not at its ends. You do not need to modify this function, but you may want to use it to solve the puzzles. Spend a few minutes reviewing it. You might use this function in the following ways:

    >>> len(readWords('words/firstNames.txt'))
    5166
    >>> readWords('words/bodyParts.txt')[14]
    'belly button'
    >>> sized(8, readWords('words/italianCities.txt'))
    ['Cagliari', 'Florence', 'Siracusa']
    
  6. Finally, review your wordTools toolkit, ensuring it is a solidly built module:

    • Complete the triple-quoted docstring at the top of the file. This helps users understand the purpose of this module. You can check all your documentation with:

      pydoc3 wordTools
      

    Pressing q will exit the pydoc viewer if it does not exit automatically.

    • Make sure that every function is also documented with a helpful docstring.

    • Thoroughly test each function. You might, for example, import the particular function into interactive Python and make sure it works as you expect.

    • Include at least two doctests (>>>) in the docstring of each function in wordTools.
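
As a rough illustration of the pieces described in Part 1, the sketch below shows a docstring containing two doctests, a loop that updates an accumulation variable, and the testmod() call that runs the doctests when the file is executed as a script. The function countVowels is a hypothetical example and is not one of the wordTools functions.

```python
def countVowels(word):
    """Return the number of vowels (a, e, i, o, u) in word, ignoring case.

    >>> countVowels('banana')
    3
    >>> countVowels('Rhythm')
    0
    """
    count = 0                    # accumulation variable, updated in the loop
    for ch in word.lower():
        if ch in 'aeiou':
            count += 1
    return count

if __name__ == '__main__':
    # Run every doctest in this file; failures are reported, successes are silent.
    from doctest import testmod
    testmod()
```

If a doctest's expected output does not match what the function actually returns, testmod() prints a report showing the expected and actual values.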

Part 2: Solving Puzzles

We’re finally ready to solve some puzzles! In the words folder of your repository, we have provided a collection of text files containing word lists that may be useful. (The words/README.txt file describes the contents of these word lists.)

  1. Start by solving spelling-bee puzzle B1 as described at the beginning of this handout. In particular, in the Python script puzzles.py provided in the starter, complete the definition of function b1() that returns the solution to the puzzle.

  2. Next, you may solve either the NPR puzzle P1 or P2 as described above. You must solve at least one of these! If you want extra practice, try solving both. As above, complete the definition of the appropriate function (named after the puzzle) that returns the solution as a string consisting of the pair of answers (in any order) separated by a space. For example, if the solution to P1 is 'stomach' and 'cartilage', the function p1() should return the string 'stomach cartilage' or 'cartilage stomach'.

  3. Extra Credit: If you would like a challenge, check out problems B2 and P3. These are not required! A small amount of extra credit will be given if you solve one or both of them.

    (Hint: You might want to write a helper function (or two) to solve P3.)
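
For the P3 hint, one building block that may help is shifting a single lowercase letter later in the alphabet. The sketch below is only an illustration under assumptions: the name shiftLetter is hypothetical, it handles lowercase letters only, and it wraps around from 'z' to 'a', which the puzzle statement does not explicitly require.

```python
def shiftLetter(ch, k):
    """Return the lowercase letter k positions after ch, wrapping past 'z'."""
    position = ord(ch) - ord('a')        # 0 for 'a', 1 for 'b', ..., 25 for 'z'
    shifted = (position + k) % 26        # wrap around the end of the alphabet
    return chr(shifted + ord('a'))

print(shiftLetter('a', 3))   # d
print(shiftLetter('b', 3))   # e
print(shiftLetter('y', 3))   # b
```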

Good luck! Do not forget to add, commit, and push your work as it progresses! Test your code often to simplify debugging.

When you are finished, specify collaborators in README.md. Then add, commit, and push all of your work to evolene. This will include the completed wordTools.py and puzzles.py.

Submit Your Work

  1. When you are finished with the lab, be sure to add and commit your work.

    git add wordTools.py puzzles.py
    git commit -m "Lab 3 completed"
    

    Then push your work (remembering to start the VPN if you’re working from off campus):

    git push
    
  2. You can, if you wish, check that your work is up-to-date on https://evolene.cs.williams.edu, or with git status in the Terminal window:

    git status
    
  3. Please edit the README.md file and enter the names of any students with whom you collaborated on the Collaboration line. Commit and push this change.

  4. GradeSheet.txt gives a breakdown of the rubric you will be graded on for this lab. When graded, this file will contain the feedback as well.

Grading Notes

  1. Your code for the puzzles must compute each answer as directly as possible. In addition, you should make use of the tools imported from your wordTools module whenever possible.

  2. We are looking for solutions that do not use too many for loops or iterate over the word lists more than is necessary. For example, P1 and P2 can be solved using a nested for loop. If you find yourself writing more than 4 loops, it may be best to review your strategy with a TA or an instructor.

  3. Make sure you implement the functions of wordTools carefully. Do not modify function names or interpret parameters differently. Make sure your functions return the results described. This document serves, in some way, as a contract between you and your users. Deviating from this contract makes it hard for potential users to adopt your implementation!

  4. Functionality and programming style are important, just as both the content and the writing style are important when writing an essay. Make sure your variables are named well, and that your use of comments, white space, and line breaks promotes readability. We expect to see code that makes your logic as clear and easy to follow as possible. The Python Style Guide is available on the course website to help you with stylistic decisions.

  5. As always, the file GradeSheet.txt in your lab03 repository goes over the grading guidelines and documents our expectations.
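
Grading note 2 mentions nested for loops. A minimal sketch of that pattern on toy data follows, returning a pair of words separated by a space as described in Part 2. The function findReversedPair and its matching rule (one word is the other spelled backwards) are hypothetical and are not the solution to any of the puzzles.

```python
def findReversedPair(wordsA, wordsB):
    """Return 'a b' for the first a in wordsA and b in wordsB where b is a reversed."""
    for a in wordsA:              # outer loop: each candidate from the first list
        for b in wordsB:          # inner loop: compare against every word in the second list
            if a[::-1] == b:
                return a + ' ' + b
    return ''                     # no matching pair was found

print(findReversedPair(['stressed', 'lemon'], ['melon', 'desserts']))  # stressed desserts
```

Returning as soon as a match is found keeps the function from iterating over the lists more than necessary.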