Building a Python Toolbox

In this lab you will accomplish two tasks. First, you will construct a toolbox or module of tools for manipulating words and word lists. Then, when finished with your module, you will use your toolbox to answer some trivia questions. In doing this lab, you will gain experience with the following:

  • Sequences (lists and strings), and associated operators and methods;

  • Writing simple and nested loops;

  • Writing doctests to test your functions;

  • Creating a module and using the __all__ special variable.

Make sure to watch the pre-lab video before starting, which introduces the concept of doctests.

NPR Puzzles

Will Shortz is the puzzle master at National Public Radio. Each Sunday morning he challenges listeners with a puzzle to solve by the following Thursday. Typically these are challenges that test one’s vocabulary, but, as we’ll see, we can frequently compute their solutions.

Here are some interesting problems (in no particular order):

  • P1. (Proposed February 11, 2018.) Name part of the human body in six letters. Add an ‘r’ and rearrange the result to name a part of the body in seven letters. What is it?

  • P2. (Proposed August 16, 2020.) Think of a major city in France whose name is an anagram of a major city in Italy. What cities are they? (Note: An anagram is a word, phrase, or name formed by rearranging the letters of another.)

  • P3. (Proposed September 23, 2018, by Jim Levering of San Antonio.) Think of a disease in five letters. Shift each letter three places later in the alphabet; for example, ‘a’ would become ‘d’, ‘b’ would become ‘e’, and so on. The result will be a prominent name from the Bible. Who is it?
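
Two of these puzzles rest on simple letter manipulations that are easy to try in interactive Python. The sketch below is illustrative only. `isAnagram` and `shiftLetter` are hypothetical names, not functions from the starter code. Ignoring case and non-letters in the anagram test, and wrapping around from 'z' back to 'a' when shifting, are assumptions that the puzzle statements do not spell out.

```python
def isAnagram(first, second):
    """Return True if first and second use exactly the same letters.

    Case and non-letter characters are ignored (an assumption for this sketch).
    """
    letters1 = sorted(ch for ch in first.lower() if ch.isalpha())
    letters2 = sorted(ch for ch in second.lower() if ch.isalpha())
    return letters1 == letters2


def shiftLetter(letter, places):
    """Shift a lowercase letter later in the alphabet, wrapping past 'z' (assumed)."""
    offset = ord(letter) - ord('a')              # 'a' -> 0, ..., 'z' -> 25
    return chr(ord('a') + (offset + places) % 26)


print(isAnagram('Listen', 'Silent'))   # True
print(shiftLetter('a', 3))             # d
print(shiftLetter('b', 3))             # e
print(shiftLetter('y', 3))             # b  (wraps around past 'z')
```

Sorting the letters gives both words the same "fingerprint" exactly when they are anagrams. The modulo by 26 is what makes the shift wrap around instead of running past 'z'.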

Spelling Bee Puzzles

The Spelling Bee puzzle from the New York Times is also a source of interesting word problems. These words are spelled with an alphabet (called a “hive”) of at most seven letters.

Here are some more interesting problems (again, in no particular order):

  • B1. How many lowercase 7-letter isograms are in the word list 'words/dict.txt'? (Note: An isogram is a word without any repeated letters.)

  • B2. (September 22, 2020.) Suppose you have the seven-letter hive 'mixcent'. How many 4-letter lowercase words in 'words/dict.txt' (1) include 'm' and (2) are spelled using only (possibly repeated) letters from the hive string?
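
Both definitions can be checked on a single word before you write any loops over a word list. This sketch uses Python sets, which is a different technique from the loop-based uniques() approach the lab asks for in Part 1. Treat it only as a way to confirm your understanding of the definitions. The names `looksLikeIsogram` and `spelledFromHive` are hypothetical, and ignoring case in the isogram test is an assumption.

```python
def looksLikeIsogram(word):
    """Return True if no character appears twice in word (case ignored, assumed)."""
    lowered = word.lower()
    # A set keeps one copy of each character, so repeats shrink its size.
    return len(set(lowered)) == len(lowered)


def spelledFromHive(word, hive, required):
    """Return True if word contains required and uses only letters from hive."""
    # set(word) <= set(hive) tests whether every letter of word is in hive.
    return required in word and set(word) <= set(hive)


print(looksLikeIsogram('Lida'))                 # True
print(looksLikeIsogram('Jeannie'))              # False
print(spelledFromHive('mint', 'mixcent', 'm'))  # True
print(spelledFromHive('mast', 'mixcent', 'm'))  # False ('a' and 's' are not in the hive)
print(spelledFromHive('tent', 'mixcent', 'm'))  # False (no 'm')
```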

Are you up for solving one or more of these challenges?

Getting Started

Before you begin, clone this week’s repository in the usual manner.

  1. Open the Terminal and cd into your cs134 directory:

     :::bash
     cd cs134
    
  2. Clone your repository from https://evolene.cs.williams.edu:

     :::bash
     git clone https://evolene.cs.williams.edu/cs134-labs/22xyz3/lab03.git
    

    where 22xyz3 is a placeholder for your CS username.

  3. Navigate to your newly created lab03 subdirectory in the Terminal:

     :::bash
     cd lab03
    
  4. Open Atom, choose Add Project Folder from the File menu, navigate to your lab03 directory, and click Open. The lab03 starter files will appear in the left pane of Atom.

Part 1 - wordTools.py

The goal this week is to build a module of utilities, called wordTools, for manipulating strings and lists of words. We hope it will help people who wish to solve puzzles like those described above.

We have given you several functions in wordTools.py that may be helpful to you in writing more powerful functions and answering the puzzle questions. As you investigate these functions, think about how they might be used to solve more general problems.

  1. Start by reviewing the wordTools.py script, paying careful attention to the docstrings and doctests in the functions. Let’s gain some experience with doctests before writing any code.

    • Notice that when you run wordTools.py as a script, the code below the if __name__ == '__main__' line calls the testmod() function from the doctest module. This function runs all of the interactive examples found in the docstrings of our functions (called doctests) and verifies that they produce the expected results.

    • Currently, one doctest associated with the canon() function fails when you run wordTools.py as a script. The canon() function takes a string word as input and returns a “canonical” version of word that consists of just its letters (without punctuation marks or special characters), in lowercase, in alphabetical order. For example, canon('Mama Mia!') is the string 'aaaimmm'.

    • Fix the doctest so that when the script wordTools.py is executed, the canon() function passes its tests (you will still see failures from the uniques() function; ignore those for now). Throughout this semester you will be required to use this testing process to demonstrate that the functions you write are implemented correctly. Do not modify any code in the function body of canon() for this step; you are only modifying the doctest.

  2. Now, let’s extend the toolbox. First, complete the function uniques(word), which takes as input a string word and returns a string consisting of the unique characters in word, in the order they first appear. For example, uniques('abracadabra') should return 'abrcd'. Incorporate two new doctests into the docstring of uniques() that test interesting strings.

    (Hint: Use a loop that updates an accumulation variable in uniques().)

  3. Next, complete the function isIsogram(word), which takes as input a string word and returns True if no letter appears more than once in word (ignoring case), and False otherwise. The strings 'Lida' and 'CS134' are isograms, but 'Jeannie', 'ShikHa', and 'KeLlY' are not. Incorporate at least two doctests into the docstring of isIsogram() that test other interesting strings.

    (Hint: Your implementation of isIsogram() should call uniques().)

  4. We have given you a function called readWords(filename) that takes as input the path to a file, filename, reads the file, and returns a list of the words it contains, one per line. A “word”, like 'New York', may include spaces internally, but not at its ends. You might use this function in the following ways:

     :::python
     >>> len(readWords('words/firstNames.txt'))
     5166
     >>> readWords('words/bodyParts.txt')[14]
     'belly button'
    

    Your job is to write a function, sized(n, wordList), that takes as input a word length, n, and a word list, wordList, and returns a list of the words in wordList that are exactly length n. For example:

     :::python
     >>> sized(8, readWords('words/italianCities.txt'))
     ['Cagliari', 'Florence', 'Siracusa']
     >>> sized(3, ['cat', 'dog', 'mouse'])
     ['cat', 'dog']
    

    Write two new doctests to help verify that your sized() function works as expected.

  5. Finally, review your wordTools toolkit, ensuring it is a solidly built module:

    • Complete the triple-quoted docstring at the top of the file. This helps users understand the purpose of this module. You can check all your documentation with:

        :::bash
        pydoc3 wordTools
      
    • Make sure that every function is documented with a helpful document string.

    • Thoroughly test each function. You might, for example, import the particular function into interactive Python and make sure it works as you expect.

    • Include at least two doctests (>>>) in the docstring of every function in wordTools.

    • Double check that the global variable __all__ at the top of the file is a list of strings naming the functions that should be imported by the following statement:

        :::python
        from wordTools import *
      

Part 2 - Solving Puzzles

We’re finally ready to solve some puzzles! The words folder of your repository contains a collection of text files, each holding a list of related words that may be useful. (The words/README.txt file describes the contents of these word lists.)

  1. Start by solving Spelling Bee puzzle B1 as described at the beginning of this handout. In particular, in the Python script puzzles.py provided in the starter, complete the definition of the function b1 so that it returns the solution to the puzzle.

  2. Next, you may solve either the NPR puzzle P1 or P2 as described above. You must solve at least one of these! If you want extra practice, try solving both. As above, complete the definition of the appropriate function (named after the puzzle) that returns the solution as a string consisting of the pair of answers (in any order) separated by a space. For example, if the solution to P1 is 'stomach' and 'cartilage', the function p1 should return the string 'stomach cartilage' or 'cartilage stomach'.

  3. Extra Credit: If you would like a challenge, check out problems B2 and P3. These are not required! A small amount of extra credit will be given if you solve one or both of them.

    (Hint: You might want to write a helper function (or two) to solve P3.)
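
Most puzzle functions share one shape: loop over the candidate words, test each word (or each pair of words) against the puzzle's conditions, and return the answer in the required format. The sketch below applies that shape to made-up word lists rather than the files in words/. The function name anagramPairs and the sample lists are hypothetical. The sketch compares sorted letters directly for brevity; in your own solutions, the grading notes ask you to use your wordTools functions (such as canon(), which also drops spaces and punctuation) wherever possible.

```python
def anagramPairs(firstList, secondList):
    """Return 'a b' strings pairing words in firstList with their anagrams in secondList."""
    answers = []
    for first in firstList:            # outer loop: one candidate from the first list
        for second in secondList:      # inner loop: compare it with every word in the second
            if sorted(first.lower()) == sorted(second.lower()):
                answers.append(first + ' ' + second)   # pair separated by a space
    return answers


# Hypothetical sample lists, not taken from the words/ folder.
print(anagramPairs(['Listen', 'Stone'], ['Silent', 'Notes', 'Tones']))
# ['Listen Silent', 'Stone Notes', 'Stone Tones']
```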

Good luck! Do not forget to add, commit, and push your work as it progresses! Test your code often to simplify debugging.

When you are finished, certify that your work is your own by signing the Honor Code statement in the honorcode.txt file. Then add, commit, and push all of your work to evolene. This includes honorcode.txt, your completed wordTools.py, and puzzles.py.

Grading Notes

  1. Your code for the puzzles must compute each answer as directly as possible. In addition, you should make use of the tools imported from your wordTools module whenever possible.

  2. We are looking for solutions that do not use too many for loops or iterate over the word lists more than is necessary. For example, P1 and P2 can be solved using a nested for loop. If you find yourself writing more than 4 loops, it may be best to review your strategy with a TA or an instructor.

  3. Make sure you implement the functions of wordTools carefully. Do not modify function names or interpret parameters differently. Make sure your functions return the results described. This document serves, in some ways, as a contract between you and your users. Deviating from this contract makes it hard for potential users to adopt your implementation!

  4. Functionality and programming style are both important, just as both content and writing style matter when writing an essay. Make sure your variables are named well and that your use of comments, white space, and line breaks promotes readability. We expect to see code that makes your logic as clear and easy to follow as possible. A Python Style Guide is available on the course website to help you with stylistic decisions.

  5. As always, the file GradeSheet.txt in your lab03 repository goes over the grading guidelines and documents our expectations.