Lab 3

Overview

In this lab, we will build our skills with Python sequences by searching, parsing, and transforming text. In particular, this lab will further our experience with the Python str data type (str), the Python list data type (list), and with using the for-each loop pattern to iterate through data stored in sequences.

We will start with a few seemingly unrelated functions, and we will combine them to complete a concrete task: at the end of this lab, we will have written a small program to solve Madlib-style puzzles, which you can create and solve on your own.

Pre-Lab Function

Your pre-lab task is to write a function that takes two str arguments: char, which should be a str of length 1 (i.e., char should be a single character), and string, which can be any str value. For example, char could be the str "r", and string could be the str "Hello world!". Given these arguments, your function should return a new str defined as follows:

Below is the result of calling all_text_before(char, string) on a few different inputs inside an interactive Python session. Notice that str values are being displayed; this is because the function returns those values. Your function should not print the value it computes (in fact, it should not print anything!). Instead, it should return a str value.

>>> all_text_before("r", "Hello World!")
'Hello Wo'
>>> all_text_before(" ", "Hello World!")
'Hello'
>>> all_text_before("", "Hello World!")
'Hello World!'
>>> all_text_before("World", "Hello World!")
'Hello World!'
>>> all_text_before("H", "Hello World!")
''

Later in this lab, you will write a related function, all_text_after(char, string) as a building block for solving a Madlibs-style puzzle (details below!), so taking the time to think about this pre-lab function will pay dividends later.

Getting Started in Lab

  1. Open the Terminal and go to your cs134 directory using the cd (change directory) command you used during Lab 1. (If you are working at a new or different computer, you may want to create a new cs134 directory if one does not already exist with the mkdir command.)

  2. Now we must retrieve the files we need to complete this week’s lab. Navigate to https://evolene.cs.williams.edu in your browser and log in using your CS credentials. Under Projects, you should see a repository named cs134-labs/23xyz3/lab03 (where 23xyz3 is your CS username). This repository contains your starter files for this week’s lab.

  3. Clone the repository: find the blue button that is a drop-down menu that says Clone. Click on it and click on the “clipboard” icon (Copy URL) option next to the option to Clone with HTTPS.

    Return to the Terminal and type git clone followed by the URL of the lab project you just copied (you can paste on Windows by pressing Ctrl-V and on a Mac by pressing Command-V). This should look like the following:

    git clone https://evolene.cs.williams.edu/cs134-labs/23xyz3/lab03.git

    (where 23xyz3 is again a place-holder for your CS username.)

  4. Navigate to your newly created lab03 folder in the Terminal:

    cd lab03
  5. Explore the contents of the directory using the ls command in the Terminal.

  6. Open VS Code. Then go to File menu option and choose Open Folder. Navigate to your lab03 directory and click Open. You should see the starter files of today’s lab, including madlibs.py, on the left pane of VS Code. All of the code that you write this week (other than tests) will be in madlibs.py.

Rules and Guidelines

This lab is designed to be completed using only material we have discussed so far in class. You will need to use for loops to iterate through sequences, + to update “accumulator variables”, == to test the equality of two expressions, and if/elif/else to execute code when certain conditions hold.

Although not strictly necessary, you may also benefit from other Python language features, including sequence “slicing”, using the len() function to calculate a sequence’s length, using range() to generate a sequence of integers, or accessing specific sequence elements using square brackets ([]).

You SHOULD NOT use any language features or concepts that we have not yet discussed in lectures, even if they may seem like reasonable tools for parts of this assignment. We have intentionally chosen to focus on algorithmic thinking instead of Python-specific ways of doing things, and we will explore these so-called “Pythonisms” later. In particular, do NOT use any “string methods” or “list methods” (and if you don’t know what those are yet, that is the point of this note!).

Decomposing the Problem

A key skill for an algorithmic thinker is the ability to break a complex task into a set of smaller problems, that when solved individually, help solve the larger problem. We’ve referred to this as problem decomposition, and we’ve done a lot of that decomposition for you. The next steps of this lab will introduce those building blocks to you in an order that allows you to develop and test your code incrementally. Then, we will discuss the format that we will be using to represent our Madlibs puzzles, and we will strategize ways to use our building blocks to print the text of completed Madlibs puzzles.

Building Block: Identifying Prefixes

A sequence pre is said to be a prefix of another sequence seq if each successive element in pre appears in the same order at the beginning of seq. For example, here are all prefixes of word:

Since a Python str is an ordered sequence of individual characters, we can determine that one str is a prefix of another by comparing the characters in those strs, in order, and confirming that they match.

You should implement this functionality in the Python function is_prefix(prefix, string), which takes two strs as input: prefix and string. Your function should return True if prefix is a prefix of string, and False otherwise. Below is the result of calling is_prefix() on a few inputs inside an interactive Python session. Notice that the values True and False are being displayed; this is because the function returns those values. Your function should not print “True” or “False” (in fact, it should not print anything!). Instead, it should return a bool value.

>>> is_prefix("pre", "prefix")
True
>>> is_prefix("abc", "prefix")
False
>>> is_prefix("", "prefix")
True
>>> is_prefix("prefix", "prefix")
True
>>> is_prefix("prefixing", "prefix")
False

At first glance, it may seem like we need to iterate through the two sequences in unison. There is no way to do this in a single for-each loop without additional Python tricks, so for now, we’ll need to think of another strategy. Luckily, we are equipped with the tools to tackle this problem in several different ways!

One strategy is to, instead of iterating through either string or prefix, iterate through something else. The trick is to choose a “something else” that lets us refer to the appropriate elements from our two actual sequences so we can compare them. An integer lets us index into a sequence using the “square bracket” notation (e.g., string[0]), so constructing the right integer sequence to serve as our indices would allow us to compare the corresponding elements of string and prefix to each other. Python’s range() may help!

Another observation that could help is that, when given two strs str_a and str_b, the boolean expression str_a == str_b evaluates to True if the characters in the strs str_a and str_b are an exact match. We may be able to construct appropriate subsequence of string and prefix that we can use for comparison.

Testing Your Program. To test your is_prefix() implementation, you can use the runtests.py program. Since there are multiple functions we will be testing in this program, we’ve added the ability to specify which function’s tests to run. To run the is_prefix() tests, you should give the additional argument “pre”:

% python3 runtests.py pre

Adding Your Own Tests. When writing our code, it’s always a good idea to consider different edge cases and to test that our code handles them appropriately. Under the # YOUR EXTRA TESTS section in the runtests.py program, we’ve placed some functions for you to complete with your own tests: my_is_prefix_test(). Make sure to call your test function where the rest of the testing functions are called!

Building Block: Identifying Suffixes

A related-but-slightly-trickier problem to the “prefix identification” problem above is the task of identifying suffixes.

A sequence suf is said to be a suffix of another sequence seq if each successive element in suf appears in the same order at the ending of seq. For example, here are all suffixes of word:

You should implement the functionality to identify whether a str is a suffix of another inside is_suffix(suffix, string). Like is_prefix(), this function will also take two strs as input: suffix and string. It should return True if suffix is a suffix of string, and False otherwise. Below are some sample invocations from within an interactive Python session:

>>> is_suffix("fix", "suffix")
True
>>> is_suffix("abc", "suffix")
False
>>> is_suffix("", "suffix")
True
>>> is_suffix("suffix", "suffix")
True
>>> is_suffix("fixing", "suffix")
False

Why did we suggest that this task might be “trickier” than identifying a prefix? When we iterate through a sequence using the for-each pattern, our iteration always starts at the sequence’s first element. For the prefix version of this problem, the first elements of string and prefix both correspond to the characters that we want to compare. So do the second elements, the third elements, and so on. However, when identifying most suffixes, we’ll want to start somewhere in the middle of string—not at its beginning.

One way to approach this problem is to structure our for-each loop so that it iterates over something other than the elements of string. Suppose we instead iterate over a sequence of integers. As long as we can convert those integers into the elements of our strs that we care about, we can compare them appropriately. The question, then, is what sequence of integers, and how do we convert those integers into the desired indices within string and suffix?

Another option is modify string so that when we do iterate through its elements, we are only iterating through elements that we care about. “Slicing” string may let us “start in the middle” and treat this problem similarly to prefix identification. In fact, we may even be able to reuse the work we did in that function!

Like many problems in computer science, there are multiple correct ways to attack it (including strategies not described here). You should choose the approach that matches your thought process—it will be easier to code and debug that way!

Testing Your Program. To test your is_suffix() implementation, you can use the runtests.py program with the “suf” argument:

% python3 runtests.py suf

Adding Your Own Tests. When writing our code, it’s always a good idea to consider different edge cases and to test that our code handles them appropriately. Under the # YOUR EXTRA TESTS section in the runtests.py program, we’ve placed some functions for you to complete with your own tests: my_is_suffix_test(). Make sure to invoke your test function where the rest of the testing functions are invoked!

Building Block: Extracting Suffixes

Your third lab task is to write a function called all_text_after(char, string) that takes two str arguments: char, which should be a str of length 1 (i.e., char should always be a single character), and string, which can be any str value. For example, char could be the str "r", and string could be the str "Hello world!". Given these arguments, your function should return a new string defined as follows:

Below is the result of calling all_text_after(char, string) on a few different inputs inside an interactive Python session. Notice that str values are being displayed; this is because the function returns those values. Your function should not print the value it computes (in fact, it should not print anything!). Instead, it should return a str value.

>>> all_text_after("r", "Hello World!")
'ld'
>>> all_text_after(" ", "Hello World!")
'World!'
>>> all_text_after("", "Hello World!")
''
>>> all_text_after("World", "Hello World!")
''
>>> all_text_after("!", "Hello World!")
''
>>> all_text_after("H", "Hello World!")
'ello World!'

The idea for this function is very similar to all_text_before(), but instead of stopping our accumulation of characters when we find the first occurrence of char, we want to start accumulating characters after the first occurrence of char. We may pass through several iterations of our for loop before identifying a match, so it may be helpful to create a bool variable that changes its value once the first match is found. You can use an if inside your for loop to selectively execute the “accumulation” of characters after your condition is met.

Testing Your Program. To test your all_text_after() implementation, you can use the runtests.py program with the “after” argument:

% python3 runtests.py after

Adding Your Own Tests. When writing our code, it’s always a good idea to consider different edge cases and to test that our code handles them appropriately. Under the # YOUR EXTRA TESTS section in the runtests.py program, we’ve placed some functions for you to complete with your own tests: my_all_text_after_test(). Make sure to call your test function in the appropriate place!

Madlibs puzzles

At this point we’ve developed several “utility functions” that will serve as useful building blocks when writing our Madlibs puzzle game. But what exactly is a “Madlibs puzzle”, and how is it represented as data in our program?

We’ve broken each Madlibs Puzzle into two files:

  1. A Madlibs puzzle story (a puzzle story is a text file that ends in the extension .story)
  2. A Madlibs puzzle answer key (an answer key file is a text file that ends in the extension .answerkey)

Puzzle story files contain text interspersed with one or more “placeholders”. Placeholders are all encoded as single words that start with a less-than symbol (<), end with a greater-than symbol (>), and have one or more alphanumeric characters between those symbols (non-punctuation, non-whitespace characters). Typically, the text between the symbols is a descriptor and a number. For example, <adjective1> and <noun4> are two placeholders in the provided stories. These placeholders will eventually be substituted with words that match the placeholder’s description, completing the story.

As an example, an excerpt from a story file named nursery_rhyme.story might contain the following text:

Mary had a/an <adjective1> lamb. Its <noun1> was <adjective2> as <noun2>.

If we replaced “<adjective1>” with “little”, “<noun1>” with “fleece”, “<adjective2>” with “white”, and “<noun2>” with “snow”, we’d have a (possibly) familiar excerpt from a nursery rhyme:

| Mary had a/an little lamb. Its fleece was white as snow.

On the other hand, if we replaced “<adjective1>” with “enormous”, “<noun1>” with “mousepad”, “<adjective2>” with “loud”, and “<noun2>” with “rhombus”, we’d have something that is grammatically correct, but rather silly.

| Mary had a/an enormous lamb. Its mousepad was loud as rhombus.

Puzzle answer key files are lists of “swap-key=swap-value” pairs that describe the way that the placeholders from a story file will be updated. Each “swap-key=swap-value” pair consists of a placeholder (as described above), an equals sign (=), and text that will be substituted into the puzzle as a replacement for all occurrences of the placeholder. For example, a puzzle answer key file for the substitution above would include the lines:

<adjective1>=enormous
<noun1>=mousepad
<adjective2>=loud
<noun2>=rhombus

We called these “swap-key=swap-value” pairs because when we see an occurrence of a “swap-key” in our puzzle story file, we will look into our puzzle answer key file to find the matching “swap-value” that should replace it.

The game. Hopefully the above example illustrates that a Madlib puzzle is, in some sense, a game. Outside of your program, you can play this game by filling in the contents of a puzzle answer key file. To have the most fun, you should avoid reading the puzzle files (files that end in .story). Instead, open up a puzzle answer key file (a file that ends in .answerkey) inside VSCode, and fill in the puzzle answer key with words that you would like to use as substitutions for the placeholders. The placeholder should describe the qualities of a word that will fit into the story—grammatically and thematically—but without knowing the context of the puzzle, the words that you choose will often lead to humorous substitutions.

The rest of the lab will walk us through the steps to finish a program that takes a puzzle story file and a completed puzzle answer key file, and outputs the (hopefully humorous) version of the story that includes the substitutions.

The Madlibs Algorithm

Now is the point in the lab where we start to put all the pieces together. We’ll first present the algorithm, and then complete one last helper function before implementing the algorithm itself—problem decomposition is a continuous process!

So, at a high level, what is it that we need to do? We need to read in a puzzle’s story and answer key, then go through the story and substitute each placeholder with its matching swap-value from the puzzle’s answer key. Let’s break this down into its own function that we can use as our last building block.

Identifying Swap-Values

Every time we identify a placeholder within our puzzle story, we need to find the corresponding swap-value from within the puzzle answer key. For example, if we see "<adjective1>" in a story, we need to look through the puzzle answer key to find the str that starts with "<adjective1>" (a swap-key), and then extract the matching swap-value. We will implement this behavior inside the function get_madlibs_replacement(placeholder, puzzle_key_list).

This function has two parameters: placeholder, which is a swap-key such as "<adjective1>", and puzzle_key_list, which is a list of “swap-key=swap-value” pairs that we read from a puzzle answer key file.

Luckily, we’ve already written the hardest parts of this function. Since placeholder is the swap-key in one of the “swap-key=swap-value” pairs found in puzzle_key_list, we just need to iterate through the elements in puzzle_key_list to find it. Once we’ve found a match, the swap-value is all of the characters that appear after the "=". (If we don’t find a match, just return placeholder since no subsitution is possible.)

We can use our is_prefix() function to identify the matching swap-key, and the all_text_after() function to extract the swap-value. Finally, we return that suffix.

Testing Your Program. To test your get_madlibs_replacement() implementation, you can use the runtests.py program with the “replace” argument:

% python3 runtests.py replace

Solving the Puzzle

The last step is to write solved_madlibs(puzzle_story_list, puzzle_key_list), which implements the Madlibs algorithm and returns a str containing the completed story. As input, the function takes two parameters:

The function should return a str.

Given these two lists, you should do the following:

  1. Iterate through the elements of madlibs_puzzle (the contents of the story), and every time you encounter a str that is not a placeholder, add it to your accumulator variable (that is a list of strs) that stores every str in the solved story. Every time you encounter a str that is a placeholder, use your get_madlibs_replacement() function to look up the matching swap-value from puzzle_key_list, and add that swap-value to your accumulator variable.
  2. After you’ve iterated through your story and completed your swaps, your accumulator variable is a list that contains all the strs that are part of your completed puzzle. Convert the contents of this list into a str using the provided format_madlib(solved_puzzle) function that we’ve implemented for you in text_utils.py. The argument to this function should be your accumulator variable. Your function should return the resulting str.

Formatting Our Madlibs. We have provided a handy function, format_madlib(solved_puzzle), which we imported from text_utils at the top of the starter code. format_madlib(solved_puzzle) takes a list of strs (e.g., the text of a Madlibs puzzle—solved or not) and prints it to the screen with nice formatting, as shown below:

>>> story_list = ['Uh', '-', 'oh', ',', 'I', 'forgot', 'to', '<verb1>', 'for', 'the', '<schoolsubject1>', 'exam', '!']
>>> format_madlib(story_list)
'Uh-oh, I forgot to <verb1> for my <schoolsubject1> exam!'

Testing Your Program. To test your solved_madlibs() implementation, you can use the runtests.py program with the “solve” argument:

% python3 runtests.py solve

The Grand Finale

At this point in the lab, you’ve written all of the functions you need to complete a Madlibs puzzle game. However, until we invoke those functions, we aren’t actually done. At the bottom of your program, there is a section that resembles the lines:

if __name__ == "__main__":
    # comments ...
    # more comments ...
    pass

The section of your program is a standard part of many .py files. It is a conditional (an if statement followed by a boolean expression, followed by an indented section of code), which means that the indented region of code in its body will only be executed if the condition evaluates to True. But what does the expression __name__ == "__main__" mean, and in what circumstances is it True or False?

We will explore this syntax in more detail later this semester, but the important things to know are this:

This is a very helpful feature. It gives us a place where we can put code in our programs that only executes when we “run” the program. This is often where we’ll write tests or where we’ll include code that reads arguments from the terminal and executes our functions using those arguments.

Now, make it so that running your program actually plays your game! You may wish to modify your program to resemble the following:

if __name__ == "__main__":
    # Ask the user which Madlibs puzzle they want us to solve
    story_filename = input("What puzzle story file would you like to use? ")
    key_filename = input("What puzzle answer key file would you like to use? ")
    
    
    # Read the contents of the puzzle files into variables we can use
    story_list = read_stringlist_from_file(story_filename)
    key_list = read_stringlist_from_file(key_filename)
    
    print("Here is your completed Malib:")
    print("")
    
    # Solve and print the puzzle's solution
    print(solved_madlibs(story_list, key_list))

Reading in Files. To implement our Madlibs game, we need a way to store the contents of puzzle story files and puzzle answer key files in variables that our program can manipulate. Since our primary focus is on building our skills with strs and lists, we have implemented this file-reading functionality for you, and we import the necessary modules at the top of the starter code.

One of these two lines at the top of the starter code allows you to invoke the function read_stringlist_from_file(filename) from inside your code. The function read_stringlist_from_file(filename) takes the location of a file as input (the filename parameter’s type is str), and it returns a list of all of the words and punctuation in that file. So for example, if I had a file named "sample.story" with the contents:

Uh-oh, I forgot to <verb1> for the <schoolsubject1> exam!

and a file named "sample.answerkey" with the contents:

<verb1>=study
<schoolsubject1>=CS134

Then the following interactive Python session output displays its behavior:

>>> read_stringlist_from_file("sample.story")
['Uh', '-', 'oh', ',', 'I', 'forgot', 'to', '<verb1>', 'for', 'the', '<schoolsubject1>', 'exam', '!']
>>> read_stringlist_from_file("sample.answerkey")
['<verb1>=study', '<schoolsubject1>=CS134']

Running Your Program. Finally, to run your completed program, you can type the following command in the Terminal:

% python3 madlibs.py

Submit Your Work

  1. When you are finished adding your functions to the script madlibs.py, make sure you add and commit your work. In your Terminal, type:

    git add madlibs.py runtests.py 
    git commit -m "Lab 3 completed"

    Then you can the push your work (remembering to start the VPN if you’re working off-campus):

    git push
  2. You can, if you wish, check that your work is up-to-date on https://evolene.cs.williams.edu. Another way to check that you have committed and pushed all your changes is through the Terminal. In the Terminal in your lab03 directory, type git status:

    git status

    It should show your changes (probably in green) that have not been committed, and files (probably in red, if any), that have not been added. If you have successfully committed and pushed all your work, it should say so.

  3. Please edit the README.md file and enter the names of any appropriate students on the Collaboration line. Add, commit, and push this change.

  4. Near the bottom of the README.md, there is a breakdown of the grading expectations that will form the basis of your lab’s evaluation. Please keep these in mind as you work through your lab!

  5. Download a .zip archive of your work. Download your assignment files for submission by going to your lab repository on Gitlab, selecting the Download source code icon (a down arrow), and select zip. Doing so should download all of your lab files into a single zip archive as lab03-main.zip, and place it inside your Downloads folder (or to whichever folder is set as your browser’s default download location).

  6. Submit your work. Navigate to the CS134 course on Gradescope. On your Dashboard, select the appropriate Lab Assignment. Drag and Drop your downloaded zip archive of the Lab Assignment from the previous step, and select ‘Upload’.