In this lab, we will build our skills with Python sequences by
searching, parsing, and transforming text. In particular, this lab will
further our experience with the Python string data type (str),
the Python list data type (list), and with using the
for-each loop pattern to iterate through data stored in sequences.
We will start with a few seemingly unrelated functions, and we will combine them to complete a concrete task: at the end of this lab, we will have written a small program to solve Madlib-style puzzles, which you can create and solve on your own.
Your pre-lab task is to write a function that takes two str
arguments: char, which should be a str of length 1 (i.e.,
char should be a single character), and
string, which can be any str value. For example,
char could be the str "r", and
string could be the str "Hello world!". Given
these arguments, your function should return a new str defined as
follows:
- If char appears inside the sequence of characters in
  string, then your function should return all of the
  characters in string that appear before the first
  occurrence of char.
- If char is not a single character or if
  char does not appear inside the sequence of characters in
  string, then your function should return all of the
  characters in string.
Below is the result of calling
all_text_before(char, string) on a few different inputs
inside an interactive Python session. Notice that str values are being
displayed; this is because the function returns those
values. Your function should not print the value it computes (in fact,
it should not print anything!). Instead, it should
return a str value.
>>> all_text_before("r", "Hello World!")
'Hello Wo'
>>> all_text_before(" ", "Hello World!")
'Hello'
>>> all_text_before("", "Hello World!")
'Hello World!'
>>> all_text_before("World", "Hello World!")
'Hello World!'
>>> all_text_before("H", "Hello World!")
''
Later in this lab, you will write a related function,
all_text_after(char, string), as a building block for
solving a Madlibs-style puzzle (details below!), so taking the time to
think about this pre-lab function will pay dividends later.
Open the Terminal and go to your cs134 directory
using the cd (change directory) command you used during Lab
1. (If you are working at a new or different computer, you may want to
create a new cs134 directory if one does not already exist
with the mkdir command.)
Now we must retrieve the files we need to complete this week’s
lab. Navigate to https://evolene.cs.williams.edu
in your browser and log in using your CS credentials. Under
Projects, you should see a repository named
cs134-labs/23xyz3/lab03 (where 23xyz3 is your
CS username). This repository contains your starter files for this
week’s lab.
Clone the repository: find the blue button that is a drop-down
menu that says Clone. Click on it and click on the
“clipboard” icon (Copy URL) option next to the option to
Clone with HTTPS.
Return to the Terminal and type git clone followed by
the URL of the lab project you just copied (you can paste on Windows by
pressing Ctrl-V and on a Mac by pressing
Command-V). This should look like the following:
git clone https://evolene.cs.williams.edu/cs134-labs/23xyz3/lab03.git
(where 23xyz3 is again a placeholder for your CS
username.)
Navigate to your newly created lab03 folder in the Terminal:
cd lab03
Explore the contents of the directory using the ls
command in the Terminal.
Open VS Code. Then go to the File menu and choose
Open Folder. Navigate to your lab03 directory
and click Open. You should see the starter files of today’s
lab, including madlibs.py, on the left pane of VS Code. All
of the code that you write this week (other than tests) will be in
madlibs.py.
This lab is designed to be completed using only material we have
discussed so far in class. You will need to use for loops
to iterate through sequences, + to update “accumulator
variables”, == to test the equality of two expressions, and
if/elif/else to execute code when
certain conditions hold.
Although not strictly necessary, you may also benefit from other
Python language features, including sequence “slicing”, using the
len() function to calculate a sequence’s length, using
range() to generate a sequence of integers, or accessing
specific sequence elements using square brackets ([]).
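As a quick refresher, the sketch below exercises each of these tools on a small example that is unrelated to the lab tasks. The function name count_char and the sample values are ours, chosen only for illustration.

```python
def count_char(char, string):
    """Return how many times char appears in string (illustrative helper, not part of the lab)."""
    count = 0                      # accumulator variable
    for letter in string:          # for-each loop over a sequence
        if letter == char:         # == tests equality
            count = count + 1      # + updates the accumulator
    return count


word = "banana"
print(count_char("a", word))       # 3
print(len(word))                   # 6
print(word[0])                     # b    (square-bracket indexing)
print(word[1:4])                   # ana  (slicing: indices 1, 2, and 3)
for i in range(len(word)):         # range(6) yields 0, 1, 2, 3, 4, 5
    print(i, word[i])
```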
You SHOULD NOT use any language features or concepts that we have not yet discussed in lectures, even if they may seem like reasonable tools for parts of this assignment. We have intentionally chosen to focus on algorithmic thinking instead of Python-specific ways of doing things, and we will explore these so-called “Pythonisms” later. In particular, do NOT use any “string methods” or “list methods” (and if you don’t know what those are yet, that is the point of this note!).
A key skill for an algorithmic thinker is the ability to break a complex task into a set of smaller problems that, when solved individually, help solve the larger problem. We’ve referred to this as problem decomposition, and we’ve done a lot of that decomposition for you. The next steps of this lab will introduce those building blocks to you in an order that allows you to develop and test your code incrementally. Then, we will discuss the format that we will be using to represent our Madlibs puzzles, and we will strategize ways to use our building blocks to print the text of completed Madlibs puzzles.
A sequence pre is said to be a prefix of another
sequence seq if each successive element in pre
appears in the same order at the beginning of seq. For
example, here are all prefixes of "word":
"", "w", "wo",
"wor", "word"
Since a Python str is an ordered sequence of individual characters, we can determine that one str is a prefix of another by comparing the characters in those strs, in order, and confirming that they match.
You should implement this functionality in the Python function
is_prefix(prefix, string), which takes two strs as input:
prefix and string. Your function should return
True if prefix is a prefix of
string, and False otherwise. Below is the
result of calling is_prefix() on a few inputs inside an
interactive Python session. Notice that the values True and
False are being displayed; this is because the function
returns those values. Your function should not print
“True” or “False” (in fact, it should not print anything!). Instead, it
should return a bool value.
>>> is_prefix("pre", "prefix")
True
>>> is_prefix("abc", "prefix")
False
>>> is_prefix("", "prefix")
True
>>> is_prefix("prefix", "prefix")
True
>>> is_prefix("prefixing", "prefix")
False
At first glance, it may seem like we need to iterate through the two sequences in unison. There is no way to do this in a single for-each loop without additional Python tricks, so for now, we’ll need to think of another strategy. Luckily, we are equipped with the tools to tackle this problem in several different ways!
One strategy is to, instead of iterating through either
string or prefix, iterate through something
else. The trick is to choose a “something else” that lets us refer to
the appropriate elements from our two actual sequences so we can compare
them. An integer lets us index into a sequence using the “square
bracket” notation (e.g., string[0]), so constructing the
right integer sequence to serve as our indices would allow us to compare
the corresponding elements of string and
prefix to each other. Python’s range() may
help!
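To make the "iterate over indices" idea concrete, here is a minimal sketch on a different problem: counting the positions at which two sequences agree. The helper count_matching_positions is hypothetical (it is not part of the starter code), and it assumes both sequences have the same length.

```python
def count_matching_positions(seq_a, seq_b):
    """Count the indices at which two equal-length sequences hold equal elements.

    Hypothetical helper for illustration; assumes len(seq_a) == len(seq_b).
    """
    matches = 0
    for i in range(len(seq_a)):        # i takes the values 0, 1, ..., len(seq_a) - 1
        if seq_a[i] == seq_b[i]:       # the same i indexes into both sequences
            matches = matches + 1
    return matches


print(count_matching_positions("stone", "store"))   # 4
```

When the two sequences may differ in length, an index that is valid for one can be out of range for the other, so is_prefix() will need to account for that case.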
Another observation that could help is that, when given two strs
str_a and str_b, the boolean expression
str_a == str_b evaluates to True if the
characters in the strs str_a and str_b are an
exact match. We may be able to construct appropriate subsequences of
string and prefix that we can use for
comparison.
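As a small illustration of how slicing and == interact, the session-style sketch below uses sample values of our own choosing; it is not a complete is_prefix() implementation.

```python
string = "prefix"

first_three = string[:3]          # the characters at indices 0, 1, and 2
print(first_three)                # pre
print(first_three == "pre")       # True
print(string[:3] == "abc")        # False

# Slicing past the end of a sequence does not raise an error;
# the slice simply stops at the last element.
print(string[:100])               # prefix
```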
Testing Your Program. To test your
is_prefix() implementation, you can use the
runtests.py program. Since there are multiple functions we
will be testing in this program, we’ve added the ability to specify
which function’s tests to run. To run the is_prefix()
tests, you should give the additional argument “pre”:
% python3 runtests.py pre
Adding Your Own Tests. When writing our code, it’s
always a good idea to consider different edge cases and to test that our
code handles them appropriately. Under the
# YOUR EXTRA TESTS section in the runtests.py
program, we’ve placed some functions for you to complete with your own
tests: my_is_prefix_test(). Make sure to call your test
function where the rest of the testing functions are called!
A related-but-slightly-trickier problem to the “prefix identification” problem above is the task of identifying suffixes.
A sequence suf is said to be a suffix of another
sequence seq if each successive element in suf
appears in the same order at the ending of seq. For
example, here are all suffixes of "word":
"", "d", "rd",
"ord", "word"
You should implement the functionality to identify whether a str is a
suffix of another inside is_suffix(suffix, string). Like
is_prefix(), this function will also take two strs as
input: suffix and string. It should return
True if suffix is a suffix of
string, and False otherwise. Below are some
sample invocations from within an interactive Python session:
>>> is_suffix("fix", "suffix")
True
>>> is_suffix("abc", "suffix")
False
>>> is_suffix("", "suffix")
True
>>> is_suffix("suffix", "suffix")
True
>>> is_suffix("fixing", "suffix")
False
Why did we suggest that this task might be “trickier” than
identifying a prefix? When we iterate through a sequence using the
for-each pattern, our iteration always starts at the sequence’s first
element. For the prefix version of this problem, the first elements of
string and prefix both correspond to the
characters that we want to compare. So do the second elements, the third
elements, and so on. However, when identifying most suffixes, we’ll want
to start somewhere in the middle of string—not at its
beginning.
One way to approach this problem is to structure our for-each loop so
that it iterates over something other than the elements of
string. Suppose we instead iterate over a sequence of
integers. As long as we can convert those integers into the elements of
our strs that we care about, we can compare them appropriately. The
question, then, is what sequence of integers, and how do we convert
those integers into the desired indices within string and
suffix?
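One possible way to picture the conversion is to add a fixed offset to each integer in the range. The sketch below rebuilds the last n characters of a str this way; the helper last_elements is hypothetical, and it assumes 0 <= n <= len(seq).

```python
def last_elements(seq, n):
    """Return a str made of the last n characters of the str seq.

    Hypothetical helper for illustration; assumes 0 <= n <= len(seq).
    """
    offset = len(seq) - n          # index in seq where the last n characters begin
    result = ""
    for i in range(n):             # i is 0, 1, ..., n - 1
        result = result + seq[offset + i]
    return result


print(last_elements("suffix", 3))  # fix
```

Here i counts positions within the ending, while offset + i is the matching position within the full sequence.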
Another option is to modify string so that when we
do iterate through its elements, we are only iterating through
elements that we care about. “Slicing” string may let us
“start in the middle” and treat this problem similarly to prefix
identification. In fact, we may even be able to reuse the work we did in
that function!
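The sketch below shows what "starting in the middle" looks like with a slice. The sample values are ours, and this is only a demonstration of slicing behavior, not a full is_suffix() implementation.

```python
string = "suffix"

start = len(string) - 3           # 6 - 3 = 3
print(string[start:])             # fix
print(string[3:] == "fix")        # True
print(string[len(string):])       # prints an empty line: nothing is left at the end

# A negative start index counts back from the end instead of raising an error.
print("fix"[-3:])                 # fix
```

Because a negative start does not produce an error, a start index computed by subtracting two lengths may deserve a length check first.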
As with many problems in computer science, there are multiple correct ways to attack this one (including strategies not described here). You should choose the approach that matches your thought process; it will be easier to code and debug that way!
Testing Your Program. To test your
is_suffix() implementation, you can use the
runtests.py program with the “suf” argument:
% python3 runtests.py suf
Adding Your Own Tests. When writing our code, it’s
always a good idea to consider different edge cases and to test that our
code handles them appropriately. Under the
# YOUR EXTRA TESTS section in the runtests.py
program, we’ve placed some functions for you to complete with your own
tests: my_is_suffix_test(). Make sure to invoke your test
function where the rest of the testing functions are invoked!
Your third lab task is to write a function called
all_text_after(char, string) that takes two str arguments:
char, which should be a str of length 1 (i.e.,
char should always be a single character), and
string, which can be any str value. For example,
char could be the str "r", and
string could be the str "Hello world!". Given
these arguments, your function should return a new string
defined as follows:
- If char is not a single character (i.e., the str’s
  length is not 1), then your function should return an empty str
  ("").
- If char appears inside the sequence of characters in
  string, then your function should return all of the
  characters in string that appear after the first
  occurrence of char.
- If char does not appear inside the sequence of
  characters in string, then your function should return an
  empty str ("").
Below is the result of calling
all_text_after(char, string) on a few different inputs
inside an interactive Python session. Notice that str values are being
displayed; this is because the function returns those
values. Your function should not print the value it computes (in fact,
it should not print anything!). Instead, it should
return a str value.
>>> all_text_after("r", "Hello World!")
'ld!'
>>> all_text_after(" ", "Hello World!")
'World!'
>>> all_text_after("", "Hello World!")
''
>>> all_text_after("World", "Hello World!")
''
>>> all_text_after("!", "Hello World!")
''
>>> all_text_after("H", "Hello World!")
'ello World!'
The idea for this function is very similar to
all_text_before(), but instead of stopping our
accumulation of characters when we find the first occurrence of
char, we want to start accumulating characters
after the first occurrence of char. We may pass through
several iterations of our for loop before identifying a
match, so it may be helpful to create a bool variable that changes its
value once the first match is found. You can use an if
inside your for loop to selectively execute the
“accumulation” of characters after your condition is met.
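The "flag" pattern described above can be sketched on a list rather than a str. The helper items_after_first is hypothetical and only illustrates the pattern; adapting it to strs and to the rules for all_text_after() is left to you.

```python
def items_after_first(marker, items):
    """Return a list of the elements of items that follow the first occurrence of marker.

    Hypothetical helper for illustration; returns [] if marker never appears.
    """
    found = False                      # becomes True once the first match is seen
    result = []                        # accumulator
    for item in items:
        if found:
            result = result + [item]   # accumulate everything after the match
        elif item == marker:
            found = True               # the match itself is not accumulated
    return result


print(items_after_first(0, [3, 1, 0, 4, 0, 5]))   # [4, 0, 5]
print(items_after_first(9, [3, 1, 0, 4]))         # []
```

Checking found before comparing against marker is what keeps later occurrences of the marker in the result.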
Testing Your Program. To test your
all_text_after() implementation, you can use the
runtests.py program with the “after” argument:
% python3 runtests.py after
Adding Your Own Tests. When writing our code, it’s
always a good idea to consider different edge cases and to test that our
code handles them appropriately. Under the
# YOUR EXTRA TESTS section in the runtests.py
program, we’ve placed some functions for you to complete with your own
tests: my_all_text_after_test(). Make sure to call your
test function in the appropriate place!
At this point we’ve developed several “utility functions” that will serve as useful building blocks when writing our Madlibs puzzle game. But what exactly is a “Madlibs puzzle”, and how is it represented as data in our program?
We’ve broken each Madlibs puzzle into two files:
- a puzzle story file (ends in .story)
- a puzzle answer key file (ends in .answerkey)
Puzzle story files contain text interspersed with
one or more “placeholders”. Placeholders are all encoded as single words
that start with a less-than symbol (<), end with a
greater-than symbol (>), and have one or more
alphanumeric characters between those symbols (non-punctuation,
non-whitespace characters). Typically, the text between the symbols is a
descriptor and a number. For example, <adjective1>
and <noun4> are two placeholders in the provided
stories. These placeholders will eventually be substituted with words
that match the placeholder’s description, completing the story.
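A rough check for this placeholder format might look like the sketch below. The helper looks_like_placeholder is hypothetical and is not part of the starter code; it only tests the first and last characters and does not verify that the characters in between are alphanumeric.

```python
def looks_like_placeholder(word):
    """Return True if word starts with '<', ends with '>', and has something in between.

    Hypothetical helper for illustration only.
    """
    if len(word) < 3:                      # need '<', at least one character, and '>'
        return False
    if word[0] == "<":
        if word[len(word) - 1] == ">":     # the last element sits at index len(word) - 1
            return True
    return False


print(looks_like_placeholder("<adjective1>"))   # True
print(looks_like_placeholder("lamb"))           # False
print(looks_like_placeholder("<>"))             # False
```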
As an example, an excerpt from a story file named
nursery_rhyme.story might contain the following text:
Mary had a/an <adjective1> lamb. Its <noun1> was <adjective2> as <noun2>.
If we replaced “<adjective1>” with “little”, “<noun1>” with “fleece”, “<adjective2>” with “white”, and “<noun2>” with “snow”, we’d have a (possibly) familiar excerpt from a nursery rhyme:
| Mary had a/an little lamb. Its fleece was white as snow.
On the other hand, if we replaced “<adjective1>” with “enormous”, “<noun1>” with “mousepad”, “<adjective2>” with “loud”, and “<noun2>” with “rhombus”, we’d have something that is grammatically correct, but rather silly.
| Mary had a/an enormous lamb. Its mousepad was loud as rhombus.
Puzzle answer key files are lists of
“swap-key=swap-value” pairs that describe the way that the placeholders
from a story file will be updated. Each “swap-key=swap-value” pair
consists of a placeholder (as described above), an equals sign
(=), and text that will be substituted into the puzzle as a
replacement for all occurrences of the placeholder. For example, a
puzzle answer key file for the substitution above would include the
lines:
<adjective1>=enormous
<noun1>=mousepad
<adjective2>=loud
<noun2>=rhombus
We called these “swap-key=swap-value” pairs because when we see an occurrence of a “swap-key” in our puzzle story file, we will look into our puzzle answer key file to find the matching “swap-value” that should replace it.
The game. Hopefully the above example illustrates
that a Madlibs puzzle is, in some sense, a game. Outside of your
program, you can play this game by filling in the contents of a puzzle
answer key file. To have the most fun, you should avoid reading the
puzzle files (files that end in .story). Instead, open up a
puzzle answer key file (a file that ends in .answerkey)
inside VS Code, and fill in the puzzle answer key with words that you
would like to use as substitutions for the placeholders. The placeholder
should describe the qualities of a word that will fit into the
story—grammatically and thematically—but without knowing the context of
the puzzle, the words that you choose will often lead to humorous
substitutions.
The rest of the lab will walk us through the steps to finish a program that takes a puzzle story file and a completed puzzle answer key file, and outputs the (hopefully humorous) version of the story that includes the substitutions.
Now is the point in the lab where we start to put all the pieces together. We’ll first present the algorithm, and then complete one last helper function before implementing the algorithm itself—problem decomposition is a continuous process!
So, at a high level, what is it that we need to do? We need to read in a puzzle’s story and answer key, then go through the story and substitute each placeholder with its matching swap-value from the puzzle’s answer key. Let’s break this down into its own function that we can use as our last building block.
Every time we identify a placeholder within our puzzle
story, we need to find the corresponding swap-value from within
the puzzle answer key. For example, if we see
"<adjective1>" in a story, we need to look through
the puzzle answer key to find the str that starts with
"<adjective1>" (a swap-key), and then
extract the matching swap-value. We will implement this
behavior inside the function
get_madlibs_replacement(placeholder, puzzle_key_list).
This function has two parameters: placeholder, which is
a swap-key such as "<adjective1>", and
puzzle_key_list, which is a list of “swap-key=swap-value”
pairs that we read from a puzzle answer key file.
Luckily, we’ve already written the hardest parts of this function.
Since placeholder is the swap-key in one of the
“swap-key=swap-value” pairs found in puzzle_key_list, we
just need to iterate through the elements in
puzzle_key_list to find it. Once we’ve found a match, the
swap-value is all of the characters that appear after the
"=". (If we don’t find a match, just return
placeholder since no substitution is possible.)
We can use our is_prefix() function to identify the
matching swap-key, and the all_text_after()
function to extract the swap-value. Finally, we return that
suffix.
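The lookup described above can be sketched as follows. To keep the example runnable on its own, is_prefix() and all_text_after() appear here as stand-ins built from string methods; those stand-ins are an assumption of this sketch and are not acceptable lab solutions, since your own versions must avoid string methods.

```python
def is_prefix(prefix, string):
    # Stand-in only: the lab version must not use string methods.
    return string.startswith(prefix)


def all_text_after(char, string):
    # Stand-in only: the lab version must not use string methods.
    if len(char) != 1:
        return ""
    return string.partition(char)[2]


def get_madlibs_replacement(placeholder, puzzle_key_list):
    """Return the swap-value paired with placeholder, or placeholder if there is no match."""
    for pair in puzzle_key_list:
        if is_prefix(placeholder, pair):        # pair looks like "swap-key=swap-value"
            return all_text_after("=", pair)    # everything after the first "="
    return placeholder                          # no substitution is possible


key_list = ["<verb1>=study", "<schoolsubject1>=CS134"]
print(get_madlibs_replacement("<verb1>", key_list))   # study
print(get_madlibs_replacement("<noun9>", key_list))   # <noun9>
```

Returning from inside the loop means the first matching pair wins if a swap-key happens to be listed more than once.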
Testing Your Program. To test your
get_madlibs_replacement() implementation, you can use the
runtests.py program with the “replace” argument:
% python3 runtests.py replace
The last step is to write
solved_madlibs(puzzle_story_list, puzzle_key_list), which
implements the Madlibs algorithm and returns a str containing the
completed story. As input, the function takes two parameters:
- puzzle_story_list is a list of strs that
  stores the contents of a Madlibs puzzle story. This
  parameter will typically be the return value from calling
  read_stringlist_from_file() on a story file (ends in
  .story).
- puzzle_key_list is a list of strs that
  stores the contents of a Madlibs puzzle key. This
  parameter will typically be the return value from calling
  read_stringlist_from_file() on a key file (ends in
  .answerkey). Each element of this list is a
  “swap-key=swap-value” pair.
The function should return a str.
Given these two lists, you should do the following:
1. Iterate through puzzle_story_list (the
   contents of the story), and every time you encounter a str that is not a
   placeholder, add it to your accumulator variable (a
   list of strs that stores every str in the solved story).
   Every time you encounter a str that is a placeholder, use your
   get_madlibs_replacement() function to look up the matching
   swap-value from puzzle_key_list, and add that
   swap-value to your accumulator variable.
2. Once the loop finishes, you will have a list that contains all the
   strs that are part of your completed puzzle. Convert the contents of
   this list into a str using the provided
   format_madlib(solved_puzzle) function that we’ve
   implemented for you in text_utils.py. The argument to this
   function should be your accumulator variable. Your function should
   return the resulting str.
Formatting Our Madlibs. We have provided a handy
function, format_madlib(solved_puzzle), which we imported
from text_utils at the top of the starter code.
format_madlib(solved_puzzle) takes a list of
strs (e.g., the text of a Madlibs puzzle, solved or not) and returns a
single str with nice formatting, as shown below:
>>> story_list = ['Uh', '-', 'oh', ',', 'I', 'forgot', 'to', '<verb1>', 'for', 'the', '<schoolsubject1>', 'exam', '!']
>>> format_madlib(story_list)
'Uh-oh, I forgot to <verb1> for the <schoolsubject1> exam!'
Testing Your Program. To test your
solved_madlibs() implementation, you can use the
runtests.py program with the “solve” argument:
% python3 runtests.py solve
At this point in the lab, you’ve written all of the functions you need to complete a Madlibs puzzle game. However, until we invoke those functions, we aren’t actually done. At the bottom of your program, there is a section that resembles the lines:
if __name__ == "__main__":
    # comments ...
    # more comments ...
    pass
This section of your program is a standard part of many
.py files. It is a conditional (the if
keyword followed by a boolean expression, followed by an indented
section of code), which means that the indented region of code in its
body will only be executed if the condition evaluates to
True. But what does the expression
__name__ == "__main__" mean, and in what circumstances is
it True or False?
We will explore this syntax in more detail later this semester, but the important things to know are these:
- If you run program.py as a
  script (i.e., from the Terminal you type
  python3 program.py), this condition evaluates to
  True and the code is executed.
- If you import program.py as a module (i.e., import program), this condition evaluates
  to False and the code is not executed.
This is a very helpful feature. It gives us a place where we can put code in our programs that only executes when we “run” the program. This is often where we’ll write tests or where we’ll include code that reads arguments from the terminal and executes our functions using those arguments.
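As a minimal sketch of this behavior, imagine the following contents saved in a file named program.py. The function greet is hypothetical and exists only for illustration.

```python
def greet(name):
    """Return a greeting for name (hypothetical function for illustration)."""
    return "Hello, " + name + "!"


if __name__ == "__main__":
    # This line runs for `python3 program.py`, but not for `import program`.
    print(greet("CS134"))
```

Running python3 program.py prints the greeting, while import program only makes greet() available without printing anything.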
Now, make it so that running your program actually plays your game! You may wish to modify your program to resemble the following:
if __name__ == "__main__":
    # Ask the user which Madlibs puzzle they want us to solve
    story_filename = input("What puzzle story file would you like to use? ")
    key_filename = input("What puzzle answer key file would you like to use? ")
    # Read the contents of the puzzle files into variables we can use
    story_list = read_stringlist_from_file(story_filename)
    key_list = read_stringlist_from_file(key_filename)
    print("Here is your completed Madlib:")
    print("")
    # Solve and print the puzzle's solution
    print(solved_madlibs(story_list, key_list))
Reading in Files. To implement our Madlibs game, we need a way to store the contents of puzzle story files and puzzle answer key files in variables that our program can manipulate. Since our primary focus is on building our skills with strs and lists, we have implemented this file-reading functionality for you, and we import the necessary modules at the top of the starter code.
One of the import lines at the top of the starter code allows you to
invoke the function read_stringlist_from_file(filename)
from inside your code. The function
read_stringlist_from_file(filename) takes the location of a
file as input (the filename parameter’s type is
str), and it returns a list of all of the words and
punctuation in that file. For example, if we had a file named
"sample.story" with the contents:
Uh-oh, I forgot to <verb1> for the <schoolsubject1> exam!
and a file named "sample.answerkey" with the
contents:
<verb1>=study
<schoolsubject1>=CS134
then the following interactive Python session shows its behavior:
>>> read_stringlist_from_file("sample.story")
['Uh', '-', 'oh', ',', 'I', 'forgot', 'to', '<verb1>', 'for', 'the', '<schoolsubject1>', 'exam', '!']
>>> read_stringlist_from_file("sample.answerkey")
['<verb1>=study', '<schoolsubject1>=CS134']
Running Your Program. Finally, to run your completed program, you can type the following command in the Terminal:
% python3 madlibs.py
When you are finished adding your functions to the script
madlibs.py, make sure you add and
commit your work. In your Terminal, type:
git add madlibs.py runtests.py
git commit -m "Lab 3 completed"
Then you can push your work (remembering to start
the VPN if you’re working off-campus):
git push
You can, if you wish, check that your work is up-to-date on https://evolene.cs.williams.edu.
Another way to check that you have committed and pushed all your changes
is through the Terminal. In the Terminal in your lab03
directory, type git status:
git status
It should show your changes (probably in green) that have not been committed, and files (probably in red, if any) that have not been added. If you have successfully committed and pushed all your work, it should say so.
Please edit the README.md file and enter the names
of any appropriate students on the Collaboration line. Add,
commit, and push this change.
Near the bottom of the README.md, there is a
breakdown of the grading expectations that will form the basis of your
lab’s evaluation. Please keep these in mind as you work through your
lab!
Download a .zip archive of your work. Download
your assignment files for submission by going to your lab repository on
GitLab, selecting the Download source code icon (a down
arrow), and selecting zip. Doing so should download all of
your lab files into a single zip archive as lab03-main.zip,
and place it inside your Downloads folder (or to whichever folder is set
as your browser’s default download location).
Submit your work. Navigate to the CS134 course on Gradescope. On your Dashboard, select the appropriate Lab Assignment. Drag and Drop your downloaded zip archive of the Lab Assignment from the previous step, and select ‘Upload’.