In this lab, we will build our skills with Python sequences by
searching, parsing, and transforming text. In particular, this lab will
further our experience with the Python str data type (str
),
the Python list data type (list
), and with using the
for-each loop pattern to iterate through data stored in sequences.
We will start with a few seemingly unrelated functions, and we will combine them to complete a concrete task: at the end of this lab, we will have written a small program to solve Madlib-style puzzles, which you can create and solve on your own.
Your pre-lab task is to write a function that takes two str
arguments: char
, which should be a str of length 1 (i.e.,
char
should be a single character), and
string
, which can be any str value. For example,
char
could be the str "r"
, and
string
could be the str "Hello world!"
. Given
these arguments, your function should return a new str defined as
follows:
char
appears inside the sequence of characters in
string
, then your function should return all of the
characters in string
that appear before the first
occurrence of char
.char
is not a single character or if
char
does not appear inside the sequence of characters in
string
, then your program should return all of the
characters in string
Below is the result of calling
all_text_before(char, string)
on a few different inputs
inside an interactive Python session. Notice that str values are being
displayed; this is because the function returns those
values. Your function should not print the value it computes (in fact,
it should not print anything!). Instead, it should
return a str value.
>>> all_text_before("r", "Hello World!")
'Hello Wo'
>>> all_text_before(" ", "Hello World!")
'Hello'
>>> all_text_before("", "Hello World!")
'Hello World!'
>>> all_text_before("World", "Hello World!")
'Hello World!'
>>> all_text_before("H", "Hello World!")
''
Later in this lab, you will write a related function,
all_text_after(char, string)
as a building block for
solving a Madlibs-style puzzle (details below!), so taking the time to
think about this pre-lab function will pay dividends later.
Open the Terminal and go to your cs134
directory
using the cd
(change directory) command you used during Lab
1. (If you are working at a new or different computer, you may want to
create a new cs134
directory if one does not already exist
with the mkdir
command.)
Now we must retrieve the files we need to complete this week’s
lab. Navigate to https://evolene.cs.williams.edu
in your browser and log in using your CS credentials. Under
Projects, you should see a repository named
cs134-labs/23xyz3/lab03
(where 23xyz3
is your
CS username). This repository contains your starter files for this
week’s lab.
Clone the repository: find the blue button that is a drop-down
menu that says Clone
. Click on it and click on the
“clipboard” icon (Copy URL) option next to the option to
Clone with HTTPS
.
Return to the Terminal and type git clone
followed by
the URL of the lab project you just copied (you can paste on Windows by
pressing Ctrl-V
and on a Mac by pressing
Command-V
). This should look like the following:
git clone https://evolene.cs.williams.edu/cs134-labs/23xyz3/lab03.git
(where 23xyz3
is again a place-holder for your CS
username.)
Navigate to your newly created lab03 folder in the Terminal:
cd lab03
Explore the contents of the directory using the ls
command in the Terminal.
Open VS Code. Then go to File
menu option and choose
Open Folder
. Navigate to your lab03
directory
and click Open
. You should see the starter files of today’s
lab, including madlibs.py
, on the left pane of VS Code. All
of the code that you write this week (other than tests) will be in
madlibs.py
.
This lab is designed to be completed using only material we have
discussed so far in class. You will need to use for
loops
to iterate through sequences, +
to update “accumulator
variables”, ==
to test the equality of two expressions, and
if
/elif
/else
to execute code when
certain conditions hold.
Although not strictly necessary, you may also benefit from other
Python language features, including sequence “slicing”, using the
len()
function to calculate a sequence’s length, using
range()
to generate a sequence of integers, or accessing
specific sequence elements using square brackets ([]
).
You SHOULD NOT use any language features or concepts that we have not yet discussed in lectures, even if they may seem like reasonable tools for parts of this assignment. We have intentionally chosen to focus on algorithmic thinking instead of Python-specific ways of doing things, and we will explore these so-called “Pythonisms” later. In particular, do NOT use any “string methods” or “list methods” (and if you don’t know what those are yet, that is the point of this note!).
A key skill for an algorithmic thinker is the ability to break a complex task into a set of smaller problems, that when solved individually, help solve the larger problem. We’ve referred to this as problem decomposition, and we’ve done a lot of that decomposition for you. The next steps of this lab will introduce those building blocks to you in an order that allows you to develop and test your code incrementally. Then, we will discuss the format that we will be using to represent our Madlibs puzzles, and we will strategize ways to use our building blocks to print the text of completed Madlibs puzzles.
A sequence pre
is said to be a prefix of another
sequence seq
if each successive element in pre
appears in the same order at the beginning of seq
. For
example, here are all prefixes of word
:
""
, "w"
, "wo"
,
"wor"
, "word"
Since a Python str is an ordered sequence of individual characters, we can determine that one str is a prefix of another by comparing the characters in those strs, in order, and confirming that they match.
You should implement this functionality in the Python function
is_prefix(prefix, string)
, which takes two strs as input:
prefix
and string
. Your function should return
True
if prefix
is a prefix of
string
, and False
otherwise. Below is the
result of calling is_prefix()
on a few inputs inside an
interactive Python session. Notice that the values True
and
False
are being displayed; this is because the function
returns those values. Your function should not print
“True” or “False” (in fact, it should not print anything!). Instead, it
should return a bool value.
>>> is_prefix("pre", "prefix")
True
>>> is_prefix("abc", "prefix")
False
>>> is_prefix("", "prefix")
True
>>> is_prefix("prefix", "prefix")
True
>>> is_prefix("prefixing", "prefix")
False
At first glance, it may seem like we need to iterate through the two sequences in unison. There is no way to do this in a single for-each loop without additional Python tricks, so for now, we’ll need to think of another strategy. Luckily, we are equipped with the tools to tackle this problem in several different ways!
One strategy is to, instead of iterating through either
string
or prefix
, iterate through something
else. The trick is to choose a “something else” that lets us refer to
the appropriate elements from our two actual sequences so we can compare
them. An integer lets us index into a sequence using the “square
bracket” notation (e.g., string[0]
), so constructing the
right integer sequence to serve as our indices would allow us to compare
the corresponding elements of string
and
prefix
to each other. Python’s range()
may
help!
Another observation that could help is that, when given two strs
str_a
and str_b
, the boolean expression
str_a == str_b
evaluates to True
if the
characters in the strs str_a
and str_b
are an
exact match. We may be able to construct appropriate subsequence of
string
and prefix
that we can use for
comparison.
Testing Your Program. To test your
is_prefix()
implementation, you can use the
runtests.py
program. Since there are multiple functions we
will be testing in this program, we’ve added the ability to specify
which function’s tests to run. To run the is_prefix()
tests, you should give the additional argument “pre”:
% python3 runtests.py pre
Adding Your Own Tests. When writing our code, it’s
always a good idea to consider different edge cases and to test that our
code handles them appropriately. Under the
# YOUR EXTRA TESTS
section in the runtests.py
program, we’ve placed some functions for you to complete with your own
tests: my_is_prefix_test()
. Make sure to call your test
function where the rest of the testing functions are called!
A related-but-slightly-trickier problem to the “prefix identification” problem above is the task of identifying suffixes.
A sequence suf
is said to be a suffix of another
sequence seq
if each successive element in suf
appears in the same order at the ending of seq
. For
example, here are all suffixes of word
:
""
, "d"
, "rd"
,
"ord"
, "word"
You should implement the functionality to identify whether a str is a
suffix of another inside is_suffix(suffix, string)
. Like
is_prefix()
, this function will also take two strs as
input: suffix
and string
. It should return
True
if suffix
is a suffix of
string
, and False
otherwise. Below are some
sample invocations from within an interactive Python session:
>>> is_suffix("fix", "suffix")
True
>>> is_suffix("abc", "suffix")
False
>>> is_suffix("", "suffix")
True
>>> is_suffix("suffix", "suffix")
True
>>> is_suffix("fixing", "suffix")
False
Why did we suggest that this task might be “trickier” than
identifying a prefix? When we iterate through a sequence using the
for-each pattern, our iteration always starts at the sequence’s first
element. For the prefix version of this problem, the first elements of
string
and prefix
both correspond to the
characters that we want to compare. So do the second elements, the third
elements, and so on. However, when identifying most suffixes, we’ll want
to start somewhere in the middle of string
—not at its
beginning.
One way to approach this problem is to structure our for-each loop so
that it iterates over something other than the elements of
string
. Suppose we instead iterate over a sequence of
integers. As long as we can convert those integers into the elements of
our strs that we care about, we can compare them appropriately. The
question, then, is what sequence of integers, and how do we convert
those integers into the desired indices within string
and
suffix
?
Another option is modify string
so that when we
do iterate through its elements, we are only iterating through
elements that we care about. “Slicing” string
may let us
“start in the middle” and treat this problem similarly to prefix
identification. In fact, we may even be able to reuse the work we did in
that function!
Like many problems in computer science, there are multiple correct ways to attack it (including strategies not described here). You should choose the approach that matches your thought process—it will be easier to code and debug that way!
Testing Your Program. To test your
is_suffix()
implementation, you can use the
runtests.py
program with the “suf” argument:
% python3 runtests.py suf
Adding Your Own Tests. When writing our code, it’s
always a good idea to consider different edge cases and to test that our
code handles them appropriately. Under the
# YOUR EXTRA TESTS
section in the runtests.py
program, we’ve placed some functions for you to complete with your own
tests: my_is_suffix_test()
. Make sure to invoke your test
function where the rest of the testing functions are invoked!
Your third lab task is to write a function called
all_text_after(char, string)
that takes two str arguments:
char
, which should be a str of length 1 (i.e.,
char
should always be a single character), and
string
, which can be any str value. For example,
char
could be the str "r"
, and
string
could be the str "Hello world!"
. Given
these arguments, your function should return a new string
defined as follows:
char
is not a single character (i.e., the str’s
length is not 1), then your function should return an empty str
(""
)char
appears inside the sequence of characters in
string
, then your function should return all of the
characters in string
that appear after the first
occurrence of char
.char
does not appear inside the sequence of
characters in string
, then your function should return an
empty str (""
)Below is the result of calling
all_text_after(char, string)
on a few different inputs
inside an interactive Python session. Notice that str values are being
displayed; this is because the function returns those
values. Your function should not print the value it computes (in fact,
it should not print anything!). Instead, it should
return a str value.
>>> all_text_after("r", "Hello World!")
'ld'
>>> all_text_after(" ", "Hello World!")
'World!'
>>> all_text_after("", "Hello World!")
''
>>> all_text_after("World", "Hello World!")
''
>>> all_text_after("!", "Hello World!")
''
>>> all_text_after("H", "Hello World!")
'ello World!'
The idea for this function is very similar to
all_text_before()
, but instead of stopping our
accumulation of characters when we find the first occurrence of
char
, we want to start accumulating characters
after the first occurrence of char
. We may pass through
several iterations of our for
loop before identifying a
match, so it may be helpful to create a bool variable that changes its
value once the first match is found. You can use an if
inside your for
loop to selectively execute the
“accumulation” of characters after your condition is met.
Testing Your Program. To test your
all_text_after()
implementation, you can use the
runtests.py
program with the “after” argument:
% python3 runtests.py after
Adding Your Own Tests. When writing our code, it’s
always a good idea to consider different edge cases and to test that our
code handles them appropriately. Under the
# YOUR EXTRA TESTS
section in the runtests.py
program, we’ve placed some functions for you to complete with your own
tests: my_all_text_after_test()
. Make sure to call your
test function in the appropriate place!
At this point we’ve developed several “utility functions” that will serve as useful building blocks when writing our Madlibs puzzle game. But what exactly is a “Madlibs puzzle”, and how is it represented as data in our program?
We’ve broken each Madlibs Puzzle into two files:
.story
).answerkey
)Puzzle story files contain text interspersed with
one or more “placeholders”. Placeholders are all encoded as single words
that start with a less-than symbol (<
), end with a
greater-than symbol (>
), and have one or more
alphanumeric characters between those symbols (non-punctuation,
non-whitespace characters). Typically, the text between the symbols is a
descriptor and a number. For example, <adjective1>
and <noun4>
are two placeholders in the provided
stories. These placeholders will eventually be substituted with words
that match the placeholder’s description, completing the story.
As an example, an excerpt from a story file named
nursery_rhyme.story
might contain the following text:
Mary had a/an <adjective1> lamb. Its <noun1> was <adjective2> as <noun2>.
If we replaced “<adjective1>” with “little”, “<noun1>” with “fleece”, “<adjective2>” with “white”, and “<noun2>” with “snow”, we’d have a (possibly) familiar excerpt from a nursery rhyme:
| Mary had a/an little lamb. Its fleece was white as snow.
On the other hand, if we replaced “<adjective1>” with “enormous”, “<noun1>” with “mousepad”, “<adjective2>” with “loud”, and “<noun2>” with “rhombus”, we’d have something that is grammatically correct, but rather silly.
| Mary had a/an enormous lamb. Its mousepad was loud as rhombus.
Puzzle answer key files are lists of
“swap-key=swap-value” pairs that describe the way that the placeholders
from a story file will be updated. Each “swap-key=swap-value” pair
consists of a placeholder (as described above), an equals sign
(=
), and text that will be substituted into the puzzle as a
replacement for all occurrences of the placeholder. For example, a
puzzle answer key file for the substitution above would include the
lines:
<adjective1>=enormous
<noun1>=mousepad
<adjective2>=loud
<noun2>=rhombus
We called these “swap-key=swap-value” pairs because when we see an occurrence of a “swap-key” in our puzzle story file, we will look into our puzzle answer key file to find the matching “swap-value” that should replace it.
The game. Hopefully the above example illustrates
that a Madlib puzzle is, in some sense, a game. Outside of your
program, you can play this game by filling in the contents of a puzzle
answer key file. To have the most fun, you should avoid reading the
puzzle files (files that end in .story
). Instead, open up a
puzzle answer key file (a file that ends in .answerkey
)
inside VSCode, and fill in the puzzle answer key with words that you
would like to use as substitutions for the placeholders. The placeholder
should describe the qualities of a word that will fit into the
story—grammatically and thematically—but without knowing the context of
the puzzle, the words that you choose will often lead to humorous
substitutions.
The rest of the lab will walk us through the steps to finish a program that takes a puzzle story file and a completed puzzle answer key file, and outputs the (hopefully humorous) version of the story that includes the substitutions.
Now is the point in the lab where we start to put all the pieces together. We’ll first present the algorithm, and then complete one last helper function before implementing the algorithm itself—problem decomposition is a continuous process!
So, at a high level, what is it that we need to do? We need to read in a puzzle’s story and answer key, then go through the story and substitute each placeholder with its matching swap-value from the puzzle’s answer key. Let’s break this down into its own function that we can use as our last building block.
Every time we identify a placeholder within our puzzle
story, we need to find the corresponding swap-value from within
the puzzle answer key. For example, if we see
"<adjective1>"
in a story, we need to look through
the puzzle answer key to find the str that starts with
"<adjective1>"
(a swap-key), and then
extract the matching swap-value. We will implement this
behavior inside the function
get_madlibs_replacement(placeholder, puzzle_key_list)
.
This function has two parameters: placeholder
, which is
a swap-key such as "<adjective1>"
, and
puzzle_key_list
, which is a list of “swap-key=swap-value”
pairs that we read from a puzzle answer key file.
Luckily, we’ve already written the hardest parts of this function.
Since placeholder
is the swap-key in one of the
“swap-key=swap-value” pairs found in puzzle_key_list
, we
just need to iterate through the elements in
puzzle_key_list
to find it. Once we’ve found a match, the
swap-value is all of the characters that appear after the
"="
. (If we don’t find a match, just return
placeholder
since no subsitution is possible.)
We can use our is_prefix()
function to identify the
matching swap-key, and the all_text_after()
function to extract the swap-value. Finally, we return that
suffix.
Testing Your Program. To test your
get_madlibs_replacement()
implementation, you can use the
runtests.py
program with the “replace” argument:
% python3 runtests.py replace
The last step is to write
solved_madlibs(puzzle_story_list, puzzle_key_list)
, which
implements the Madlibs algorithm and returns a str containing the
completed story. As input, the function takes two parameters:
puzzle_story_list
is a list
of strs that
stores the contents of a Madlibs puzzle story. This
parameter will typically be the return value from calling
read_stringlist_from_file()
on a story file (ends in
.story
)puzzle_key_list
is a list
of strs that
stores the contents of a Madlibs puzzle key. This
parameter will typically be the return value from calling
read_stringlist_from_file()
on a key file (ends in
.answerkey
). Each element of this list
is a
“swap-key=swap-value” pair.The function should return a str.
Given these two lists, you should do the following:
madlibs_puzzle
(the
contents of the story), and every time you encounter a str that is not a
placeholder, add it to your accumulator variable (that is a
list
of strs) that stores every str in the solved story.
Every time you encounter a str that is a placeholder, use your
get_madlibs_replacement()
function to look up the matching
swap-value from puzzle_key_list
, and add that
swap-value to your accumulator variable.list
that contains all the
strs that are part of your completed puzzle. Convert the contents of
this list
into a str using the provided
format_madlib(solved_puzzle)
function that we’ve
implemented for you in text_utils.py
. The argument to this
function should be your accumulator variable. Your function should
return the resulting str.Formatting Our Madlibs. We have provided a handy
function, format_madlib(solved_puzzle)
, which we imported
from text_utils
at the top of the starter code.
format_madlib(solved_puzzle)
takes a list
of
strs (e.g., the text of a Madlibs puzzle—solved or not) and prints it to
the screen with nice formatting, as shown below:
>>> story_list = ['Uh', '-', 'oh', ',', 'I', 'forgot', 'to', '<verb1>', 'for', 'the', '<schoolsubject1>', 'exam', '!']
>>> format_madlib(story_list)
'Uh-oh, I forgot to <verb1> for my <schoolsubject1> exam!'
Testing Your Program. To test your
solved_madlibs()
implementation, you can use the
runtests.py
program with the “solve” argument:
% python3 runtests.py solve
At this point in the lab, you’ve written all of the functions you need to complete a Madlibs puzzle game. However, until we invoke those functions, we aren’t actually done. At the bottom of your program, there is a section that resembles the lines:
if __name__ == "__main__":
# comments ...
# more comments ...
pass
The section of your program is a standard part of many
.py
files. It is a conditional (an if
statement followed by a boolean expression, followed by an indented
section of code), which means that the indented region of code in its
body will only be executed if the condition evaluates to
True
. But what does the expression
__name__ == "__main__"
mean, and in what circumstances is
it True
or False
?
We will explore this syntax in more detail later this semester, but the important things to know are this:
program.py
as a
script (i.e., from the Terminal you type
python3 program.py
), this statement evaluates to
True
and the code is executed.import program
), this condition evaluates
to False
and the code is not executed.This is a very helpful feature. It gives us a place where we can put code in our programs that only executes when we “run” the program. This is often where we’ll write tests or where we’ll include code that reads arguments from the terminal and executes our functions using those arguments.
Now, make it so that running your program actually plays your game! You may wish to modify your program to resemble the following:
if __name__ == "__main__":
# Ask the user which Madlibs puzzle they want us to solve
= input("What puzzle story file would you like to use? ")
story_filename = input("What puzzle answer key file would you like to use? ")
key_filename
# Read the contents of the puzzle files into variables we can use
= read_stringlist_from_file(story_filename)
story_list = read_stringlist_from_file(key_filename)
key_list
print("Here is your completed Malib:")
print("")
# Solve and print the puzzle's solution
print(solved_madlibs(story_list, key_list))
Reading in Files. To implement our Madlibs game, we need a way to store the contents of puzzle story files and puzzle answer key files in variables that our program can manipulate. Since our primary focus is on building our skills with strs and lists, we have implemented this file-reading functionality for you, and we import the necessary modules at the top of the starter code.
One of these two lines at the top of the starter code allows you to
invoke the function read_stringlist_from_file(filename)
from inside your code. The function
read_stringlist_from_file(filename)
takes the location of a
file as input (the filename
parameter’s type is
str
), and it returns a list of all of the words and
punctuation in that file. So for example, if I had a file named
"sample.story"
with the contents:
Uh-oh, I forgot to <verb1> for the <schoolsubject1> exam!
and a file named "sample.answerkey"
with the
contents:
<verb1>=study
<schoolsubject1>=CS134
Then the following interactive Python session output displays its behavior:
>>> read_stringlist_from_file("sample.story")
['Uh', '-', 'oh', ',', 'I', 'forgot', 'to', '<verb1>', 'for', 'the', '<schoolsubject1>', 'exam', '!']
>>> read_stringlist_from_file("sample.answerkey")
['<verb1>=study', '<schoolsubject1>=CS134']
Running Your Program. Finally, to run your completed program, you can type the following command in the Terminal:
% python3 madlibs.py
When you are finished adding your functions to the script
madlibs.py
, make sure you add
and
commit
your work. In your Terminal, type:
git add madlibs.py runtests.py
git commit -m "Lab 3 completed"
Then you can the push
your work (remembering to start
the VPN
if you’re working off-campus):
git push
You can, if you wish, check that your work is up-to-date on https://evolene.cs.williams.edu.
Another way to check that you have committed and pushed all your changes
is through the Terminal. In the Terminal in your lab03
directory, type git status
:
git status
It should show your changes (probably in green) that have not been committed, and files (probably in red, if any), that have not been added. If you have successfully committed and pushed all your work, it should say so.
Please edit the README.md
file and enter the names
of any appropriate students on the Collaboration
line. Add,
commit, and push this change.
Near the bottom of the README.md
, there is a
breakdown of the grading expectations that will form the basis of your
lab’s evaluation. Please keep these in mind as you work through your
lab!
Download a .zip archive of your work. Download
your assignment files for submission by going to your lab repository on
Gitlab, selecting the Download source code
icon (a down
arrow), and select zip
. Doing so should download all of
your lab files into a single zip archive as lab03-main.zip
,
and place it inside your Downloads folder (or to whichever folder is set
as your browser’s default download location).
Submit your work. Navigate to the CS134 course on Gradescope. On your Dashboard, select the appropriate Lab Assignment. Drag and Drop your downloaded zip archive of the Lab Assignment from the previous step, and select ‘Upload’.