Sequences and Loops

Strings as a Sequence

A sequence is an abstract data type in Python that represents an ordered collection of elements, e.g., strings, lists, and range objects.

Today we will focus on strings, which are ordered sequences of individual characters (type str).

  • Consider for example: word = "Hello"

    • 'H' is the first character of word, 'e' is the second character, and so on.

    • In computer science, it is conventional to use zero-indexing, so we say that 'H' is the zeroth character of word, 'e' is the first character, and so on.

We can access each character of a string using indices in Python.

Accessing elements of a sequence using the [] operator

word = 'Williams'
word[0]  # character at 0th index?
'W'
word[3]  # character at 3rd index?
'l'
word[7] # character at 7th index?
's'
word[8] # will this work?
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_66065/975401442.py in <module>
----> 1 word[8] # will this work?

IndexError: string index out of range

Length of a Sequence

len() function. Python has a built-in len() function that computes the length of a sequence such as a string (or a list, which we will see in the next lecture).

  • For example, len('Williams') returns 8.

Thus, a non-empty string word has (non-negative) indices 0, 1, 2, ..., len(word)-1.

len("Williams")
8
len("pneumonoultramicroscopicsilicovolcanoconiosis") # often cited as the longest word in English dictionaries
45

Negative Indexing

Python also allows negative indices, starting at -1, which is a handy way to refer to the last element of a non-empty sequence (regardless of its length).

Thus, a string word has (negative) indices -1, -2, ..., -len(word).

place = "Williamstown"
place[-1]
'n'
len(place)
12
place[-12]
'W'
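
Relating the two index schemes. For a non-empty string, the negative index -k and the non-negative index len(word) - k refer to the same character. A minimal sketch checking this at a few positions (the names n, last_matches, first_matches, and middle_matches are illustrative and not part of the lecture code):

```python
place = "Williamstown"
n = len(place)  # 12

# -k and n - k name the same position in the string
last_matches = place[-1] == place[n - 1]     # both are 'n'
first_matches = place[-n] == place[0]        # both are 'W'
middle_matches = place[-4] == place[n - 4]   # both are 't'

print(last_matches, first_matches, middle_matches)
```

All three comparisons should print True.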

Slicing Sequences

Python allows us to extract subsequences of a sequence using the slicing operator [:].

For example, suppose we want to extract the substring Williams from Williamstown. We can use the starting and ending indices of the substring and the slicing operator [:].

place = "Williamstown"
# return the sequence from 0th index up to (not including) 8th
place[0:8] 
'Williams'
place[5:7] # what will this return?
'am'
place[4:4] # what will this return?
''
place[1:] # if end index is not provided, it defaults to len(place)
'illiamstown'
place[:8] # if start index not provided, defaults to 0
'Williams'
place[:] # what will this do?
'Williamstown'
place[8:100]  # notice no IndexError when slicing!
'town'
place[-4:-1]  # can also use negative indices to slice 
'tow'

Slicing Sequences with Optional Step

The slicing operator [:] optionally takes a third step parameter that determines in what direction to traverse, and whether to skip any elements while traversing and creating the subsequence.

By default the step is set to +1 (which means move left to right in increments of one).

We can pass other step parameters to obtain new sliced sequences; see examples below.

place = "Williamstown"
place[:8:1] # start is 0, end is 8, step is +1
'Williams'
place[:8:2] # start is 0, end is 8, step is +2 
'Wlim'
place[::2] # start is 0, end is 12, step is +2
'Wlimtw'
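
One way to read a step slice: place[start:end:step] picks the characters at indices start, start + step, start + 2*step, and so on, stopping before end. The sketch below uses range (mentioned earlier as another sequence type, not yet covered in detail) to list those indices; the names indices and picked are illustrative.

```python
place = "Williamstown"

# The indices visited by place[0:8:2] are the ones range(0, 8, 2) produces
indices = list(range(0, 8, 2))
print(indices)  # [0, 2, 4, 6]

# Picking those characters one at a time gives the same string as the slice
picked = place[0] + place[2] + place[4] + place[6]
print(picked == place[0:8:2])  # True
```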

Nifty Way to Reverse Sequences

Using a negative value for the step parameter provides a nifty way to reverse sequences.

For example, to reverse a string, we can set the optional step parameter to -1.

place[::-1] # reverse the sequence
'nwotsmailliW'
place[::-2] # step of 2 in reverse order
'nosali'
place[8:0:-1] # notice how start and end are used with a negative step
'tsmailli'
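
Defaults with a negative step. When the step is negative, an omitted start defaults to the last index, and an omitted end means the slice runs all the way through index 0. A short sketch comparing these cases, using the same place string as above (the names stops_early, through_zero, and from_end are illustrative):

```python
place = "Williamstown"

stops_early = place[8:0:-1]   # end index 0 is excluded, so 'W' is left out
through_zero = place[8::-1]   # end omitted: continues through index 0
from_end = place[:7:-1]       # start omitted: begins at the last character

print(stops_early)    # tsmailli
print(through_zero)   # tsmailliW
print(from_end)       # nwot
```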

Testing membership: in operator

The in operator in Python returns True or False and is used to test membership. For strings, it tests whether one string occurs inside another; for other sequences such as lists, it tests whether a value is one of the elements.

For example, we can use it to test if a string is a substring of another string (a substring is a contiguous sequence of characters within a string, e.g., Williams is a substring of Williamstown).

'Williams' in 'Williamstown'
True
'W' in 'Williams'
True
'w' in 'Williams' # capitalization matters
False
'liam' in 'WiLLiams' # will this work?
False

lower() and upper() string methods

In addition to functions, Python provides several built-in methods for manipulating strings. Methods are like functions but must be called using dot notation on specific strings. We can ignore or change the case of a string using the .lower() and .upper() string methods, which return a new string with the appropriate case.

message = "HELLLOOOO...!!!"
message.lower() # leaves non-alphabetic characters the same
'hellloooo...!!!'
song = "$$ la la la laaa la $$..."
song.upper()
'$$ LA LA LA LAAA LA $$...'
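
Case-insensitive membership. Combining .lower() with the in operator is one way to ignore capitalization in a substring test. A small sketch, reusing the 'WiLLiams' example from above (the variable names are illustrative):

```python
name = 'WiLLiams'

exact = 'liam' in name             # False: 'L' and 'l' are different characters
ignore_case = 'liam' in name.lower()   # True: 'williams' contains 'liam'

print(exact, ignore_case)
print(name)  # still 'WiLLiams': .lower() returned a new string
```

Note that this only works as intended if the string on the left of in is already lower case.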

isVowel function

Consider the two isVowel functions below that take a character as input and return whether or not it is a vowel. The second one is simpler than the first and takes advantage of both the .lower() string method and the in operator.

def oldIsVowel(c):
    """isVowel function"""
    return (c == 'a' or c == 'e' or c == 'i' or c == 'o' or c == 'u' 
            or c == 'A' or c == 'E' or c == 'I' or c == 'O' or c == 'U')
def isVowel(char):
    """Simpler isVowel function"""
    c = char.lower() # convert to lower case first
    return c in 'aeiou' 
oldIsVowel('A')
True
isVowel('z')
False
isVowel('u')
True
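
Edge case. isVowel assumes its argument is a single character. Because in tests for substrings, the empty string and multi-character strings such as 'ae' also pass the test. A hedged sketch of a stricter variant, here called isVowelStrict (a hypothetical name, not part of the lecture code), adds a length check:

```python
def isVowel(char):
    """Same helper as above, repeated so this sketch runs on its own"""
    c = char.lower()
    return c in 'aeiou'

def isVowelStrict(char):
    """Hypothetical variant: True only for a single vowel character"""
    return len(char) == 1 and char.lower() in 'aeiou'

print(isVowel(''), isVowel('ae'))              # True True (both are substrings of 'aeiou')
print(isVowelStrict(''), isVowelStrict('ae'))  # False False
print(isVowelStrict('U'), isVowelStrict('z'))  # True False
```

For the loops in these notes, which pass one character at a time, the two versions behave the same.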

Towards Iteration: Counting Vowels

Problem. Using our isVowel() function, let’s write a function countVowels() that takes a string word as input and returns the number of vowels in the string (as an int).

def countVowels(word):
    '''Returns number of vowels in the word'''
    pass

Expected behavior:

>>> countVowels('Williamstown')
4
>>> countVowels('Ephelia')
4

Re-using functions. We will use isVowel() to test individual characters of the string, rather than starting from scratch.

What do we need to do to solve this problem?

  • Test each character of the string to see if it is a vowel

  • If we encounter a vowel, we need to remember it (keep a counter for all vowels seen so far)

Attempt using Conditionals

Suppose we manually check each character of the string and update a counter if it is a vowel.

word = 'Williams'     
counter = 0
if isVowel(word[0]):
    counter += 1
if isVowel(word[1]):
    counter += 1
if isVowel(word[2]):
    counter += 1
if isVowel(word[3]):
    counter += 1
if isVowel(word[4]):
    counter += 1
if isVowel(word[5]):
    counter += 1
if isVowel(word[6]):
    counter += 1
if isVowel(word[7]):
    counter += 1
print(counter)        
3

Question. How good is this approach? Will it work for any word?

word = 'Banana'     
counter = 0
if isVowel(word[0]):
    counter += 1
if isVowel(word[1]):
    counter += 1
if isVowel(word[2]):
    counter += 1
if isVowel(word[3]):
    counter += 1
if isVowel(word[4]):
    counter += 1
if isVowel(word[5]):
    counter += 1
if isVowel(word[6]):
    counter += 1
if isVowel(word[7]):
    counter += 1
print(counter)   
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/var/folders/md/kwd9nc_d2ns0hw9wsvdrnt2c0000gn/T/ipykernel_66065/3871655057.py in <module>
     13 if isVowel(word[5]):
     14     counter += 1
---> 15 if isVowel(word[6]):
     16     counter += 1
     17 if isVowel(word[7]):

IndexError: string index out of range

Takeaway. Downsides of this approach are many:

  • Manually checking every character is not generalizable to arbitrary strings

  • The checks are very repetitive (same for every character in the string): can we automate these repetitive checks?

Iteration over Sequences: for loops

We can “iterate” over the elements of a sequence using a for loop. A loop is a mechanism to repeat the same operations for each element of a sequence.

Syntax of for loop

for var in seq:
     do something

var above is called the loop variable of the for loop. It takes on the value of each of the elements of the sequence one by one.

# simple example of for loop
word = "Williams"

for char in word:
    print(char)
W
i
l
l
i
a
m
s

We often want to count or “accumulate” values as we iterate over a sequence. Consider this example that counts the number of characters in a string. Here count is called an accumulation variable. Accumulation variables are often integers or strings.

# count length of string manually

count = 0 # initialize
for char in word:
    count += 1
print(count)
8

Putting it all Together: countVowels

Now, we are ready to implement our function that takes a string as input and returns the number of vowels in it.

def countVowels(word):
    '''Takes a string as input and returns 
    the number of vowels in it'''
    
    count = 0 # initialize the counter
    
    # iterate over the word one character at a time
    for char in word: 
        if isVowel(char): # call helper function
            count += 1
    return count
countVowels('Williams')
3
countVowels('Ephelia')
4

Notice that the for loop does not need to know the length of the sequence ahead of time. In Python, the for loop automatically finishes after the sequence runs out of elements, e.g., word runs out of characters, even though we have not computed the length manually.
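
For comparison, an index-based loop does need the length. The sketch below is a hypothetical variant, countVowelsByIndex (not part of the lecture code), that walks over the indices produced by range(len(word)); isVowel is repeated so the block runs on its own.

```python
def isVowel(char):
    """Same helper as above, repeated so this sketch runs on its own"""
    return char.lower() in 'aeiou'

def countVowelsByIndex(word):
    """Hypothetical index-based variant of countVowels"""
    count = 0
    for i in range(len(word)):   # i takes the values 0, 1, ..., len(word) - 1
        if isVowel(word[i]):
            count += 1
    return count

print(countVowelsByIndex('Williams'))  # 3
print(countVowelsByIndex('Banana'))    # 3
```

Unlike the manual if statements earlier, this works for 'Banana' because the indices never go past len(word) - 1. Iterating directly over the characters, as countVowels does, gives the same result with less bookkeeping.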

Tracing the loop. To observe how the variables char and count change state as the loop proceeds, we can add print statements.

def traceCountVowels(word):
    '''Traces the execution of countVowels function'''
    count = 0 # initialize the counter
    for char in word: # iterate over the word one character at a time
        print('char, count: ('+ char + ' , ' + str(count) +')')
        if isVowel(char):
            print('Incrementing counter')
            count += 1
    return count
traceCountVowels('Williams')
char, count: (W , 0)
char, count: (i , 0)
Incrementing counter
char, count: (l , 1)
char, count: (l , 1)
char, count: (i , 1)
Incrementing counter
char, count: (a , 2)
Incrementing counter
char, count: (m , 3)
char, count: (s , 3)
3
traceCountVowels('Queue')
char, count: (Q , 0)
char, count: (u , 0)
Incrementing counter
char, count: (e , 1)
Incrementing counter
char, count: (u , 2)
Incrementing counter
char, count: (e , 3)
Incrementing counter
4

Summary. As you can see, the loop variable char takes the value of every character in the string one by one until the last character. Inside the loop, we check if char is a vowel and if so we increment the counter.

Exercise: countChar

Define a function countChar that takes two str arguments, a character and a word, and returns the number of times that character appears in that word (an int), ignoring case.

def countChar(char, word):
    '''Counts the number of times a character appears in a word, ignoring case'''
    # try this on your own!
    pass  # placeholder statement for an empty function body
countChar('a', 'Alabama')
countChar('E', 'Ephs')
countChar('o', 'Rhythm')

Exercise: vowelSeq

Define a function vowelSeq that takes a string word as input and returns a string containing all the vowels in word in the same order as they appear.

Example function calls:

>>> vowelSeq("Chicago")
'iao'
>>> vowelSeq("protein")
'oei'
>>> vowelSeq("rhythm")
''
def vowelSeq(word):
    '''Returns the vowel subsequence in given word'''
    vowels = ""  # accumulation variable
    for char in word:
        if isVowel(char): # if vowel
            vowels += char # accumulate characters 
    return vowels
vowelSeq("Chicago")
'iao'
vowelSeq("protein")
'oei'
vowelSeq("rhythm")
''