Files & Ranges

In the last few lectures, we learned about sequences (strings, lists) and how to iterate over them using loops. Today we will look at the range data type, which is another type of sequence.

We will also look at how to read from files and store the contents as a string or a list of strings. After doing so, we will look at some common operations involving lists, strings and counters that are useful when analyzing data.


List operations

numList = [1, 2, 3]
numList2 = numList + ["6"]
numList
[1, 2, 3]
numList2
[1, 2, 3, '6']
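The cell above shows that + builds a new list and leaves the original untouched. As a rough sketch of how this compares with the list methods append() and extend() (the variable names combined and other are made up for this illustration):

```python
numList = [1, 2, 3]

combined = numList + [4]     # + returns a new list; numList is unchanged
print(numList, combined)     # [1, 2, 3] [1, 2, 3, 4]

numList.append([4, 5])       # append adds its argument as ONE element
print(numList)               # [1, 2, 3, [4, 5]]

other = [1, 2, 3]
other.extend([4, 5])         # extend adds each element of its argument
print(other)                 # [1, 2, 3, 4, 5]
```

Unlike +, both append() and extend() change the list they are called on.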

Range sequences

Python provides an easy way to iterate over common numerical sequences through the range data type. We create ranges using the range() function.

range(0,10)
range(0, 10)
type(range(0, 10))
range

To examine the contents of a range object, we can pass it to the built-in function list(). Just as int(), float() and str() convert values to integers, floats and strings, list() converts other data types into a list.

Using list() on range objects:

When given a range object, list() returns a list of the elements in that range. This is convenient for seeing what a range object actually consists of.

list(range(0, 10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Notice. range(firstNum, secondNum) represents all integers from firstNum through secondNum - 1. If firstNum is 0, we can omit it. For example:

list(range(-10, 10))
[-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list(range(3))
[0, 1, 2]
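Because a range is a sequence, it should also support the operations we have already used on strings and lists, such as len(), indexing, and the in operator. A small sketch (the variable name nums is just for illustration):

```python
nums = range(-10, 10)

print(len(nums))      # 20 numbers: -10 through 9
print(nums[0])        # first element: -10
print(nums[-1])       # last element: 9
print(10 in nums)     # False: the second argument is excluded

# omitting the first argument is the same as starting at 0
print(list(range(0, 3)) == list(range(3)))   # True
```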

Looping over ranges

Range objects provide us with an iterable sequence, which we can loop over, just like we did with strings and lists.

# simple for loop that prints numbers 1-10
for i in range(1, 11):  
    print(i)
1
2
3
4
5
6
7
8
9
10
# what does this print?

for i in range(5):  
    print('$' * i)
for j in range(5):  
    print('*' * j)
$
$$
$$$
$$$$

*
**
***
****
# what does this print?

for i in range(5):
    print('$' * i)
    for j in range(i):
        print('*' * j)      
$

$$

*
$$$

*
**
$$$$

*
**
***
# convention: use _ for loop variable when we don't need to reference it in loop
for _ in range(10):
    print('Hello World!')
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
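One common use of a range, sketched below, is to loop over the positions of a list rather than over its elements directly. The list colors is made up for this example.

```python
colors = ['purple', 'gold', 'white']

# range(len(colors)) produces the valid indices 0, 1, 2
for i in range(len(colors)):
    print(i, colors[i])
```

This is handy when we need the index as well as the element, for example to label each item with its position.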

Reading from a File

Reading from and writing to files are important operations in data analysis, and Python makes both straightforward.

To open a file for reading or writing, we use the built-in function open().

# 'r' means open the file for reading
book = open('textfiles/mountains.txt', 'r') 

Mode. The mode can be 'r' for reading, 'w' for writing, or 'a' for appending. The mode 'r' is the default, so it is optional. When reading from a file (instead of writing), we can just write:

book = open('textfiles/mountains.txt') 

With…as block and iterating over files

With block to open and close files. Whenever you open a file, you must also close it to prevent problems such as leaked resources or data that never gets written to disk. To avoid writing code to explicitly open and close the file, we will use the with...as block, which keeps the file open within it and automatically closes the file after exiting the block.

Within a with...as block, we can iterate over the lines of a file using a for loop in the same way as we would iterate over any sequence.

# read input file and print each line
with open('textfiles/mountains.txt') as book:
    for line in book:
        print(line.strip())
# file is implicitly closed here
O, proudly rise the monarchs of our mountain land,
With their kingly forest robes, to the sky,
Where Alma Mater dwelleth with her chosen band,
And the peaceful river floweth gently by.

The mountains! The mountains! We greet them with a song,
Whose echoes rebounding their woodland heights along,
Shall mingle with anthems that winds and fountains sing,
Till hill and valley gaily gaily ring.

Beneath their peaceful shadows may old Williams stand,
Till the suns and mountains never more shall be,
The glory and the honor of our mountain land,
And the dwelling of the gallant and the free.

The mountains! The mountains! We greet them with a song,
Whose echoes rebounding their woodland heights along,
Shall mingle with anthems that winds and fountains sing,
Till hill and valley gaily gaily ring.
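To see the automatic closing in action, we can check a file object's closed attribute inside and after a with...as block. The sketch below first creates a small throwaway file (the name sample.txt and its contents are made up) so that it does not depend on textfiles/mountains.txt.

```python
import os
import tempfile

# create a small throwaway file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as out:
    out.write('first line\nsecond line\n')

with open(path) as sample:
    print(sample.closed)   # False: still open inside the block
    lines = []
    for line in sample:
        lines.append(line.strip())

print(sample.closed)       # True: closed automatically on exit
print(lines)               # ['first line', 'second line']
```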
# read input file and use .split() to create a 
# list of strings *for each line*
with open('textfiles/mountains.txt') as book: 
    for line in book:
        print(line.strip().split()) 
['O,', 'proudly', 'rise', 'the', 'monarchs', 'of', 'our', 'mountain', 'land,']
['With', 'their', 'kingly', 'forest', 'robes,', 'to', 'the', 'sky,']
['Where', 'Alma', 'Mater', 'dwelleth', 'with', 'her', 'chosen', 'band,']
['And', 'the', 'peaceful', 'river', 'floweth', 'gently', 'by.']
[]
['The', 'mountains!', 'The', 'mountains!', 'We', 'greet', 'them', 'with', 'a', 'song,']
['Whose', 'echoes', 'rebounding', 'their', 'woodland', 'heights', 'along,']
['Shall', 'mingle', 'with', 'anthems', 'that', 'winds', 'and', 'fountains', 'sing,']
['Till', 'hill', 'and', 'valley', 'gaily', 'gaily', 'ring.']
[]
['Beneath', 'their', 'peaceful', 'shadows', 'may', 'old', 'Williams', 'stand,']
['Till', 'the', 'suns', 'and', 'mountains', 'never', 'more', 'shall', 'be,']
['The', 'glory', 'and', 'the', 'honor', 'of', 'our', 'mountain', 'land,']
['And', 'the', 'dwelling', 'of', 'the', 'gallant', 'and', 'the', 'free.']
[]
['The', 'mountains!', 'The', 'mountains!', 'We', 'greet', 'them', 'with', 'a', 'song,']
['Whose', 'echoes', 'rebounding', 'their', 'woodland', 'heights', 'along,']
['Shall', 'mingle', 'with', 'anthems', 'that', 'winds', 'and', 'fountains', 'sing,']
['Till', 'hill', 'and', 'valley', 'gaily', 'gaily', 'ring.']
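It may help to look at what strip() and split() each do to a single line. The sketch below uses a string typed in by hand rather than one read from the file; a line read from a file ends with the newline character in the same way.

```python
line = 'Till hill and valley gaily gaily ring.\n'

print(repr(line))          # repr shows the trailing '\n'
stripped = line.strip()    # removes leading and trailing whitespace
words = stripped.split()   # splits on runs of whitespace
print(words)

blank = '\n'               # what an empty line in the file looks like
print(blank.strip().split())   # []
```

This is why the blank lines between verses show up as empty lists in the output above.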
# if we want to create one big list of the words, we can accumulate
# in a list using the extend() method
wordList = [] 
with open('textfiles/mountains.txt') as book:  
    for line in book:
        wordList.extend(line.strip().split())
wordList
['O,',
 'proudly',
 'rise',
 'the',
 'monarchs',
 'of',
 'our',
 'mountain',
 'land,',
 'With',
 'their',
 'kingly',
 'forest',
 'robes,',
 'to',
 'the',
 'sky,',
 'Where',
 'Alma',
 'Mater',
 'dwelleth',
 'with',
 'her',
 'chosen',
 'band,',
 'And',
 'the',
 'peaceful',
 'river',
 'floweth',
 'gently',
 'by.',
 'The',
 'mountains!',
 'The',
 'mountains!',
 'We',
 'greet',
 'them',
 'with',
 'a',
 'song,',
 'Whose',
 'echoes',
 'rebounding',
 'their',
 'woodland',
 'heights',
 'along,',
 'Shall',
 'mingle',
 'with',
 'anthems',
 'that',
 'winds',
 'and',
 'fountains',
 'sing,',
 'Till',
 'hill',
 'and',
 'valley',
 'gaily',
 'gaily',
 'ring.',
 'Beneath',
 'their',
 'peaceful',
 'shadows',
 'may',
 'old',
 'Williams',
 'stand,',
 'Till',
 'the',
 'suns',
 'and',
 'mountains',
 'never',
 'more',
 'shall',
 'be,',
 'The',
 'glory',
 'and',
 'the',
 'honor',
 'of',
 'our',
 'mountain',
 'land,',
 'And',
 'the',
 'dwelling',
 'of',
 'the',
 'gallant',
 'and',
 'the',
 'free.',
 'The',
 'mountains!',
 'The',
 'mountains!',
 'We',
 'greet',
 'them',
 'with',
 'a',
 'song,',
 'Whose',
 'echoes',
 'rebounding',
 'their',
 'woodland',
 'heights',
 'along,',
 'Shall',
 'mingle',
 'with',
 'anthems',
 'that',
 'winds',
 'and',
 'fountains',
 'sing,',
 'Till',
 'hill',
 'and',
 'valley',
 'gaily',
 'gaily',
 'ring.']
len(wordList) # total number of words
133
# number of times a word ('mountains!') is in the song?
wordList.count('mountains!')
4
# number of times a word ('gaily') is in the song?
wordList.count('gaily')
4
# number of times a word ('hill') is in the song?
wordList.count('hill')
2
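Note that count() only finds exact matches, so 'mountains!' and 'mountains' are counted separately, as are 'The' and 'the'. One possible way to get more forgiving counts is to strip punctuation and lowercase each word first. The sketch below uses a short hand-made list (sample) and a hypothetical name (cleanWords) instead of the full wordList.

```python
import string

sample = ['The', 'mountains!', 'The', 'mountains!',
          'the', 'suns', 'and', 'mountains']

print(sample.count('mountains'))    # 1: exact matches only
print(sample.count('mountains!'))   # 2

cleanWords = []
for word in sample:
    # strip punctuation from both ends, then lowercase
    cleanWords.append(word.strip(string.punctuation).lower())

print(cleanWords.count('mountains'))  # 3
print(cleanWords.count('the'))        # 3
```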

CSV Files

A CSV (Comma Separated Values) file is a type of plain text file that stores tabular data. Each row of a table is a line in the text file, with each column in the row separated by commas. This format is the most common import and export format for spreadsheets and databases.

For example, a simple table of names and ages, such as the one below, would be represented in a CSV file as follows:

Table:

Name           Age
Charlie Brown  8
Snoopy         72
Patty          7

CSV:

Name,Age
Charlie Brown,8
Snoopy,72
Patty,7

We can handle CSV files much like text files, using string and list methods to process the tabular data.
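As a minimal sketch of that idea, the code below processes the rows of the small table above. To keep it self-contained, the rows are given as a list of strings (csvLines, a made-up name) in the form they would have when read line by line from a file.

```python
# the same rows as the CSV above, as they would look when read from a file
csvLines = ['Name,Age\n', 'Charlie Brown,8\n', 'Snoopy,72\n', 'Patty,7\n']

header = csvLines[0].strip().split(',')
print(header)                       # ['Name', 'Age']

ages = []
for line in csvLines[1:]:           # skip the header row
    fields = line.strip().split(',')
    name = fields[0]
    age = int(fields[1])            # fields are strings until converted
    ages.append(age)
    print(name, 'is', age)

print(sum(ages))                    # 87
```

Splitting on commas is enough for simple files like this one. It would not handle values that themselves contain commas; Python's built-in csv module is designed for those cases.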

Reading in Student Names

The names of the students in this class are in classnames.csv in the directory csv.

filename = 'csv/classnames.csv' 
with open(filename) as roster:  
    for line in roster:
        print(line.strip())
Acosta,RJ
Adelman,Jackson C.
Agha,Harris
Alcock,Nick R.
Aragon,Valeria
Arian,M Aditta
Atli,Emir C.
Berrutti Bartesaghi,Martina
Bhatia,Anjali K.
Bossman,Tryphena
Brant,Nora E.
Cass,Ryan T.
Chang,Daniel Y.
Chang,Kayla
Chen,Will J.
Choi,Alex W.
Clarke,Grace A.
Cooper,Ethan
Cross,Harry
Diaz,Felix L.
Durham,Keelan S.
Edwards-Mizel,Edith N.
Espinosa,Pedro R.
Estejab,Amir H.
Felten,Timothy E.
Fluehr,Arden N.
Fuentes,Leilani
Gutchess,Jane C.
Gwilt,Kyle E.
Hartman,Sarah A.
Horvath,Sasha G.
Howard-Sarin,Brij C.
Iazzag,Lesley C.
Izidro,Patrick
Jain,Sameer
Jiang,Kevin Y.
Jiang,Weiran
Joy,Matt L.
Juneja,Riya
Kerest,Lena O.
Keyes,Mikey A.
Kimm,Miranda C.
Kingchatchaval,Prom
Kroninger,Noah J.
Kubomiya,Reona
Lee,Gabe
Lee,Yuri J.
Levy,Arielle T.
Liebman,Myer C.
Lindsay,Rebekah A.
Lowe,Jade
Maffei,Grace K.
Matin,Julia M.
Miller,Emma C.
Napeloni,Jack D.
Nelson,Paige E.
Nguyen,Trung Nguyen T.
Nolan,Natalia H.
Nordhoff,Jaquelin T.
Overholt,Reece K.
Pal,Kunal
Park,Min Kyu
Park,Tiffany J.
Paul,Betsy
Phang,Matthew L.
Pineda Gutierrez,Doug J.
Rajbhandary,Priya
Ramasamy,Gautam
Randazzo,Genevieve B.
Rice,Kendall L.
Rogers,Kimberly
Rubinshteyn,Leah
Samuel,Sam
Sarmiento,Jennifer R.
Sayed,Alyse
Shah,Maximilian A.
Shankaran,Hari
Shareshian,Matt R.
Singh,Gurinder
Smith,Aniya J.
Sukup,Ella G.
Symkowick,Marta G.
Vaska,Rein T.
Vilandrie,Nevin D.
Vilfort,C.J.
Williams,Harrison P.
Winters,Olivia V.
Wisotsky,Matt
Wynn,Jordan A.
Yager,Ruby J.
Yanashita,Rick
Yarter,Skylar O.
Zhang,Winnie
Zhou,Nicole S.
Zou,Addison

Collecting names in a list

Suppose we want to create a list of all names, where names appear in firstName (M.I.) lastName format. How do we achieve that?

students = [] # initialize empty list
filename = "csv/classnames.csv"
with open(filename) as roster: 
    for line in roster:
        fullName = line.strip().split(',')
        firstName = fullName[1]
        lastName = fullName[0]
        # print(firstName,lastName)
        students.append(firstName + ' ' + lastName)
students
['RJ Acosta',
 'Jackson C. Adelman',
 'Harris Agha',
 'Nick R. Alcock',
 'Valeria Aragon',
 'M Aditta Arian',
 'Emir C. Atli',
 'Martina Berrutti Bartesaghi',
 'Anjali K. Bhatia',
 'Tryphena Bossman',
 'Nora E. Brant',
 'Ryan T. Cass',
 'Daniel Y. Chang',
 'Kayla Chang',
 'Will J. Chen',
 'Alex W. Choi',
 'Grace A. Clarke',
 'Ethan Cooper',
 'Harry Cross',
 'Felix L. Diaz',
 'Keelan S. Durham',
 'Edith N. Edwards-Mizel',
 'Pedro R. Espinosa',
 'Amir H. Estejab',
 'Timothy E. Felten',
 'Arden N. Fluehr',
 'Leilani Fuentes',
 'Jane C. Gutchess',
 'Kyle E. Gwilt',
 'Sarah A. Hartman',
 'Sasha G. Horvath',
 'Brij C. Howard-Sarin',
 'Lesley C. Iazzag',
 'Patrick Izidro',
 'Sameer Jain',
 'Kevin Y. Jiang',
 'Weiran Jiang',
 'Matt L. Joy',
 'Riya Juneja',
 'Lena O. Kerest',
 'Mikey A. Keyes',
 'Miranda C. Kimm',
 'Prom Kingchatchaval',
 'Noah J. Kroninger',
 'Reona Kubomiya',
 'Gabe Lee',
 'Yuri J. Lee',
 'Arielle T. Levy',
 'Myer C. Liebman',
 'Rebekah A. Lindsay',
 'Jade Lowe',
 'Grace K. Maffei',
 'Julia M. Matin',
 'Emma C. Miller',
 'Jack D. Napeloni',
 'Paige E. Nelson',
 'Trung Nguyen T. Nguyen',
 'Natalia H. Nolan',
 'Jaquelin T. Nordhoff',
 'Reece K. Overholt',
 'Kunal Pal',
 'Min Kyu Park',
 'Tiffany J. Park',
 'Betsy Paul',
 'Matthew L. Phang',
 'Doug J. Pineda Gutierrez',
 'Priya Rajbhandary',
 'Gautam Ramasamy',
 'Genevieve B. Randazzo',
 'Kendall L. Rice',
 'Kimberly Rogers',
 'Leah Rubinshteyn',
 'Sam Samuel',
 'Jennifer R. Sarmiento',
 'Alyse Sayed',
 'Maximilian A. Shah',
 'Hari Shankaran',
 'Matt R. Shareshian',
 'Gurinder Singh',
 'Aniya J. Smith',
 'Ella G. Sukup',
 'Marta G. Symkowick',
 'Rein T. Vaska',
 'Nevin D. Vilandrie',
 'C.J. Vilfort',
 'Harrison P. Williams',
 'Olivia V. Winters',
 'Matt Wisotsky',
 'Jordan A. Wynn',
 'Ruby J. Yager',
 'Rick Yanashita',
 'Skylar O. Yarter',
 'Winnie Zhang',
 'Nicole S. Zhou',
 'Addison Zou']

Writing to Files

We can save the results we compute by writing them to a file, which persists after the program ends. To open a file for writing, we use open() with the mode 'w'.

The following code creates a new file named studentNames.txt in the current working directory and writes our list of student names to it.

with open('studentNames.txt', 'w') as sFile:
    sFile.write('CS134 students:\n') # need newlines
    sFile.write('\n'.join(students))
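To check exactly what ends up in a file, we can read it back as a single string with read(). The sketch below follows the same pattern as the cell above, but with a short made-up list (names) and a temporary directory so that no real files are touched.

```python
import os
import tempfile

names = ['Charlie Brown', 'Snoopy', 'Patty']   # made-up sample data
path = os.path.join(tempfile.mkdtemp(), 'names.txt')

with open(path, 'w') as outFile:
    outFile.write('Names:\n')            # write() does not add a newline for us
    outFile.write('\n'.join(names))      # newline between names, none after the last

with open(path) as inFile:
    contents = inFile.read()             # the whole file as one string

print(repr(contents))
```

Here repr() makes the newline characters visible, which shows that join() places a newline between the names but not after the last one.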

We can use ls -l to see that a new file studentNames.txt has been created:

ls -l
total 668528
drwxr-xr-x  3 jeannie  staff         96 Aug  3 15:58 __pycache__/
drwxr-xr-x  5 jeannie  staff        160 Sep 26 21:09 csv/
-rwx------@ 1 jeannie  staff  206981897 Sep 27 20:19 files-and-comprehensions-jeannie.key*
-rw-r--r--@ 1 jeannie  staff      17779 Sep 29 15:48 files-and-comprehensions.ipynb
-rwxr-xr-x@ 1 jeannie  staff  116516815 Sep 28 07:32 files-and-comprehensions.key*
-rw-r--r--@ 1 jeannie  staff    2289899 Sep 27 20:43 files-and-comprehensions.pdf
-rw-r--r--  1 jeannie  staff       2083 Aug  3 14:27 sequenceTools.py
-rw-r--r--  1 jeannie  staff       1515 Sep 29 15:48 studentNames.txt
drwxr-xr-x  4 jeannie  staff        128 Aug  3 14:27 textfiles/

Use the Unix command cat to view the contents of the file:

cat studentNames.txt
CS134 students:
RJ Acosta
Jackson C. Adelman
Harris Agha
Nick R. Alcock
Valeria Aragon
M Aditta Arian
Emir C. Atli
Martina Berrutti Bartesaghi
Anjali K. Bhatia
Tryphena Bossman
Nora E. Brant
Ryan T. Cass
Daniel Y. Chang
Kayla Chang
Will J. Chen
Alex W. Choi
Grace A. Clarke
Ethan Cooper
Harry Cross
Felix L. Diaz
Keelan S. Durham
Edith N. Edwards-Mizel
Pedro R. Espinosa
Amir H. Estejab
Timothy E. Felten
Arden N. Fluehr
Leilani Fuentes
Jane C. Gutchess
Kyle E. Gwilt
Sarah A. Hartman
Sasha G. Horvath
Brij C. Howard-Sarin
Lesley C. Iazzag
Patrick Izidro
Sameer Jain
Kevin Y. Jiang
Weiran Jiang
Matt L. Joy
Riya Juneja
Lena O. Kerest
Mikey A. Keyes
Miranda C. Kimm
Prom Kingchatchaval
Noah J. Kroninger
Reona Kubomiya
Gabe Lee
Yuri J. Lee
Arielle T. Levy
Myer C. Liebman
Rebekah A. Lindsay
Jade Lowe
Grace K. Maffei
Julia M. Matin
Emma C. Miller
Jack D. Napeloni
Paige E. Nelson
Trung Nguyen T. Nguyen
Natalia H. Nolan
Jaquelin T. Nordhoff
Reece K. Overholt
Kunal Pal
Min Kyu Park
Tiffany J. Park
Betsy Paul
Matthew L. Phang
Doug J. Pineda Gutierrez
Priya Rajbhandary
Gautam Ramasamy
Genevieve B. Randazzo
Kendall L. Rice
Kimberly Rogers
Leah Rubinshteyn
Sam Samuel
Jennifer R. Sarmiento
Alyse Sayed
Maximilian A. Shah
Hari Shankaran
Matt R. Shareshian
Gurinder Singh
Aniya J. Smith
Ella G. Sukup
Marta G. Symkowick
Rein T. Vaska
Nevin D. Vilandrie
C.J. Vilfort
Harrison P. Williams
Olivia V. Winters
Matt Wisotsky
Jordan A. Wynn
Ruby J. Yager
Rick Yanashita
Skylar O. Yarter
Winnie Zhang
Nicole S. Zhou
Addison Zou

Appending to Files

If a file already has something in it, opening it in 'w' mode again will erase all of its previous contents. If we need to add something to the end of a file instead, we open it in append mode ('a').

For example, let us append a sentence to studentNames.txt.

with open('studentNames.txt', 'a') as sFile:
    sFile.write('\nGoodbye.\n')
cat studentNames.txt 
CS134 students:
RJ Acosta
Jackson C. Adelman
Harris Agha
Nick R. Alcock
Valeria Aragon
M Aditta Arian
Emir C. Atli
Martina Berrutti Bartesaghi
Anjali K. Bhatia
Tryphena Bossman
Nora E. Brant
Ryan T. Cass
Daniel Y. Chang
Kayla Chang
Will J. Chen
Alex W. Choi
Grace A. Clarke
Ethan Cooper
Harry Cross
Felix L. Diaz
Keelan S. Durham
Edith N. Edwards-Mizel
Pedro R. Espinosa
Amir H. Estejab
Timothy E. Felten
Arden N. Fluehr
Leilani Fuentes
Jane C. Gutchess
Kyle E. Gwilt
Sarah A. Hartman
Sasha G. Horvath
Brij C. Howard-Sarin
Lesley C. Iazzag
Patrick Izidro
Sameer Jain
Kevin Y. Jiang
Weiran Jiang
Matt L. Joy
Riya Juneja
Lena O. Kerest
Mikey A. Keyes
Miranda C. Kimm
Prom Kingchatchaval
Noah J. Kroninger
Reona Kubomiya
Gabe Lee
Yuri J. Lee
Arielle T. Levy
Myer C. Liebman
Rebekah A. Lindsay
Jade Lowe
Grace K. Maffei
Julia M. Matin
Emma C. Miller
Jack D. Napeloni
Paige E. Nelson
Trung Nguyen T. Nguyen
Natalia H. Nolan
Jaquelin T. Nordhoff
Reece K. Overholt
Kunal Pal
Min Kyu Park
Tiffany J. Park
Betsy Paul
Matthew L. Phang
Doug J. Pineda Gutierrez
Priya Rajbhandary
Gautam Ramasamy
Genevieve B. Randazzo
Kendall L. Rice
Kimberly Rogers
Leah Rubinshteyn
Sam Samuel
Jennifer R. Sarmiento
Alyse Sayed
Maximilian A. Shah
Hari Shankaran
Matt R. Shareshian
Gurinder Singh
Aniya J. Smith
Ella G. Sukup
Marta G. Symkowick
Rein T. Vaska
Nevin D. Vilandrie
C.J. Vilfort
Harrison P. Williams
Olivia V. Winters
Matt Wisotsky
Jordan A. Wynn
Ruby J. Yager
Rick Yanashita
Skylar O. Yarter
Winnie Zhang
Nicole S. Zhou
Addison Zou
Goodbye.
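The difference between the two modes can be sketched on a throwaway file. The file name log.txt and the strings written are made up for this illustration, and a temporary directory keeps it from touching studentNames.txt.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'log.txt')

with open(path, 'w') as f:
    f.write('first\n')
with open(path, 'a') as f:      # 'a' keeps the existing contents
    f.write('second\n')
with open(path) as f:
    afterAppend = f.read()
print(repr(afterAppend))        # 'first\nsecond\n'

with open(path, 'w') as f:      # 'w' erases the existing contents
    f.write('third\n')
with open(path) as f:
    afterWrite = f.read()
print(repr(afterWrite))         # 'third\n'
```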

Using Functions We Built

Note: We wrote a few helper functions in the last few lectures and labs, which are now in a module called sequenceTools:

  • isVowel()

  • vowelSeq()

  • countVowels()

  • wordStartEnd()

  • palindromes()

We can import the functions from our module using the import command.

from sequenceTools import *
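The code in sequenceTools is not reproduced in these notes. For readers following along without the module, here is a guess at what two of its functions might look like. These are hypothetical stand-ins based only on the function names, and they assume that only a, e, i, o and u count as vowels; they are not the actual module code.

```python
# Hypothetical stand-ins for two sequenceTools functions (assumed behavior)
def isVowel(char):
    """Returns True if char is a single vowel letter (either case)."""
    return len(char) == 1 and char.lower() in 'aeiou'

def countVowels(word):
    """Returns the number of vowels in word."""
    count = 0
    for char in word:
        if isVowel(char):
            count += 1
    return count

print(isVowel('E'))             # True
print(isVowel('b'))             # False
print(countVowels('Williams'))  # 3
```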

Student Fun Facts

Let’s create some “student fun facts” using your names! :-)

students = [] # initialize empty list
filename = "csv/classnames.csv"
with open(filename) as roster: 
    for line in roster:
        fullName = line.strip().split(',')
        firstName = fullName[1]
        lastName = fullName[0]
        # print(firstName,lastName)
        students.append(firstName + ' ' + lastName)

Which student names start with a vowel?

vowelNames = []
for name in students:
    if isVowel(name[0]):
        vowelNames.append(name)

        
vowelNames
['Emir C. Atli',
 'Anjali K. Bhatia',
 'Alex W. Choi',
 'Ethan Cooper',
 'Edith N. Edwards-Mizel',
 'Amir H. Estejab',
 'Arden N. Fluehr',
 'Arielle T. Levy',
 'Emma C. Miller',
 'Alyse Sayed',
 'Aniya J. Smith',
 'Ella G. Sukup',
 'Olivia V. Winters',
 'Addison Zou']

More Fun Facts! Which students have long or short names?

longName = []
shortName = []
for name in students:
    firstName = name.split()[0]
    if len(firstName) > 8:
        longName.append(name)
    elif len(firstName) < 4:
        shortName.append(name)

print("Long names:", longName)
print("Short names:", shortName)
Long names: ['Genevieve B. Randazzo', 'Maximilian A. Shah']
Short names: ['RJ Acosta', 'M Aditta Arian', 'Min Kyu Park', 'Sam Samuel']

Class Exercises

  1. Given our list of student names shown above (a list of strings), write a function that returns a list of all students’ last names.

def lastNames(rosterList):
    """Takes the student info as a list of strings and returns just
    a list of last names"""
    pass
    
  2. Write a function characterList which takes in two arguments rosterList (list of strings) and character (a string) and returns the list of students in the class whose name starts with character.

def characterList(rosterList, character):
    """Takes the student info as a list of strings and a string character
    and returns a list of students whose name starts with character"""
    pass
characterList(students, "B")
  3. Write a function that computes the student with the most vowels in their last name. (Hint: use countVowels().)

def mostVowels(rosterList):
    """Takes the student info as a list of strings
    and returns the student name with the most vowels in their last name"""
    pass
mostVowels(students)