Diving Into the Deluge of Data :: Lab 6 :: Purple America

Lab 6: Purple America

This lab explores visualizing election data by county and state through color and gradient. The data is logically split by election results and by county and state regions. A primary goal of this lab is to naturally organize the data using classes and clear, well-defined programming interfaces.


A modern, popular visualization technique for showing geographic differences is to color counties in the United States according to some statistic. You see it everywhere.

In the last presidential election, we read a lot about red states and blue states. But work by Robert Vanderbei shows that the USA is actually filled with purple states (well, Utah is pretty red and Vermont is pretty blue). We will write code to do these visualizations directly from election data and boundary information in longitude and latitude format.

         

Step 0: Lab Preparation

Step 1: Source Code

Step 2: Data

There are two types of data.

Boundary Data

The boundary data is in CSV format. It has the form

      COUNTY,STATE,LONG1,LAT1,LONG2,LAT2,...,LONGN,LATN

where each successive pair of LONG,LAT values forms a point on the polygon defining a particular region (you should think of longitude as x and latitude as y). Some county/state pairs appear multiple times because they contain separate regions (imagine a county made up of a series of islands). There are three types of boundary files:

Election Data

The election data is in CSV format. It has the form

    COUNTY,STATE,REPUBLICAN_VOTES,DEMOCRATIC_VOTES,OTHER_VOTES

where COUNTY,STATE matches an appropriate boundary line in a file from above. In fact, if a county / state pair has several boundaries, then the vote counts are repeated. So if bnd is a CSV reader for a boundary file and elc is a CSV reader for an election results file, one can write

      for (co,st,r,d,o), boundary in zip(elc,bnd):

and be assured that the boundary corresponds to the results. In other words, that co == boundary[0] and st == boundary[1].
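As a minimal sketch of the paired reading described above, the following uses two tiny made-up in-memory files in place of the real data. The helper names parse_coords and read_pairs are hypothetical, and the sketch assumes the files have no header row.

```python
import csv
import io

# Made-up stand-ins for a matched pair of files (not real election data).
results_text = "Alpha,ZZ,100,200,300\nBeta,ZZ,5,4,3\n"
boundary_text = (
    "Alpha,ZZ,1.0,1.0,2.0,2.0,4.0,2.0\n"
    "Beta,ZZ,0.0,0.0,1.0,0.0,1.0,1.0\n"
)


def parse_coords(fields):
    """Turn ['x1', 'y1', 'x2', 'y2', ...] into [(x1, y1), (x2, y2), ...]."""
    values = [float(field) for field in fields]
    # Even positions are longitudes (x), odd positions are latitudes (y).
    return list(zip(values[0::2], values[1::2]))


def read_pairs(results_file, boundary_file):
    """Walk both files in lockstep and collect one tuple per region."""
    elc = csv.reader(results_file)
    bnd = csv.reader(boundary_file)
    rows = []
    for (co, st, r, d, o), boundary in zip(elc, bnd):
        # The files are matched, so the county and state must agree.
        assert co == boundary[0] and st == boundary[1]
        rows.append((co, st, int(r), int(d), int(o), parse_coords(boundary[2:])))
    return rows


rows = read_pairs(io.StringIO(results_text), io.StringIO(boundary_text))
for row in rows:
    print(row)
```

With real data, the two io.StringIO objects would be replaced by files opened from the cloned repository.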

Naturally, there are three types of election results files:

Your program will take two suitably matched files as input: one boundary file and one election file. The data is available from a GitHub repository. To access it, clone the repo into your lab 6 directory, but don't add the files to your lab 6 repo:

      git clone https://github.com/williams-cs/election-data.git  
    
You will have to clone this data again whenever you work on an OIT machine.

Step 3: Design

Your program should be split between three separate files: region.py, plot.py, and election.py. The region.py and plot.py files will hold Region and Plot classes respectively. The election.py script will parse and create instances of Region for each line of the input data and then use an instance of the Plot class to create the visualization.

Step 4: Region

The Region class represents a region (stored as a list of long / lat pairs) and its vote counts. It provides methods to find both the minimum and maximum longitude and latitude values for the region. These values are used by the Plot class to interpolate the region properly into the image. This class can also determine plurality and voting percentages by party for the region.

Here is the class skeleton. All the methods require definitions except for the initialization routine, which is complete. You should use the longs method in your min_long and max_long methods, and the lats method in your min_lat and max_lat methods.

    class Region:
        """
        A region (represented by a list of long/lat coordinates) along with
        republican, democrat, and other vote counts.
        """

        def __init__(self, coords, r_votes, d_votes, o_votes):
            self.coords = coords
            self.r_votes = r_votes
            self.d_votes = d_votes
            self.o_votes = o_votes

        def longs(self):
            "Return a list of the longitudes of all the coordinates in the region"

        def lats(self):
            "Return a list of the latitudes of all the coordinates in the region"

        def min_long(self):
            "Return the minimum longitude of the region"

        def max_long(self):
            "Return the maximum longitude of the region"

        def min_lat(self):
            "Return the minimum latitude of the region"

        def max_lat(self):
            "Return the maximum latitude of the region"

        def plurality(self):
            """return 'REPUBLICAN','DEMOCRAT', or 'OTHER'
                depending on plurality of votes"""

        def total_votes(self):
            "The total number of votes cast in this region"

        def republican_percentage(self):
            "The percentage of republican votes cast in this region"

        def democrat_percentage(self):
            "The percentage of democrat votes cast in this region"

        def other_percentage(self):
            "The percentage of other votes cast in this region"
      

Make sure to test this class out in the Python REPL before proceeding. You can do this by typing

    >>> import region
    >>> r = region.Region([(1,1),(2,2),(4,2),(3,5)], 100, 200, 300)
    >>> r.plurality()
    'OTHER'
    >>> r.r_votes
    100
    >>> r.republican_percentage()
    0.16666666666666666
    >>> r.min_long()
    1
    >>> r.max_lat()
    5
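The arithmetic behind the session above can be sketched with plain functions, outside of any class. This is only an illustration: the function names are hypothetical, ties in plurality are resolved in favor of the first party listed (an assumption), and a region with zero votes is given a share of 0.0 (also an assumption).

```python
def plurality(r_votes, d_votes, o_votes):
    """Return the label of the party with the most votes."""
    counts = {"REPUBLICAN": r_votes, "DEMOCRAT": d_votes, "OTHER": o_votes}
    # max returns the first key with the largest count, so ties go to
    # the party listed first (an assumption, not a stated requirement).
    return max(counts, key=counts.get)


def share(votes, total):
    """Return votes as a fraction of total, or 0.0 when nobody voted."""
    return votes / total if total else 0.0


coords = [(1, 1), (2, 2), (4, 2), (3, 5)]
longs = [x for x, _ in coords]  # first element of each pair
lats = [y for _, y in coords]   # second element of each pair

print(plurality(100, 200, 300))
print(share(100, 100 + 200 + 300))
print(min(longs), max(lats))
```

Note that the percentage in the session is a fraction between 0 and 1, not a value between 0 and 100.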
    

Step 5: Plot

The Plot class encapsulates an image proportional in size to a bounding box around a set of regions given in longitude and latitude coordinates. It also provides the ability to draw regions, appropriately filled, on the image. Besides initialization, it contains two instance methods (draw and save) and five static methods, which appear first in the class definition. The static methods are not instance methods because they neither rely on nor change the state of the instance. They are logically related to Plot, which is why they live in the Plot namespace.

    from PIL import Image, ImageDraw
    from PIL.ImageColor import getrgb


    class Plot:

        """
        Provides the ability to map, draw and color regions in a long/lat
        bounding box onto a proportionally scaled image.
        """

        @staticmethod
        def interpolate(x_1, x_2, x_3, newlength):
            """linearly interpolates x_1, where x_2 <= x_1 <= x_3, into the range [0, newlength]"""

        @staticmethod
        def proportional_height(new_width, width, height):
            """return a height for new_width that is
               proportional to height with respect to width"""

        @staticmethod
        def fill(region, style):
            """return the fill color for region according to the given 'style'"""
            if style == "GRAD":
                return Plot.gradient(region)
            else:
                return Plot.solid(region)

        @staticmethod
        def solid(region):
            "return an appropriate solid color based on plurality of votes"

        @staticmethod
        def gradient(region):
            "return a gradient color based on percentages of votes"

        def __init__(self, width, min_long, min_lat, max_long, max_lat):
            """
            Create a width x height image where height is proportional to width
            with respect to the long/lat coordinates."""

        def save(self, filename):
            """save the current image to 'filename'"""

        def draw(self, region, style):
            """
            Draw 'region' in the given 'style' at the correct position on the
            current image"""
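The static helpers are mostly small pieces of arithmetic. The sketch below writes them as standalone functions to show one possible reading of the docstrings. Several choices are assumptions rather than requirements of the lab: the to_pixel helper is hypothetical, the gradient maps republican, other, and democrat shares to the red, green, and blue channels, and a region with no votes is colored gray.

```python
def interpolate(x_1, x_2, x_3, new_length):
    """Map x_1, where x_2 <= x_1 <= x_3, linearly into [0, new_length]."""
    return new_length * (x_1 - x_2) / (x_3 - x_2)


def proportional_height(new_width, width, height):
    """Scale height by the same factor that turns width into new_width."""
    return int(height * new_width / width)


def to_pixel(lon, lat, bounds, width, height):
    """Hypothetical helper: convert a long/lat pair to image coordinates."""
    min_long, min_lat, max_long, max_lat = bounds
    x = interpolate(lon, min_long, max_long, width)
    # Image rows grow downward while latitude grows upward, so flip y.
    y = height - interpolate(lat, min_lat, max_lat, height)
    return (x, y)


def gradient(r_votes, d_votes, o_votes):
    """Assumed color scheme: (red, green, blue) from (rep, other, dem) shares."""
    total = r_votes + d_votes + o_votes
    if total == 0:
        return (128, 128, 128)  # assumption: neutral gray when nobody voted
    # Integer arithmetic avoids floating point rounding surprises.
    return (255 * r_votes // total, 255 * o_votes // total, 255 * d_votes // total)


height = proportional_height(100, 8 - 0, 10 - 0)
print(height)
print(to_pixel(1, 1, (0, 0, 8, 10), 100, height))
print(gradient(100, 200, 300))
```

The numbers used here match a plot of width 100 over the bounding box (0, 0) to (8, 10), so the image would be 125 pixels tall.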
    

This week you will use the Python Imaging Library directly instead of working with the image wrapper module. Examining the wrapper directly should give you some information on how the programming interface works.


    from PIL import Image, ImageDraw, ImageFont

    def create_image(width, height):
        return Image.new("RGB", (width, height), (255, 255, 255))

    def draw_point(image, x, y, color):
        ImageDraw.Draw(image).point((x,y), color)

    def draw_rect(image, xy, fill=None, outline=None):
        ImageDraw.Draw(image).rectangle(xy, fill, outline)

    def save_image(image, filename):
        image.save(filename, "PNG")

    

Besides creating (im = Image.new(...)) and saving (im.save(...)) images, you will use the polygon(xy, fill=None, outline=None) method of the drawing context returned by ImageDraw.Draw(im) to draw region boundaries. Notice that polygon is a method of the drawing context and not of the Image class (see the implementation of draw_point and draw_rect above for more examples).
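For instance, a filled polygon can be drawn in a few lines. This is a small sketch that assumes Pillow is installed. The points and colors are made up for illustration, and the points are already in image coordinates.

```python
from PIL import Image, ImageDraw

# A white 100 x 125 canvas, created the same way as in the wrapper.
image = Image.new("RGB", (100, 125), (255, 255, 255))

# The drawing context is tied to the image it was created from.
context = ImageDraw.Draw(image)

# A made-up square, given as a list of (x, y) pixel positions.
points = [(10, 10), (60, 10), (60, 60), (10, 60)]
context.polygon(points, fill=(128, 0, 128), outline=(0, 0, 0))

print(image.getpixel((30, 30)))   # a pixel inside the square
print(image.getpixel((90, 100)))  # a pixel outside the square
```

Drawing through the context changes the underlying image, which is why the pixel values can be read back from image afterwards.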

Here are some implementation notes:

Make sure to test your code:

    >>> import region
    >>> import plot
    >>> r = region.Region([(1,1),(2,2),(4,2),(3,5)], 100, 200, 300)
    >>> p = plot.Plot(100,0,0,8,10)
    >>> p.draw(r,"GRAD")
    >>> p.save("example.png")
    
You should end up with a file called example.png that looks like this.

Step 6: The election.py script

The election.py script parses the input data into Region instances and uses a Plot instance to draw the regions onto an image. Here is the code skeleton.

    import sys
    import csv
    import math
    from region import Region
    from plot import Plot

    def mercator(lat):
        """project latitude 'lat' according to Mercator"""
        lat_rad = (lat * math.pi) / 180
        projection = math.log(math.tan((math.pi / 4) + (lat_rad / 2)))
        return (180 * projection) / math.pi

    def main(results, boundaries, output, width, style):
        """parse the input files into Region instances and draw them with a Plot"""


    if __name__ == '__main__':
        results = sys.argv[1]
        boundaries = sys.argv[2]
        output = sys.argv[3]
        width = int(sys.argv[4])
        style = sys.argv[5]
        main(results, boundaries, output, width, style)
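To get a feel for what the mercator function in the skeleton does, it helps to print a few values. The function below is copied from the skeleton and the sample latitudes are arbitrary. One plausible use, stated here as an assumption, is to project each latitude before constructing a Region.

```python
import math


def mercator(lat):
    """project latitude 'lat' according to Mercator (copied from the skeleton)"""
    lat_rad = (lat * math.pi) / 180
    projection = math.log(math.tan((math.pi / 4) + (lat_rad / 2)))
    return (180 * projection) / math.pi


# The equator stays at (almost exactly) zero, and latitudes are stretched
# more and more as they move toward the poles.
for lat in (0, 25, 45, 65):
    print(lat, round(mercator(lat), 2))
```

The projection is undefined at the poles themselves (a latitude of 90 or -90), which lie outside the range covered by the data in this lab.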

    

The main function takes five arguments:

Some implementation notes:

In summary, your main function should do the following:

To run your code from the command line use

    $ python3 election.py election-data/results/US2012.csv election-data/boundaries/US.csv output.png 1024 GRAD
    

Step 7: Optional Extensions

Step 8: Submission

Credit

Thanks to Kevin Wayne for his Nifty Assignment.

Thanks to Rich Wicentowski for the nicely formatted data and the optional extension ideas.