CS 334: HW 10

Instructions

This homework has three types of problems:

  • Self Check: You are strongly encouraged to think about and work through these questions, but you will not submit answers to them.

  • Problems: You will turn in answers to these questions.

  • Programming: This part involves writing Scala code. You are strongly encouraged to work with a partner on it.

Reading

Problems

1. Race Conditions and Atomicity

The DoubleCounter class defined below has methods incrementBoth and getDifference. Assume that DoubleCounter will be used in multi-threaded applications.

class DoubleCounter { 
  protected int x = 0;
  protected int y = 0; 

  public int getDifference() { 
    return x - y; 
  } 

  public void incrementBoth() { 
    x++; 
    y++; 
  } 
} 

There is a potential data race between incrementBoth and getDifference if getDifference is called between the increment of x and the increment of y. You may assume that x++ and y++ execute atomically (although this is not always guaranteed...).
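To see why the atomicity assumption is flagged as "not always guaranteed", here is a small sketch that compares a plain ++ with an atomic increment. The class IncrementDemo and its method runOnce are hypothetical names used only for illustration; they are not part of the starter code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of the assignment's starter code.
class IncrementDemo {
  static int plain = 0;                                     // updated with plain ++
  static final AtomicInteger atomic = new AtomicInteger();  // updated atomically

  // Runs two threads that each perform perThread increments on both counters
  // and returns { final plain value, final atomic value }.
  static int[] runOnce(int perThread) {
    plain = 0;
    atomic.set(0);
    Runnable work = () -> {
      for (int i = 0; i < perThread; i++) {
        plain++;                   // read, add, write: three separate steps
        atomic.incrementAndGet();  // one indivisible step
      }
    };
    Thread a = new Thread(work);
    Thread b = new Thread(work);
    a.start();
    b.start();
    try {
      a.join();
      b.join();
    } catch (InterruptedException e) {
      throw new IllegalStateException(e);
    }
    return new int[] { plain, atomic.get() };
  }

  public static void main(String[] args) {
    int[] result = runOnce(1_000_000);
    System.out.println("plain counter:  " + result[0]);  // may fall short of 2000000
    System.out.println("atomic counter: " + result[1]);  // always 2000000
  }
}
```

The atomic counter always ends at the full count, while the plain counter may lose updates when the two threads interleave. For the questions below, continue to assume that x++ and y++ are atomic.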

  1. What are the possible return values of getDifference if there are two threads that each invoke incrementBoth exactly once at the same time as a third thread invokes getDifference?

  2. What are the possible return values of getDifference if there are n threads that each invoke incrementBoth exactly once?

  3. Data races can be prevented by inserting synchronization primitives. One option is to declare

    public synchronized int getDifference() {...} 
    public void incrementBoth() {...}
    

    This will prevent two threads from executing method getDifference at the same time. Is this enough to ensure that getDifference always returns 0? Explain briefly.

  4. Is the following declaration

    public int getDifference() {...} 
    public synchronized void incrementBoth() {...}
    

    sufficient to ensure that getDifference always returns 0? Explain briefly.

  5. What are the possible values of getDifference if the following declarations are used?

    public synchronized int getDifference() {...} 
    public synchronized void incrementBoth() {...}
    

Programming

No autograder this week

Since these programs don't lend themselves to the type of testing done by an autograder, we won't be running an autograder on Gradescope this week.

1. Concurrency Basics and Performance

Your answers to the questions below should all appear in the P1/README.txt file from the starter.

  1. How Many Cores Do You Have? Determine how many cores your computer has. You can do this in many ways, including programmatically. Here's some code for Java:

    public class Cores {
        public static void main(String args[]) {
            int cores = Runtime.getRuntime().availableProcessors();
            System.out.println(cores);
        }
    }
    

    And Python:

    import multiprocessing
    print(multiprocessing.cpu_count())
    
  2. Most Divisors. You'll find a starter project on evolene with the code snippets below.

    This week, we'll work on parallelizing a program to find the positive integer less than or equal to n that has the largest number of integer divisors. It is possible that several integers in the range 1 to n have the same maximum number of divisors, in which case reporting any of them is fine. Here are a few examples of how many divisors numbers have:

    numDivisors(12) == 6     // { 1, 2, 3, 4, 6, 12 }
     numDivisors(6) == 4     // { 1, 2, 3, 6 }
     numDivisors(5) == 2     // { 1, 5 }
     numDivisors(4) == 3     // { 1, 2, 4 }
    

    Finding the number of divisors for an integer value is just a matter of finding all integers less than or equal to value that divide it evenly, as shown in the numDivisors method below. That code also includes a main program that takes a number on the command line and computes the value we are looking for.

    public class MostDivisors {
    
      // Count the number of integer divisors of value,
      // including 1 and the value itself.
      public static int numDivisors(int value) {
        int divisorCount = 0;    
        for (int i = 1; i <= value; i++) { 
          if (value % i == 0) {
            divisorCount++;
          }
        }
        return divisorCount;
      }
    
      public static void main(String[] args) {
    
        long initialTime = System.currentTimeMillis();
    
        int n = Integer.parseInt(args[0]);
    
        // Take 1 as the first "best" number
        int maxDivisors = 1;  
        int numWithMaxDivisors = 1;
    
        for (int i = 2;  i <= n;  i++) {
          int divisorCount = numDivisors(i);
          if (divisorCount > maxDivisors) {
            maxDivisors = divisorCount;
            numWithMaxDivisors = i;
          }
        }
    
        long endTime = System.currentTimeMillis();
        long elapseTime = endTime - initialTime;
    
        System.out.println("The maximum number of divisors is " + maxDivisors);
        System.out.printf("A number with that many divisors is %d\n", 
                          numWithMaxDivisors);
        System.out.printf("Computing that took %3.4g seconds\n", elapseTime / 1000.0);
      }
    }
    

    Run this code and see how long it takes to compute the answer for a few reasonably-sized n, such as 10,000 or 50,000. The following is my output for 50,000:

    > java MostDivisors 50000
    The maximum number of divisors is 100
    A number with that many divisors is 45360
    Computing that took 2.604 seconds
    

    Find an n that takes 5 to 10 seconds to run. That will be a good test value for your parallel versions... Report this number in the README.txt file in the starter directory.

  3. A Parallel Version. Now, create a new class MostDivisorsThreads that computes the same information, but with a user-specified number of threads. You'll want to structure it similarly to the SumExample class from lecture, which divided the input space into t blocks, with each of t threads processing one block. In this case, each block of work corresponds to a range of numbers to examine. For example, if n were 1,000 and t were 4, your code should divide the range [1, 1000] into subranges [1,250], [251,500], [501,750], and [751,1000].
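One way to compute those subranges is sketched below. The class BlockRanges and its method block are hypothetical names, not part of the starter code, and this is only one of several reasonable ways to split the range.

```java
// Hypothetical helper, not part of the starter code.
class BlockRanges {
  // Returns { low, high } (both inclusive) for block number index,
  // where the range [1, n] is split into t contiguous blocks (index is 0-based).
  static int[] block(int n, int t, int index) {
    int low = (int) ((long) index * n / t) + 1;
    int high = (int) ((long) (index + 1) * n / t);
    return new int[] { low, high };
  }

  public static void main(String[] args) {
    int n = 1000;
    int t = 4;
    for (int i = 0; i < t; i++) {
      int[] range = block(n, t, i);
      System.out.println("thread " + i + ": [" + range[0] + "," + range[1] + "]");
    }
  }
}
```

Using integer division on index * n keeps the blocks contiguous and covers every number exactly once, even when t does not divide n evenly.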

    Here is a skeleton to get you started. You'll need to add the elided bits, and possibly a few other items to these classes. As a first step, note that each worker thread will probably take a low and high number delineating the range it should examine.

    class MostDivisorsWorker extends Thread {
    
      public MostDivisorsWorker(/* ... */) {
        // ...
      }
    
      // Count the number of integer divisors of value,
      // including 1 and the value itself.
      private int numDivisors(int value) {
        // as above ...
      }
    
      public void run() { 
      }
    
      public int getMostDivisors() { 
        //... 
      }
    
      public int getNumberWithMostDivisors() { 
        //... 
      }
    }
    
    public class MostDivisorsThreads {
      public static void main(String[] args) throws Exception {
        long initialTime = System.currentTimeMillis();
    
        int n = Integer.parseInt(args[0]);
        int numThreads = Integer.parseInt(args[1]);
    
        // Take 1 as the first "best" number
        int maxDivisors = 1;  
        int numWithMaxDivisors = 1;
    
        MostDivisorsWorker[] threads = new MostDivisorsWorker[numThreads];
        for (int i = 0; i < numThreads; i++) {
          threads[i] = new MostDivisorsWorker(/* ... */);
        }
    
        long endTime = System.currentTimeMillis();
        long elapseTime = endTime - initialTime;
    
        System.out.printf("The maximum number of divisors is %d\n", maxDivisors);
        System.out.printf("A number with that many divisors is %d\n", 
                          numWithMaxDivisors);
        System.out.printf("Computing that took %3.4g seconds\n", elapseTime / 1000.0);
      }
    }
    

    You'll now provide two command line args when you run your code:

    > java MostDivisorsThreads 50000 4
    The maximum number of divisors is 100
    A number with that many divisors is 45360
    Computing that took ???? seconds
    

    Try it with a few different numbers of threads (at least 1, 2, 4, the number of cores, twice the number of cores, and four times the number of cores) and fill in the following table. Speedup for t threads is computed as (running time for 1 thread) / (running time for t threads). For example, if the program takes 20 seconds with 1 thread and 16 seconds with 2 threads, the speedup is 20/16 = 1.25x.

    Report the data and speedup in the README.txt file in the starter directory.
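If you would like to compute the speedup column mechanically, here is a minimal sketch. The class SpeedupTable and its methods are hypothetical names, and the timings in main are just the 20 second and 16 second example from the text, not real measurements.

```java
import java.util.Locale;

// Hypothetical helper for filling in the table; not part of the starter code.
class SpeedupTable {
  // Speedup for t threads = (running time for 1 thread) / (running time for t threads).
  static double speedup(double oneThreadSeconds, double tThreadSeconds) {
    return oneThreadSeconds / tThreadSeconds;
  }

  // Formats one table row: thread count, time in seconds, speedup.
  static String row(int threads, double seconds, double oneThreadSeconds) {
    return String.format(Locale.ROOT, "%-9d %-10.3f %.2fx",
                         threads, seconds, speedup(oneThreadSeconds, seconds));
  }

  public static void main(String[] args) {
    double baseline = 20.0;  // example time for 1 thread, taken from the text
    System.out.println("Threads   Time (s)   Speedup");
    System.out.println(row(1, 20.0, baseline));
    System.out.println(row(2, 16.0, baseline));
  }
}
```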

    A few notes on performance experiments:

    • The data is never as clean as you'd like.

    • Typically, an experiment like this should be repeated many times to reduce the timing variance due to external effects, noise, imprecision, etc. However, for us, running once or twice is enough to get ballpark statistics. If you see high variance, you can average 3 runs, or increase n a bit to reduce the impact of noise on the result.

    • You may converge on speedups of less than half the number of cores your system reports. One reason is addressed in the next part of the lab. The other is that you are likely using an Intel processor, and most Intel chips use hyper-threading, which enables a single core to run steps of two threads simultaneously, albeit with certain limitations. Those "hyper-threads" are included in the core count, but they don't help very much in this case because the operations performed by our worker threads are generally not the types of operations that can exploit a hyper-threaded architecture effectively.

    Threads   Time (s)   Speedup
    1                    --
    2
    4
    ...
  4. Striping vs. Blocking. You probably saw some speedup, but we can do better by dividing the work among threads in a different way...

    Using the example from above, if n were 1,000 and the number of threads were 4, your code divides the range [1, 1000] into subranges [1,250], [251,500], [501,750], and [751,1000]. However, note that the time to compute numDivisors(i) is O(i). This means that the time to compute the answer for the last range, [751,1000], will be much longer than the time to compute the answer for the other ranges, since the cost of computing the number of divisors for each value in that range is larger than the cost of computing the number of divisors for any value in any of the other ranges. Thus, your code will end up with some threads finishing much faster than others. The time spent waiting for that last thread to finish is wasted time.

    A different approach is to recognize that large numbers take more time to process and to distribute those large numbers evenly across the workers. To do this, instead of creating large blocks of contiguous numbers for each thread to process, stripe consecutive numbers across different threads. That is, with 4 threads, number i should be processed by thread i % 4, giving us the following division of numbers among our four threads:

    { 1, 5, 9, 13, ..., 997 }
    { 2, 6, 10, 14, ..., 998 }
    { 3, 7, 11, 15, ..., 999 }
    { 4, 8, 12, 16, ..., 1000 }
    

    Computing the number with the most divisors for each of these sets should take roughly the same amount of time, meaning we will compute the overall solution faster than before.
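To make the load imbalance concrete, the sketch below estimates each worker's cost as the sum of the values it examines, since numDivisors(i) performs i loop iterations. The class WorkEstimate and its method cost are hypothetical names used only for this illustration, and the estimate ignores everything except the loop count.

```java
// Hypothetical cost model, not part of the starter code.
class WorkEstimate {
  // Estimated work for one worker that examines low, low + step, low + 2*step, ...
  // up to and including high: the sum of those values.
  static long cost(int low, int high, int step) {
    long total = 0;
    for (int i = low; i <= high; i += step) {
      total += i;
    }
    return total;
  }

  public static void main(String[] args) {
    int n = 1000;
    int t = 4;
    for (int k = 0; k < t; k++) {
      long blocked = cost(k * n / t + 1, (k + 1) * n / t, 1);  // contiguous block
      long striped = cost(k + 1, n, t);                        // every t-th number
      System.out.println("worker " + k + ": blocked=" + blocked + " striped=" + striped);
    }
  }
}
```

Under this model, the blocked workers range from 31375 to 218875 units of work, while the striped workers all fall between 124750 and 125500.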

    Make a copy of your MostDivisorsThreads.java code and rename the classes to MostDivisorsStriped and MostDivisorsStripedWorker, and modify your algorithm so each worker is created with a low and high value, and also the step size to get from one value to the next (e.g. 4 in the example above). Be sure to add and commit the new Java file to your repository.

    Time your code again and fill in a similar table:

    Threads   Time (s)   Speedup
    1                    --
    2
    4
    ...

    How does the speedup compare to the earlier version? Report your data and speedup in the README.txt file in the starter directory.

  5. Lots of Threads. We have some big machines in our machine room:

    Machine                   Memory   Cores
    limia.cs.williams.edu     64GB     48
    lohani.cs.williams.edu    128GB    72

    SSH into one of these machines and try your code out on more threads to see how it scales to bigger systems. You'll want to pick a larger n if your running times become too short to measure accurately (i.e., less than, say, 0.1 seconds). A good approach is to try 1, 2, 4, 8, 16, 32, and 64 threads. Add these measurements to the README.txt file in the starter directory.

2. Sieve of Eratosthenes

The ML function, primesto n, given below, can be used to calculate the list of all primes less than or equal to n by using the classic "Sieve of Eratosthenes".

(* 
 * Sieve of Eratosthenes: Remove all multiples of first element from list,  
 * then repeat sieving with the tail of the list. Starting with the list
 * [2..n] finds all primes from 2 up to and including n. 
 *) 
fun sieve [] = [] 
    | sieve (fst::rest) = 
        let fun filter p [] = [] 
            | filter p (h::tail) = if (h mod p) = 0 then filter p tail 
                                                    else h::(filter p tail); 
            val nurest = filter fst rest 
        in 
            fst::(sieve nurest) 
        end; 

(* 
 * Returns list of integers from i to j 
 *) 
fun fromto i j = if j < i then [] else i::(fromto (i+1) j); 

(* 
 * Return list of primes from 2 to n 
 *) 
fun primesto n = sieve(fromto 2 n); 

Notice that each time through the sieve we first filter all of the multiples of the first element from the tail of the list, and then perform the sieve on the reduced tail. In ML, one must wait until the entire tail has been filtered before starting the sieve on the resulting list. However, one could use parallelism to have one process start sieving the result before the entire tail has been completely filtered by the original process.

Here is a good way to think of this concurrent algorithm, which uses the Java Buffer class.

The main program should begin by creating a Buffer object, with perhaps 5 slots. It should then successively insert the numbers from 2 to n into the Buffer, followed by -1 to signal that there are no more input numbers.

After the creation of the Buffer object, but before starting to put the numbers into it, the program should create a Sieve object (using the Sieve class described below) and pass it the Buffer object as a parameter to Sieve's constructor. The Sieve object should then begin running in a separate thread while the main program inserts the numbers in the buffer.

After the Sieve object has been constructed and the Buffer object has been stored in an instance variable, in, its run method should get the first item from in:

  • If that number is negative then the run method should terminate.

  • Otherwise, it should print out the number (using System.out.println) and then create a new Buffer object, out. A new Sieve should be created with Buffer out and started running in a new thread. Meanwhile the current Sieve object should start filtering the elements from the in buffer. That is, the run method should successively grab numbers from the in buffer. If the number is divisible by the first number that was obtained from in, it is discarded. Otherwise, it is added to the out buffer. This reading and filtering continues until a negative number is read. When the negative number is read, that number is put into the out buffer and then the run method terminates.

In this way, the program will eventually have created a total of p + 1 Sieve objects (all running in separate threads), where p is the number of primes between 2 and n. The instances of Sieve will be working in a pipeline, using the buffers to pass numbers from one Sieve object to the next.

Write this program in Java using the Buffer class from lecture. Each of the buffers used should be able to hold at most 5 items. I have provided Producer and Consumer classes in the starter code --- you may find them useful for reference, but you won't use them directly.
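If you want a reminder of how a bounded buffer behaves, here is a minimal sketch. SketchBuffer is a hypothetical stand-in, not the Buffer class from lecture; its method names put and get and its internals are assumptions, so use the lecture version in your solution. The demo only pushes the numbers 2 to n and a -1 sentinel through a single buffer. It does not implement the sieve.

```java
// Hypothetical demo: a producer thread feeds 2..n and then -1 into a buffer
// with 5 slots while the calling thread drains it.
class SketchBufferDemo {
  // Returns the sum of all values read before the negative sentinel.
  static int drain(int n) {
    SketchBuffer buffer = new SketchBuffer(5);
    Thread producer = new Thread(() -> {
      try {
        for (int i = 2; i <= n; i++) {
          buffer.put(i);
        }
        buffer.put(-1);  // signals that there are no more input numbers
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    producer.start();
    int sum = 0;
    try {
      for (int v = buffer.get(); v >= 0; v = buffer.get()) {
        sum += v;
      }
      producer.join();
    } catch (InterruptedException e) {
      throw new IllegalStateException(e);
    }
    return sum;
  }

  public static void main(String[] args) {
    System.out.println(drain(100));
  }
}

// Hypothetical stand-in for the lecture's Buffer class (names are assumptions).
class SketchBuffer {
  private final int[] slots;
  private int head = 0;   // index of the oldest item
  private int count = 0;  // number of items currently stored

  SketchBuffer(int capacity) {
    slots = new int[capacity];
  }

  // Blocks while the buffer is full, then appends value.
  synchronized void put(int value) throws InterruptedException {
    while (count == slots.length) {
      wait();
    }
    slots[(head + count) % slots.length] = value;
    count++;
    notifyAll();
  }

  // Blocks while the buffer is empty, then removes and returns the oldest value.
  synchronized int get() throws InterruptedException {
    while (count == 0) {
      wait();
    }
    int value = slots[head];
    head = (head + 1) % slots.length;
    count--;
    notifyAll();
    return value;
  }
}
```

Because put blocks when all 5 slots are full and get blocks when the buffer is empty, the producer and consumer stay in step without any extra coordination.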

3. Sieve of Actors

Use Scala 2.10.7

The most recent versions of Scala do not support the actor library we'll be using for this question.

To use the version with actors if you are on a lab machine, please run the following two commands in the terminal window before using scalac or scala:

export JAVACMD=/usr/lib/jvm/java-8-openjdk-amd64/bin/java
export PATH=~freund/shared/scala-2.10.7/bin/:$PATH

You'll need to do this each time you open a new terminal window.

If you are using your own computer, you can download the correct version here: http://www.scala-lang.org/download/2.10.7.html.

You can verify you are using the correct version if running "scala -version" prints out:

Scala code runner version 2.10.7 -- Copyright 2002-2017, LAMP/EPFL

Rewrite the Sieve of Eratosthenes program from above in Scala using Actors (and no BoundedBuffer). In the Java program you wrote, you created a new Thread for every prime number you discovered. This time, you will create a new Sieve actor for each prime Int.

Write the code in the Sieve.scala file in the starter project.

Hints: Your Sieve actor should keep track of the prime it was created with, and it should have a slot that can hold another "follower" Sieve actor that will handle the next prime found. (The mailbox of this other Actor will play the role of the BoundedBuffer from the previous problem in that it will hold the numbers being passed along from one actor to the next.) Each Sieve actor should be able to handle messages of the form Num(n) and Stop.

The operation of each Sieve actor, once started, is similar to the previous problem. If it receives a message of the form Num(n), it checks whether n is divisible by its prime. If so, n is discarded. If not, and the follower Sieve actor has not yet been created, create the follower with n as its prime. Otherwise, send n to the follower Sieve actor. When the Stop message is received, pass it on to the follower (if any), and exit.

This program works best with a receive and a while loop (like in the PickANumber example) rather than react and loop (as in the Parrot and Account actor examples from class).

Your program should print (in order) all of the primes less than 100. You can print each prime as it is discovered (e.g., when you create the corresponding Sieve actor), but it would be even better to return a list of all of the primes, and then print those out. To do this, you can send the message Stop synchronously with "!?", and when one of your actors receives this message it replies with the list of all primes it knows about. The "!?" operator returns an object of type Any (equivalent to Java's Object), so you'll have to use match to decode it as a list of Int. This may result in an "unchecked" warning from the compiler for a fairly subtle Java compatibility reason, but you can ignore that.

I have provided a sample program ProducerConsumer.scala in the starter code. You don't need to use this directly, but you may wish to refer to this code as a simple example of using Actors.

Submitting Your Work

Submit your homework via Gradescope by the beginning of class on the due date.

Written Problems

Submit your answers to the Gradescope assignment named, for example, "HW 1". It should:

  • be clearly written or typed,
  • include your name and HW number at the top,
  • list any students with whom you discussed the problems, and
  • be a single PDF file, with one problem per page.

You will be asked to resubmit homework not satisfying these requirements. Please select the pages for each question when you submit.

Programming Problems

If this homework includes programming problems, submit your code to the Gradescope assignment named, for example, "HW 1 Programming". Also:

  • Be sure to upload all source files for the programming questions, and please do not change the names of the starter files.
  • If you worked with a partner, only one of each pair needs to submit the code.
  • Indicate who your partner is when you submit the code to Gradescope. Specifically, after you upload your files, there will be an "Add Group Member" button on the right of the Gradescope webpage. Click that and add your partner.

Autograding: For most programming questions, Gradescope will run an autograder on your code that performs some simple tests. Be sure to look at the autograder output to verify your code works as expected. We will run more extensive tests on your code after the deadline. If you encounter difficulties or unexpected results from the autograder, please let us know.