CS 334: Principles of Programming Languages

CS 334: Lab 9: Concurrency

Overview
Partner
Getting Started
- Setting Up Your Repository
Programming
Submitting Your Work

Overview

In this lab, you will explore concurrent programming through three problems: parallelizing a computation with Java threads, implementing a concurrent Sieve of Eratosthenes using Java's bounded buffers, and rewriting the sieve using Scala actors.

Partner

You are encouraged to work with a partner on this lab.

Getting Started

Setting Up Your Repository

You will receive an email with an invitation link to the lab9 assignment on GitHub Classroom. You can follow the same instructions as in Lab 2 for accessing and cloning your repository. See the GitHub reference for instructions to add a partner. You should answer the questions below in the appropriate files in your repository.

Programming

No autograder this week

Since these programs don't lend themselves to the type of testing done by an autograder, we won't be running an autograder on Gradescope this week.

1. Concurrency Basics and Performance

Your answers to the questions below should all appear in the P1/README.txt file from the starter.

How Many Cores Do You Have? Determine how many cores your computer has. You can do this in many ways, including programmatically. Here's some code for Java:

public class Cores {
    public static void main(String args[]) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(cores);
    }
}

And Python:

import multiprocessing
print(multiprocessing.cpu_count())

Most Divisors. Now we'll work on parallelizing a program to find the positive integer less than or equal to n that has the largest number of integer divisors. It is possible that several integers in the range 1 to n have the same maximum number of divisors, in which case reporting any of them is fine. Here are a few examples of how many divisors numbers have:

numDivisors(12) == 6     // { 1, 2, 3, 4, 6, 12 }
 numDivisors(6) == 4     // { 1, 2, 3, 6 }
 numDivisors(5) == 2     // { 1, 5 }
 numDivisors(4) == 3     // { 1, 2, 4 }

Finding the number of divisors for an integer value is just a matter of finding all integers less than or equal to value that divide it evenly, as shown in the numDivisors method below. That code also includes a main program that takes a number on the command line and computes the value we are looking for.

public class MostDivisors {

  // Count the number of integer divisors of value,
  // including 1 and the value itself.
  public static int numDivisors(int value) {
    int divisorCount = 0;    
    for (int i = 1; i <= value; i++) { 
      if (value % i == 0) {
        divisorCount++;
      }
    }
    return divisorCount;
  }

  public static void main(String[] args) {

    long initialTime = System.currentTimeMillis();

    int n = Integer.parseInt(args[0]);

    // Take 1 as the first "best" number
    int maxDivisors = 1;  
    int numWithMaxDivisors = 1;

    for (int i = 2;  i <= n;  i++) {
      int divisorCount = numDivisors(i);
      if (divisorCount > maxDivisors) {
        maxDivisors = divisorCount;
        numWithMaxDivisors = i;
      }
    }

    long endTime = System.currentTimeMillis();
    long elapseTime = endTime - initialTime;

    System.out.println("The maximum number of divisors is " + maxDivisors);
    System.out.printf("A number with that many divisors is %d\n", 
                      numWithMaxDivisors);
    System.out.printf("Computing that took %3.4g seconds\n", elapseTime / 1000.0);
  }
}

Run this code and see how long it takes to compute the answer for a few reasonably sized values of $n$, such as 10,000 or 50,000. The following is my output for 50,000:

> java MostDivisors 50000
The maximum number of divisors is 100
A number with that many divisors is 45360
Computing that took 2.604 seconds

Find an $n$ that takes 5 to 10 seconds to run. That will be a good test value for your parallel versions. Report this number in the README.txt file in the starter directory.

A Parallel Version. Now, create a new class MostDivisorsThreads that computes the same information, but with a user-specified number of threads. You'll want to structure it similarly to the SumExample class from lecture, which divided the input space into $t$ blocks, with each of $t$ threads processing one block. In this case, each block of work corresponds to a range of numbers to examine. For example, if $n$ were 1,000 and $t$ were 4, your code should divide the range [1, 1000] into subranges [1,250], [251,500], [501,750], and [751,1000].
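As a rough illustration of the block arithmetic only (not of the threading), here is a minimal sketch of one way to split [1, n] into $t$ contiguous ranges. The class name BlockRanges and the method split are hypothetical and not part of the starter code. When $t$ does not divide $n$ evenly, this sketch gives the first few blocks one extra number each, which is just one reasonable choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the starter code: splits [1, n] into
// t contiguous blocks whose sizes differ by at most one.
class BlockRanges {

  // Returns one {low, high} pair per block; both ends are inclusive.
  static List<int[]> split(int n, int t) {
    List<int[]> ranges = new ArrayList<>();
    int base = n / t;    // minimum block size
    int extra = n % t;   // the first `extra` blocks get one more number
    int low = 1;
    for (int i = 0; i < t; i++) {
      int size = base + (i < extra ? 1 : 0);
      int high = low + size - 1;
      ranges.add(new int[] { low, high });
      low = high + 1;
    }
    return ranges;
  }

  public static void main(String[] args) {
    // Reproduces the example from the text: n = 1000, t = 4.
    for (int[] r : split(1000, 4)) {
      System.out.println("[" + r[0] + "," + r[1] + "]");
    }
  }
}
```

For n = 1000 and t = 4 this prints the four subranges listed above. Each {low, high} pair is the kind of information a worker's constructor would likely need.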

Here is a skeleton to get you started. You'll need to add the elided bits, and possibly a few other items to these classes. As a first step, note that each worker thread will probably take a low and high number delineating the range it should examine.
```
class MostDivisorsWorker extends Thread {

  public MostDivisorsWorker(/* ... */) {
    // ...
  }

  // Count the number of integer divisors of value,
  // including 1 and the value itself.
  private int numDivisors(int value) {
    // as above ...
  }

  public void run() { 
  }

  public int getMostDivisors() { 
    //... 
  }

  public int getNumberWithMostDivisors() { 
    //... 
  }
}

public class MostDivisorsThreads {
  public static void main(String[] args) throws Exception {
    long initialTime = System.currentTimeMillis();

    int n = Integer.parseInt(args[0]);
    int numThreads = Integer.parseInt(args[1]);

    // Take 1 as the first "best" number
    int maxDivisors = 1;  
    int numWithMaxDivisors = 1;

    MostDivisorsWorker[] threads = new MostDivisorsWorker[numThreads];
    for (int i = 0; i < numThreads; i++) {
      threads[i] = new MostDivisorsWorker(/* ... */);
    }

    long endTime = System.currentTimeMillis();
    long elapseTime = endTime - initialTime;

    System.out.printf("The maximum number of divisors is %d\n", maxDivisors);
    System.out.printf("A number with that many divisors is %d\n", 
                      numWithMaxDivisors);
    System.out.printf("Computing that took %3.4g seconds\n", elapseTime / 1000.0);
  }
}
```
You'll now provide two command line args when you run your code:
```
> java MostDivisorsThreads 50000 4
The maximum number of divisors is 100
A number with that many divisors is 45360
Computing that took ???? seconds
```
Try your program with several different numbers of threads (at least 1, 2, 4, the number of cores, twice the number of cores, and four times the number of cores) and fill in the following table. Speedup for $t$ threads is computed as (running time for 1 thread) / (running time for $t$ threads). For example, if the program takes 20 seconds with 1 thread and 16 seconds with 2 threads, the speedup is 20/16 = 1.25x.
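The speedup formula is simple enough to compute by hand, but a tiny sketch may help when filling in the table. The class and method names here are hypothetical; the numbers in main are the ones from the example in the text, not measurements.

```java
// Hypothetical helper for filling in the speedup column of the table.
class Speedup {

  // Speedup for t threads: (time with 1 thread) / (time with t threads).
  static double speedup(double oneThreadSeconds, double tThreadSeconds) {
    return oneThreadSeconds / tThreadSeconds;
  }

  public static void main(String[] args) {
    // The example from the text: 20 s with 1 thread, 16 s with 2 threads.
    System.out.println(speedup(20.0, 16.0) + "x");  // prints 1.25x
  }
}
```

A speedup below 1.0 would mean the multi-threaded run was slower than the single-threaded one.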

Report the data and speedup in the README.txt file in the starter directory.

A few notes on performance experiments:
- The data is never as clean as you'd like.
- Typically, an experiment like this should be repeated many times to reduce the timing variance due to external effects, noise, imprecision, etc. However, for us, it is enough to run once or twice to get ballpark statistics. If you see high variance, you can just average 3 runs, or increase $n$ a bit to reduce the impact of noise on the result.
- Your speedups may level off at less than half the number of cores your system reported. One reason is addressed in the next part of the lab. The other is that you are likely using an Intel processor, and most Intel chips use hyper-threading, which enables a single core to run steps of two threads simultaneously, albeit with certain limitations. Those "hyper-threads" are included in the core count, but they don't help very much in this case because the operations performed by our worker threads are generally not the types of operations that can exploit a hyper-threaded architecture effectively.
  
Threads	Time (s)	Speedup
1		--
2
4
...

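If your timings vary a lot from run to run, the averaging suggested in the notes above can be done in code. The following is a minimal sketch, assuming the work to be timed can be wrapped in a Runnable. The Timing class and the busy-loop workload are hypothetical placeholders, not part of the starter code.

```java
// Hypothetical helper: averages the wall-clock time of several runs.
class Timing {

  // Runs the task `runs` times and returns the mean time in seconds.
  static double averageSeconds(Runnable task, int runs) {
    long totalNanos = 0;
    for (int r = 0; r < runs; r++) {
      long start = System.nanoTime();
      task.run();
      totalNanos += System.nanoTime() - start;
    }
    return totalNanos / 1e9 / runs;
  }

  public static void main(String[] args) {
    // Placeholder workload; substitute the computation you want to time.
    Runnable work = () -> {
      long sum = 0;
      for (int i = 0; i < 50_000_000; i++) {
        sum += i % 7;
      }
      if (sum < 0) System.out.println(sum);  // keeps the loop from being optimized away
    };
    System.out.printf("Average over 3 runs: %.3f seconds%n", averageSeconds(work, 3));
  }
}
```

System.nanoTime() is used here because it is intended for measuring elapsed time; System.currentTimeMillis(), as in the starter code, is also fine at the multi-second scale of this lab.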
Striping vs. Blocking. You probably saw some speedup, but we can do better by dividing the work among threads in a different way...

Using the example from above, if $n$ were 1,000 and the number of threads were 4, your code divides the range [1, 1000] into subranges [1,250], [251,500], [501,750], and [751,1000]. However, note that the time to compute numDivisors(i) is O(i). This means that the thread handling [751,1000] will take much longer than the others, since computing the number of divisors for each value in that range costs more than for any value in any of the other ranges. Thus, your code will end up with some threads finishing much faster than others, and the time those threads spend idle while the last one finishes is wasted.

A different approach is to recognize that large numbers take more time to process and to distribute those large numbers evenly across the workers. To do this, instead of creating large blocks of contiguous numbers for each thread to process, stripe consecutive numbers across different threads. That is, with 4 threads, number i should be processed by thread i % 4, giving us the following division of numbers among our four threads:
```
{ 1, 5, 9, 13, ..., 997 }
{ 2, 6, 10, 14, ..., 998 }
{ 3, 7, 11, 15, ..., 999 }
{ 4, 8, 12, 16, ..., 1000 }
```
Computing the number with the most divisors for each of these sets should take roughly the same amount of time, meaning we will compute the overall solution faster than before.
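To see why striping balances the load, the sketch below estimates each thread's work under both schemes, assuming numDivisors(i) costs about i loop iterations (the O(i) behavior noted above). The WorkEstimate class is hypothetical, and the block bounds in main assume $t$ divides $n$ evenly, as in the running example.

```java
// Hypothetical illustration: compares estimated work per thread for
// blocked vs. striped division of [1, n].
class WorkEstimate {

  // Estimated cost of examining low, low + step, low + 2*step, ... up to
  // high, assuming numDivisors(i) takes about i loop iterations.
  static long cost(int low, int high, int step) {
    long total = 0;
    for (int i = low; i <= high; i += step) {
      total += i;
    }
    return total;
  }

  public static void main(String[] args) {
    int n = 1000, t = 4;
    for (int k = 0; k < t; k++) {
      int blockLow = k * (n / t) + 1;     // assumes t divides n evenly
      int blockHigh = (k + 1) * (n / t);
      long blockCost = cost(blockLow, blockHigh, 1);
      long stripeCost = cost(k + 1, n, t);  // stripe starting at k + 1
      System.out.println("thread " + k + ": block " + blockCost
                         + ", stripe " + stripeCost);
    }
  }
}
```

With n = 1000 and t = 4, the estimated block costs range from 31375 to 218875, while the stripe costs all fall between 124750 and 125500. Since the slowest thread determines the running time, the striped version should finish noticeably sooner under this cost model.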

Make a copy of your MostDivisorsThreads.java code and rename the classes to MostDivisorsStriped and MostDivisorsStripedWorker, and modify your algorithm so each worker is created with a low and high value, and also the step size to get from one value to the next (e.g. 4 in the example above). Be sure to add and commit the new Java file to your repository.

Time your code again and fill in a similar table:

Threads	Time (s)	Speedup
1		--
2
4
...

How does the speedup compare to the earlier version? Report your data and speedup in the README.txt file in the starter directory.

Lots of Threads. We have some big machines in our machine room:

Machine	Memory	Cores
limia.cs.williams.edu	64GB	48
lohani.cs.williams.edu	128GB	72

Ssh to one of these machines and try your code out on more threads to see how it scales to bigger systems. You'll want to pick a larger $n$ if your running times become too short to measure accurately (i.e., less than, say, 0.1 seconds). A good approach is to try 1, 2, 4, 8, 16, 32, and 64 threads. Add these measurements to the README.txt file in the starter directory.


2. Sieve of Eratosthenes

The ML function, primesto n, given below, can be used to calculate the list of all primes less than or equal to n by using the classic "Sieve of Eratosthenes".

(* 
 * Sieve of Eratosthenes: Remove all multiples of first element from list,  
 * then repeat sieving with the tail of the list. If start with list [2..n] 
 * then will find all primes from 2 up to and including n. 
 *) 
fun sieve [] = [] 
    | sieve (fst::rest) = 
        let fun filter p [] = [] 
            | filter p (h::tail) = if (h mod p) = 0 then filter p tail 
                                                    else h::(filter p tail); 
            val nurest = filter fst rest 
        in 
            fst::(sieve nurest) 
        end; 

(* 
 * Returns list of integers from i to j 
 *) 
fun fromto i j = if j < i then [] else i::(fromto (i+1) j); 

(* 
 * Return list of primes from 2 to n 
 *) 
fun primesto n = sieve(fromto 2 n);
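For readers more comfortable with Java than ML, the same sequential, list-based sieve might be sketched as follows. This is only a translation of the ML code above for reference. The ListSieve class is hypothetical, and this is not the concurrent version you are asked to write.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sequential translation of the ML sieve above.
class ListSieve {

  // Keep the first element, remove its multiples from the rest of the
  // list, then sieve the filtered rest recursively.
  static List<Integer> sieve(List<Integer> numbers) {
    List<Integer> primes = new ArrayList<>();
    if (numbers.isEmpty()) {
      return primes;
    }
    int first = numbers.get(0);
    List<Integer> rest = new ArrayList<>();
    for (int h : numbers.subList(1, numbers.size())) {
      if (h % first != 0) {
        rest.add(h);
      }
    }
    primes.add(first);
    primes.addAll(sieve(rest));
    return primes;
  }

  // Returns the list of primes from 2 to n.
  static List<Integer> primesTo(int n) {
    List<Integer> all = new ArrayList<>();
    for (int i = 2; i <= n; i++) {
      all.add(i);
    }
    return sieve(all);
  }

  public static void main(String[] args) {
    System.out.println(primesTo(30));  // [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
  }
}
```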

Notice that each time through the sieve we first filter all of the multiples of the first element from the tail of the list, and then perform the sieve on the reduced tail. In ML, one must wait until the entire tail has been filtered before starting the sieve on the resulting list. However, one could use parallelism to have one process start sieving the result before the entire tail has been completely filtered by the original process.

Here is a good way to think of this concurrent algorithm, which uses the Java Buffer class.

The main program should begin by creating a Buffer object, with perhaps 5 slots. It should then successively insert the numbers from 2 to n into the Buffer, followed by -1 to signal that there are no more input numbers.

After the creation of the Buffer object, but before starting to put the numbers into it, the program should create a Sieve object (using the Sieve class described below) and pass it the Buffer object as a parameter to Sieve's constructor. The Sieve object should then begin running in a separate thread while the main program inserts the numbers in the buffer.

After the Sieve object has been constructed and the Buffer object has been stored in an instance variable, in, its run method should get the first item from in:

- If that number is negative then the run method should terminate.
- Otherwise, it should print out the number (using System.out.println) and then create a new Buffer object, out. A new Sieve should be created with Buffer out and started running in a new thread. Meanwhile the current Sieve object should start filtering the elements from the in buffer. That is, the run method should successively grab numbers from the in buffer. If the number is divisible by the first number that was obtained from in, it is discarded. Otherwise, it is added to the out buffer. This reading and filtering continues until a negative number is read. When the negative number is read, that number is put into the out buffer and then the run method terminates.

In this way, the program will eventually have created a total of $p + 1$ Sieve objects (all running in separate threads), where $p$ is the number of primes between 2 and n. The instances of Sieve will be working in a pipeline, using the buffers to pass numbers from one Sieve object to the next.

Write this program in Java using the Buffer class from lecture. Each of the buffers used should be able to hold at most 5 items. I have provided Producer and Consumer classes in the starter code --- you may find them useful for reference, but you won't use them directly.
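The buffer-and-sentinel protocol described above (insert 2 through n, then -1 to signal the end) can be tried out in isolation before building the pipeline. The sketch below is an illustration under stated assumptions: it uses java.util.concurrent.ArrayBlockingQueue from the standard library as a stand-in for the Buffer class from lecture, whose interface is not shown here, and the SentinelDemo class is hypothetical. It has a single consumer and does no sieving.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical demo of the sentinel protocol. ArrayBlockingQueue stands in
// for the course's Buffer class, which may have a different interface.
class SentinelDemo {

  // The calling thread produces 2..n followed by -1; a consumer thread
  // collects values until it sees a negative number.
  static List<Integer> run(int n) throws InterruptedException {
    BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(5);  // 5 slots
    List<Integer> received = new ArrayList<>();

    Thread consumer = new Thread(() -> {
      try {
        while (true) {
          int value = buffer.take();  // blocks while the buffer is empty
          if (value < 0) {
            return;                   // sentinel: no more input
          }
          received.add(value);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.start();

    for (int i = 2; i <= n; i++) {
      buffer.put(i);                  // blocks while all 5 slots are full
    }
    buffer.put(-1);                   // signal the end of the input
    consumer.join();                  // after join, reading `received` is safe
    return received;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(run(10));      // [2, 3, 4, 5, 6, 7, 8, 9, 10]
  }
}
```

Because the buffer holds at most 5 items, the producer and consumer necessarily run interleaved once n is larger than a handful of numbers. That same back-pressure is what keeps the stages of the sieve pipeline in step with one another.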

3. Sieve of Actors

Course actor library

Include the Actor.scala library file alongside your code when you compile:

scalac Actor.scala Sieve.scala
scala Sieve

Rewrite the Sieve of Eratosthenes program from above in Scala using Actors (and no BoundedBuffer). In the Java program you wrote, you created a new Thread for every prime number you discovered. This time, you will create a new Sieve actor for each prime Int.

Write the code in the Sieve.scala file in the starter project.

Hints: Your Sieve actor should keep track of the prime it was created with, and it should have a slot that can hold another "follower" Sieve actor that will handle the next prime found. (The mailbox of this other Actor will play the role of the BoundedBuffer from the previous problem in that it will hold the numbers being passed along from one actor to the next.) Each Sieve actor should be able to handle messages of the form Num(n) and Stop.

The operation of each Sieve actor, once started, is similar to the previous problem. If it receives a message of the form Num(n), it checks whether n is divisible by its prime. If so, n is discarded. If not, and the follower actor has not yet been created, create the follower with n as its prime. Otherwise, send n to the follower Sieve actor. When the Stop message is received, pass it on to the follower (if any) and exit.

Use receive inside a while loop, as in the PickANumber, Account, and other examples from class.

Your program should print (in order) all of the primes less than 100. You can print each prime as it is discovered (e.g., when you create the corresponding Sieve actor), but it would be even better to return a list of all of the primes and then print those out. To do this, you can send the Stop message synchronously with "!?"; when one of your actors receives this message, it replies with the list of all primes it knows about. The "!?" operator returns an object of type Any (equivalent to Java's Object), so you'll have to use match to decode it as a list of Int. This may result in an "unchecked" warning from the compiler for a fairly subtle Java compatibility reason, but you can ignore that.

I have provided a sample program ProducerConsumer.scala in the starter code. You don't need to use this directly, but you may wish to refer to this code as a simple example of using Actors.

Submitting Your Work

Submit your code to the Gradescope assignment named, for example, "Lab 1". You can submit in one of two ways:

- Upload files: Click "Upload" and select all of your source files, or
- Link GitHub: Click "GitHub" and select your repository and branch.

Please do not change the names of the starter files. Also:

- If you worked with a partner, only one of each pair needs to submit the code.
- Indicate who your partner is when you submit. Specifically, after you upload your files, there will be an "Add Group Member" button on the right of the Gradescope webpage. Click that and add your partner.

Autograding: Gradescope will run an autograder on your code that performs some simple tests. Be sure to look at the autograder output to verify your code works as expected. We will run more extensive tests on your code after the deadline.
