This homework has three types of problems:
Self Check: You are strongly encouraged to think about and work through these questions, but you will not submit answers to them.
Problems: You will turn in answers to these questions.
Programming: This part involves writing Scala code. You are strongly encouraged to work with a partner on it.
Read Mitchell, Chapter 14.1 -- 14.2, 14.4 (up through page 461)
Scala Actors Tutorials, as needed. (A good starting point is http://www.scala-lang.org/node/242).
The DoubleCounter class defined below has methods incrementBoth and getDifference. Assume that DoubleCounter will be used in multi-threaded applications.
class DoubleCounter {
    protected int x = 0;
    protected int y = 0;
    public int getDifference() {
        return x - y;
    }
    public void incrementBoth() {
        x++;
        y++;
    }
}
There is a potential data race between incrementBoth and getDifference if getDifference is called between the increment of x and the increment of y. You may assume that x++ and y++ execute atomically (although this is not always guaranteed...).
What are the possible return values of getDifference if there are two threads that each invoke incrementBoth exactly once at the same time as a third thread invokes getDifference?
What are the possible return values of getDifference if there are n threads that each invoke incrementBoth exactly once?
Data races can be prevented by inserting synchronization primitives. One option is to declare
public synchronized int getDifference() {...}
public void incrementBoth() {...}
This will prevent two threads from executing method getDifference at the same time. Is this enough to ensure that getDifference always returns 0? Explain briefly.
Is the following declaration
public int getDifference() {...}
public synchronized void incrementBoth() {...}
sufficient to ensure that getDifference always returns 0? Explain briefly.
What are the possible return values of getDifference if the following declarations are used?
public synchronized int getDifference() {...}
public synchronized void incrementBoth() {...}
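To explore the interleavings in this problem empirically, a small harness along the following lines may help. It is only a sketch: RaceDemo and observe are hypothetical names that are not part of the assignment, the DoubleCounter shown is a copy of the unsynchronized class above, and each writer calls incrementBoth many times (rather than exactly once) so that overlapping executions are more likely to occur. The set of values printed can differ from run to run, and a value that never shows up in an experiment may still be possible.

```java
import java.util.Set;
import java.util.TreeSet;

// Copy of the unsynchronized DoubleCounter from the problem statement.
class DoubleCounter {
    protected int x = 0;
    protected int y = 0;

    public int getDifference() {
        return x - y;
    }

    public void incrementBoth() {
        x++;
        y++;
    }
}

// Hypothetical harness; not part of the assignment.
class RaceDemo {
    // Starts `writers` threads that each call incrementBoth() `rounds` times
    // while the calling thread samples getDifference(). Returns the distinct
    // values observed during this particular run.
    static Set<Integer> observe(DoubleCounter counter, int writers, int rounds) {
        Thread[] threads = new Thread[writers];
        for (int i = 0; i < writers; i++) {
            threads[i] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    counter.incrementBoth();
                }
            });
            threads[i].start();
        }
        Set<Integer> seen = new TreeSet<>();
        for (int s = 0; s < rounds; s++) {
            seen.add(counter.getDifference());
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("Observed differences: "
                + observe(new DoubleCounter(), 2, 1_000_000));
    }
}
```

Adding synchronized to one or both methods of the copied class and rerunning is a quick way to sanity-check your written answers, though an experiment is not a substitute for the reasoning the questions ask for.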
No autograder this week
Since these programs don't lend themselves to the type of testing done by an autograder, we won't be running an autograder on Gradescope this week.
Your answers to the questions below should all appear in the P1/README.txt file from the starter.
How Many Cores Do You Have? Determine how many cores your computer has. You can do this in many ways, including programmatically. Here's some code for Java:
public class Cores {
    public static void main(String args[]) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(cores);
    }
}
And Python:
import multiprocessing
print(multiprocessing.cpu_count())
Most Divisors. You'll find a starter project on evolene with the code snippets below.
This week, we'll work on parallelizing a program to find the positive integer less than or equal to n that has the largest number of integer divisors. It is possible that several integers in the range 1 to n have the same maximum number of divisors, in which case reporting any of them is fine. Here are a few examples of how many divisors numbers have:
numDivisors(12) == 6 // { 1, 2, 3, 4, 6, 12 }
numDivisors(6) == 4 // { 1, 2, 3, 6 }
numDivisors(5) == 2 // { 1, 5 }
numDivisors(4) == 3 // { 1, 2, 4 }
Finding the number of divisors for an integer value is just a matter of finding all integers less than or equal to value that divide it evenly, as shown in the numDivisors method below. That code also includes a main program that takes a number on the command line and computes the value we are looking for.
public class MostDivisors {
    // Count the number of integer divisors of value,
    // including 1 and the value itself.
    public static int numDivisors(int value) {
        int divisorCount = 0;
        for (int i = 1; i <= value; i++) {
            if (value % i == 0) {
                divisorCount++;
            }
        }
        return divisorCount;
    }

    public static void main(String[] args) {
        long initialTime = System.currentTimeMillis();
        int n = Integer.parseInt(args[0]);
        // Take 1 as the first "best" number
        int maxDivisors = 1;
        int numWithMaxDivisors = 1;
        for (int i = 2; i <= n; i++) {
            int divisorCount = numDivisors(i);
            if (divisorCount > maxDivisors) {
                maxDivisors = divisorCount;
                numWithMaxDivisors = i;
            }
        }
        long endTime = System.currentTimeMillis();
        long elapseTime = endTime - initialTime;
        System.out.println("The maximum number of divisors is " + maxDivisors);
        System.out.printf("A number with that many divisors is %d\n",
                          numWithMaxDivisors);
        // Report seconds, matching the sample output below.
        System.out.printf("Computing that took %3.4g seconds\n", elapseTime / 1000.0);
    }
}
Run this code and see how long it takes to compute the answer for a few reasonably-sized n, such as 10,000 or 50,000. The following is my output for 50,000:
> java MostDivisors 50000
The maximum number of divisors is 100
A number with that many divisors is 45360
Computing that took 2.604 seconds
Find an n that takes 5 to 10 seconds to run. That will be a good test value for your parallel versions. Report this number in the README.txt file in the starter directory.
A Parallel Version. Now, create a new class MostDivisorsThreads that computes the same information, but with a user-specified number of threads. You'll want to structure it similarly to the SumExample class from lecture that divided the input space into t blocks, with each of t threads processing one block.
In this case, each block of work corresponds to a range of numbers to examine. For example, if n were 1,000 and t were 4, your code should divide the range [1, 1000] into subranges [1,250], [251,500], [501,750], and [751,1000].
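One way to compute such subranges is sketched below. The names BlockRanges and blockRanges are hypothetical and not part of the starter code, and giving the leftover values to the earlier blocks when t does not divide n evenly is an assumption, since the assignment does not say how to handle that case.

```java
// Hypothetical helper (not in the starter code): split [1, n] into t blocks.
class BlockRanges {
    // Returns t pairs {low, high}. When t does not divide n evenly, the
    // first (n % t) blocks each receive one extra value (an assumption).
    static int[][] blockRanges(int n, int t) {
        int[][] ranges = new int[t][2];
        int base = n / t;
        int extra = n % t;
        int low = 1;
        for (int i = 0; i < t; i++) {
            int size = base + (i < extra ? 1 : 0);
            ranges[i][0] = low;
            ranges[i][1] = low + size - 1;
            low += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Prints [1,250], [251,500], [501,750], [751,1000], one per line.
        for (int[] r : blockRanges(1000, 4)) {
            System.out.println("[" + r[0] + "," + r[1] + "]");
        }
    }
}
```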
Here is a skeleton to get you started. You'll need to add the elided bits, and possibly a few other items to these classes. As a first step, note that each worker thread will probably take a low and high number delineating the range it should examine.
class MostDivisorsWorker extends Thread {
    public MostDivisorsWorker(/* ... */) {
        // ...
    }

    // Count the number of integer divisors of value,
    // including 1 and the value itself.
    private int numDivisors(int value) {
        // as above ...
    }

    public void run() {
    }

    public int getMostDivisors() {
        //...
    }

    public int getNumberWithMostDivisors() {
        //...
    }
}

public class MostDivisorsThreads {
    public static void main(String[] args) throws Exception {
        long initialTime = System.currentTimeMillis();
        int n = Integer.parseInt(args[0]);
        int numThreads = Integer.parseInt(args[1]);
        // Take 1 as the first "best" number
        int maxDivisors = 1;
        int numWithMaxDivisors = 1;
        MostDivisorsWorker[] threads = new MostDivisorsWorker[numThreads];
        for (int i = 0; i < numThreads; i++) {
            threads[i] = new MostDivisorsWorker(/* ... */);
        }
        long endTime = System.currentTimeMillis();
        long elapseTime = endTime - initialTime;
        System.out.printf("The maximum number of divisors is %d\n", maxDivisors);
        System.out.printf("A number with that many divisors is %d\n",
                          numWithMaxDivisors);
        System.out.printf("Computing that took %3.4g seconds\n", elapseTime / 1000.0);
    }
}
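As a reminder of the general pattern (create the workers, start them all, join each one, then combine the results), here is a sketch for a simpler problem: summing the integers from 1 to n. It is written in the spirit of the SumExample class mentioned above, but RangeSumWorker and RangeSum are hypothetical names, and the lecture's actual code may be organized differently.

```java
// Hypothetical worker in the spirit of the lecture's SumExample; the
// lecture's actual code may differ.
class RangeSumWorker extends Thread {
    private final int low;
    private final int high;
    private long sum = 0;

    RangeSumWorker(int low, int high) {
        this.low = low;
        this.high = high;
    }

    @Override
    public void run() {
        for (int i = low; i <= high; i++) {
            sum += i;
        }
    }

    long getSum() {
        return sum;
    }
}

class RangeSum {
    // Sums 1..n using t threads, one contiguous block per thread.
    static long parallelSum(int n, int t) {
        RangeSumWorker[] workers = new RangeSumWorker[t];
        for (int i = 0; i < t; i++) {
            int low = (int) ((long) n * i / t) + 1;
            int high = (int) ((long) n * (i + 1) / t);
            workers[i] = new RangeSumWorker(low, high);
            workers[i].start();          // start every worker first ...
        }
        long total = 0;
        for (RangeSumWorker w : workers) {
            try {
                w.join();                // ... then wait for each to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            total += w.getSum();         // safe to read after join()
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1000, 4));   // prints 500500
    }
}
```

Starting all of the workers before joining any of them is what lets them run concurrently; calling join() immediately after each start() would serialize the work.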
You'll now provide two command line args when you run your code:
> java MostDivisorsThreads 50000 4
The maximum number of divisors is 100
A number with that many divisors is 45360
Computing that took ???? seconds
Try it with a few different numbers of threads: at least 1, 2, 4, the number of cores, twice the number of cores, and four times the number of cores. Fill in the table below. Speedup for t threads is computed as (running time for 1 thread) / (running time for t threads). For example, if the program takes 20 seconds with 1 thread and 16 seconds with 2 threads, the speedup is 20/16 = 1.25x.
Report the data and speedup in the README.txt file in the starter directory.
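The speedup formula can be checked with a few lines of code. This is just the arithmetic from the example above; Speedup is a hypothetical class name, not something in the starter code.

```java
// Hypothetical helper illustrating the speedup formula.
class Speedup {
    // Speedup for t threads = (time with 1 thread) / (time with t threads).
    static double speedup(double oneThreadSeconds, double tThreadSeconds) {
        return oneThreadSeconds / tThreadSeconds;
    }

    public static void main(String[] args) {
        System.out.printf("%.2fx%n", speedup(20.0, 16.0));   // prints 1.25x
    }
}
```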
A few notes on performance experiments:
The data is never as clean as you'd like.
Typically, an experiment like this should be repeated many times to reduce the timing variance due to external effects, noise, imprecision, etc. However, for us, we can just run once or twice to get ballpark statistics. If you see high variance, you can just average 3 runs, or increase n a bit to reduce the impact of noise on the result.
You may converge on speedups of less than half the number of cores your system reports. One reason is addressed in the next part of the lab. The other is that you are likely using an Intel processor, and most Intel chips use hyper-threading, which enables a single core to run steps of two threads simultaneously, albeit with certain limitations. Those "hyper-threads" are included in the core count, but they don't help very much in this case because the operations performed by our worker threads are generally not the types of operations that can exploit a hyper-threaded architecture effectively.
Threads | Time (s) | Speedup |
---|---|---|
1 | -- | |
2 | | |
4 | | |
... | | |
Striping vs. Blocking. You probably saw some speedup, but we can do better by dividing the work among threads in a different way...
Using the example from above, if n were 1,000 and the number of threads were 4, your code divides the range [1, 1000] into subranges [1,250], [251,500], [501,750], and [751,1000]. However, note that the time to compute numDivisors(i) is O(i). This means that the time to compute the answer for the last range, [751,1000], will be much longer than the time to compute the answer for the other ranges, since the cost of computing the number of divisors for each value in that range is larger than the cost for any value in any of the other ranges. Thus, your code will end up with some threads finishing much faster than others. The time spent waiting for that last thread to finish is wasted time.
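To get a rough feel for the imbalance, the sketch below estimates the work in each block under a simplified cost model: since numDivisors(i) loops i times, the cost of a range is approximated by the sum of its values. BlockCost and rangeCost are hypothetical names, and the model ignores everything except the loop iterations.

```java
// Hypothetical illustration of the cost imbalance between blocks.
class BlockCost {
    // Rough cost model: numDivisors(i) performs i loop iterations, so the
    // work for a range is approximated by the sum of its values.
    static long rangeCost(int low, int high) {
        long cost = 0;
        for (int i = low; i <= high; i++) {
            cost += i;
        }
        return cost;
    }

    public static void main(String[] args) {
        int[][] blocks = { {1, 250}, {251, 500}, {501, 750}, {751, 1000} };
        for (int[] b : blocks) {
            System.out.println("[" + b[0] + "," + b[1] + "] -> "
                    + rangeCost(b[0], b[1]));
        }
    }
}
```

Under this model the last block costs 218,875 iterations against 31,375 for the first, roughly a factor of seven.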
A different approach is to recognize that large numbers take more time to process and to distribute those large numbers evenly across the workers.
To do this, instead of creating large blocks of contiguous numbers for each thread to process, stripe consecutive numbers across different threads.
That is, with 4 threads, number i should be processed by thread i % 4, giving us the following division of numbers among our four threads:
{ 1, 5, 9, 13, ..., 997 }
{ 2, 6, 10, 14, ..., 998 }
{ 3, 7, 11, 15, ..., 999 }
{ 4, 8, 12, 16, ..., 1000 }
Computing the number with the most divisors for each of these sets should take roughly the same amount of time, meaning we will compute the overall solution faster than before.
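The sketch below enumerates the stripes for n = 1,000 with 4 threads and estimates the work in each one using a rough cost model in which numDivisors(i) takes about i steps, so a stripe costs about the sum of its values. Stripes, stripe, and stripeCost are hypothetical names used only for illustration; your worker would iterate over its stripe directly rather than build a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the starter code.
class Stripes {
    // Values low, low + step, low + 2 * step, ... that do not exceed high.
    static List<Integer> stripe(int low, int high, int step) {
        List<Integer> values = new ArrayList<>();
        for (int i = low; i <= high; i += step) {
            values.add(i);
        }
        return values;
    }

    // Rough cost of a stripe: the sum of its values.
    static long stripeCost(int low, int high, int step) {
        long cost = 0;
        for (int i : stripe(low, high, step)) {
            cost += i;
        }
        return cost;
    }

    public static void main(String[] args) {
        for (int start = 1; start <= 4; start++) {
            List<Integer> s = stripe(start, 1000, 4);
            System.out.println(s.subList(0, 4) + " ... " + s.get(s.size() - 1)
                    + " -> cost " + stripeCost(start, 1000, 4));
        }
    }
}
```

Under this model the four stripes cost 124,750, 125,000, 125,250, and 125,500 iterations, so the workers are nearly balanced.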
Make a copy of your MostDivisorsThreads.java code and rename the classes to MostDivisorsStriped and MostDivisorsStripedWorker, and modify your algorithm so each worker is created with a low and high value, and also the step size to get from one value to the next (e.g. 4 in the example above). Be sure to add and commit the new Java file to your repository.
Time your code again and fill in a similar table:
Threads | Time (s) | Speedup |
---|---|---|
1 | -- | |
2 | | |
4 | | |
... | | |
How does the speedup compare to the earlier version? Report your data and speedup in the README.txt file in the starter directory.
Lots of Threads. We have some big machines in our machine room:
Machine | Memory | Cores |
---|---|---|
limia.cs.williams.edu | 64GB | 48 |
lohani.cs.williams.edu | 128GB | 72 |
Ssh to one of these machines and try your code out on more threads to see how your code scales to bigger systems. You'll want to pick a larger n if your running times become too short to measure accurately (i.e., less than, say, 0.1 second). A good approach is to try 1, 2, 4, 8, 16, 32, and 64 threads. Add these measurements to the README.txt file in the starter directory.
The ML function, primesto n, given below, can be used to calculate the list of all primes less than or equal to n by using the classic "Sieve of Eratosthenes".
(*
* Sieve of Eratosthenes: Remove all multiples of first element from list,
* then repeat sieving with the tail of the list. If start with list [2..n]
* then will find all primes from 2 up to and including n.
*)
fun sieve [] = []
  | sieve (fst::rest) =
      let fun filter p [] = []
            | filter p (h::tail) = if (h mod p) = 0 then filter p tail
                                   else h::(filter p tail);
          val nurest = filter fst rest
      in
          fst::(sieve nurest)
      end;
(*
* Returns list of integers from i to j
*)
fun fromto i j = if j < i then [] else i::(fromto (i+1) j);
(*
* Return list of primes from 2 to n
*)
fun primesto n = sieve(fromto 2 n);
Notice that each time through the sieve we first filter all of the multiples of the first element from the tail of the list, and then perform the sieve on the reduced tail. In ML, one must wait until the entire tail has been filtered before starting the sieve on the resulting list. However, one could use parallelism to have one process start sieving the result before the entire tail has been completely filtered by the original process.
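For readers more comfortable with Java than ML, the same sequential algorithm can be sketched as follows. ListSieve, sieve, and primesTo are hypothetical names chosen to mirror the ML functions. Note that each pass builds the complete filtered list before the next pass begins, which is exactly the waiting described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sequential Java counterpart of the ML code.
class ListSieve {
    // Mirrors the ML `sieve`: keep the head, drop its multiples from the
    // tail, then repeat on the filtered tail.
    static List<Integer> sieve(List<Integer> candidates) {
        List<Integer> primes = new ArrayList<>();
        List<Integer> remaining = candidates;
        while (!remaining.isEmpty()) {
            int first = remaining.get(0);
            primes.add(first);
            List<Integer> filtered = new ArrayList<>();
            for (int h : remaining.subList(1, remaining.size())) {
                if (h % first != 0) {
                    filtered.add(h);
                }
            }
            remaining = filtered;   // the whole tail is filtered before moving on
        }
        return primes;
    }

    // Mirrors `primesto`: sieve the list [2..n].
    static List<Integer> primesTo(int n) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 2; i <= n; i++) {
            candidates.add(i);
        }
        return sieve(candidates);
    }

    public static void main(String[] args) {
        System.out.println(primesTo(30));   // [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    }
}
```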
Here is a good way to think of this concurrent algorithm, which uses the Java Buffer class.
The main program should begin by creating a Buffer object, with perhaps 5 slots. It should then successively insert the numbers from 2 to n into the Buffer, followed by -1 to signal that there are no more input numbers.
After the creation of the Buffer object, but before starting to put the numbers into it, the program should create a Sieve object (using the Sieve class described below) and pass it the Buffer object as a parameter to Sieve's constructor. The Sieve object should then begin running in a separate thread while the main program inserts the numbers in the buffer.
After the Sieve object has been constructed and the Buffer object has been stored in an instance variable, in, its run method should get the first item from in:
If that number is negative then the run method should terminate.
Otherwise, it should print out the number (using System.out.println) and then create a new Buffer object, out. A new Sieve should be created with Buffer out and started running in a new thread. Meanwhile the current Sieve object should start filtering the elements from the in buffer. That is, the run method should successively grab numbers from the in buffer. If the number is divisible by the first number that was obtained from in, it is discarded. Otherwise, it is added to the out buffer. This reading and filtering continues until a negative number is read. When the negative number is read, that number is put into the out buffer and then the run method terminates.
In this way, the program will eventually have created a total of p + 1 Sieve objects (all running in separate threads), where p is the number of primes between 2 and n. The instances of Sieve will be working in a pipeline, using the buffers to pass numbers from one Sieve object to the next.
Write this program in Java using the Buffer class from lecture. Each of the buffers used should be able to hold at most 5 items.
I have provided Producer and Consumer classes in the starter code; you may find them useful for reference, but you won't use them directly.
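If you want a reminder of how a bounded buffer behaves, the sketch below shows one possible implementation together with a producer that sends 2 through n followed by -1, as the main program above does. This is not the Buffer class from lecture: SketchBuffer, put, take, and SketchBufferDemo are hypothetical names, and the lecture's class may use different method names and a different implementation. Use the lecture's class in your solution.

```java
// Hypothetical bounded buffer; the Buffer class from lecture may use
// different names and a different implementation.
class SketchBuffer {
    private final int[] slots;
    private int head = 0;    // index of the oldest item
    private int count = 0;   // number of items currently stored

    SketchBuffer(int capacity) {
        slots = new int[capacity];
    }

    // Blocks while the buffer is full, then appends the value.
    public synchronized void put(int value) throws InterruptedException {
        while (count == slots.length) {
            wait();
        }
        slots[(head + count) % slots.length] = value;
        count++;
        notifyAll();
    }

    // Blocks while the buffer is empty, then removes the oldest value.
    public synchronized int take() throws InterruptedException {
        while (count == 0) {
            wait();
        }
        int value = slots[head];
        head = (head + 1) % slots.length;
        count--;
        notifyAll();
        return value;
    }
}

class SketchBufferDemo {
    // A producer thread sends 2..n and then -1; the calling thread sums
    // everything it receives before the negative end marker.
    static int transferSum(int n, int capacity) {
        SketchBuffer buffer = new SketchBuffer(capacity);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 2; i <= n; i++) {
                    buffer.put(i);
                }
                buffer.put(-1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int v = buffer.take(); v >= 0; v = buffer.take()) {
                sum += v;
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(transferSum(100, 5));   // prints 5049
    }
}
```

Because the buffer holds only 5 items, the producer blocks whenever it gets too far ahead of the consumer, which is what keeps the stages of a pipeline loosely in step.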
Use Scala 2.10.7
The most recent versions of Scala do not support the actor library we'll be using for this question.
To use the version with actors if you are on a lab machine, please run the following two commands in the terminal window before using scalac or scala:
export JAVACMD=/usr/lib/jvm/java-8-openjdk-amd64/bin/java
export PATH=~freund/shared/scala-2.10.7/bin/:$PATH
You'll need to do this each time you open a new terminal window.
If you are using your own computer, you can download the correct version here: http://www.scala-lang.org/download/2.10.7.html.
You can verify you are using the correct version if running "scala -version" prints out:
Scala code runner version 2.10.7 -- Copyright 2002-2017, LAMP/EPFL
Rewrite the Sieve of Eratosthenes program from above in Scala using Actors (and no BoundedBuffer). In the Java program you wrote, you created a new Thread for every prime number you discovered. This time, you will create a new Sieve actor for each prime Int.
Write the code in the Sieve.scala file in the starter project.
Hints: Your Sieve actor should keep track of the prime it was created with, and it should have a slot that can hold another "follower" Sieve actor that will handle the next prime found. (The mailbox of this other Actor will play the role of the BoundedBuffer from the previous problem in that it will hold the numbers being passed along from one actor to the next.) Each Sieve actor should be able to handle messages of the form Num(n) and Stop.
The operation of each Sieve actor, once started, is similar to the previous problem. If it receives a message of the form Num(n) then it checks if n is divisible by its prime. If so, n is discarded. If not, then if the follower Actor has not yet been created, create it with the number. Otherwise, send the number to the follower Sieve actor. When a Stop message arrives, pass it on to the follower (if any), and exit.
This program works best with a receive and a while loop (like in the PickANumber example) rather than react and loop (as in the Parrot and Account actor examples from class).
Your program should print (in order) all of the primes less than 100. You can print each prime as it is discovered (e.g., when you create the corresponding Sieve actor), but it would be even better to return a list of all of the primes, and then print those out. To do this, you can send the message Stop synchronously with "!?", and when one of your actors receives this message it replies with the list of all primes it knows about. The "!?" operator returns an object of type Any (equivalent to Java's Object), so you'll have to use match to decode it as a list of Int. (This may result in an "unchecked" warning from the compiler for a fairly subtle Java compatibility reason, but you can ignore that.)
I have provided a sample program ProducerConsumer.scala in the starter code. You don't need to use this directly, but you may wish to refer to this code as a simple example of using Actors.
Submit your homework via Gradescope by the beginning of class on the due date.
Submit your answers to the Gradescope assignment named, for example, "HW 1". It should:
You will be asked to resubmit homework not satisfying these requirements. Please select the pages for each question when you submit.
If this homework includes programming problems, submit your code to the Gradescope assignment named, for example, "HW 1 Programming". Also:
Autograding: For most programming questions, Gradescope will run an autograder on your code that performs some simple tests. Be sure to look at the autograder output to verify your code works as expected. We will run more extensive tests on your code after the deadline. If you encounter difficulties or unexpected results from the autograder, please let us know.