CSCI 338 :: Parallel Processing

Program 1: Concurrent Processes

Assigned	February 11, 2025
Final Due Date	In-class presentation Feb. 18. Code and slides due Feb. 20.

Overview

The purpose of this assignment is twofold: to write a concurrent program using UNIX processes and to start your exploration of one of the graph algorithms you will be working with this term.

How the Honor Code Applies to This Assignment

This is a group-eligible assignment, meaning you should work with your class-approved team of students to produce your assignments, evaluation, and final reports. You are also encouraged to ask non-team members questions of clarification, language syntax, and error message interpretation, but you are not permitted to view or share each other's code or written design notes for part1; you are welcome to share code with one another for part2. Please see the syllabus for more details.

Part One: Concurrent UNIX processes

In this first part of the assignment, you will be creating a C program that uses concurrent UNIX processes to come up with words that start with some combination of a specified set of 3 characters. Think of your program as a way to cheat at Scrabble.

When the work that must be completed involves multiple independent tasks that do not need to share information with one another, independent tasks can be mapped to (sibling) processes that each complete some subset of the total set of work to do. This is one of the most basic forms of concurrency. In many ways, this can be thought of as mapping each iteration of a loop to its own process. The parent process creates these children and waits for them all to complete their tasks.
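As a minimal sketch of this pattern (not the required design for your program), the code below forks one child per loop iteration and has the parent wait for all of them. The names run_in_children and demo_task are made up for illustration; demo_task simply pretends that even-numbered work items succeed.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical work item: pretend item i succeeds (returns 0) when i is even. */
static int demo_task(int i) {
    return (i % 2 == 0) ? 0 : 1;
}

/* Fork one child per work item, then wait for every child that was started.
 * Returns how many children exited with status 0. */
static int run_in_children(int n, int (*task)(int)) {
    assert(n >= 0 && task != NULL);

    int started = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {              /* fork failed: stop creating children */
            perror("fork");
            break;
        }
        if (pid == 0) {
            /* Child: do exactly one iteration's work, then exit without
             * falling back into the parent's loop. */
            _exit(task(i));
        }
        started++;                  /* parent: keep forking, do not wait yet */
    }

    int succeeded = 0;
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            succeeded++;
    }
    return succeeded;
}
```

Calling run_in_children(6, demo_task) starts six siblings that run concurrently. Note that the parent only begins waiting after every child has been forked; waiting inside the first loop would make the children run one at a time.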

Your program will first ask the user to enter 3 characters using stdin. It will then generate strings for all 6 possible orderings of those characters. For each string, it will fork and execute a child process, passing one of the 6 strings to each new child process so that all 6 strings get processed (don't worry about doing anything special for duplicate strings in the set of 6); these child processes should run concurrently with each other and the parent process. Each child process will search the file /usr/share/dict/words for all occurrences of its specified string at the beginning of a word in the file. Each child will then send those matching words back to the parent process. The parent process waits for all of its children to complete before printing all of the words the child processes found to stdout.
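The flow described above can be sketched roughly as follows. This is one possible shape under stated assumptions, not a reference solution: it uses an anonymous pipe per child (a FIFO or another mechanism would also work), and the standard echo utility stands in for your real child program, so each "result" is just the ordering echoed back. The names orderings3, run_children, N_ORDERINGS, and RESULT_CAP are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define N_ORDERINGS 6
#define RESULT_CAP 64

/* Fill out[] with all six orderings of the first three characters of `in`. */
static void orderings3(const char *in, char out[N_ORDERINGS][4]) {
    static const int idx[N_ORDERINGS][3] = {
        {0, 1, 2}, {0, 2, 1}, {1, 0, 2}, {1, 2, 0}, {2, 0, 1}, {2, 1, 0}
    };
    assert(in != NULL && strlen(in) >= 3);
    for (int p = 0; p < N_ORDERINGS; p++) {
        for (int k = 0; k < 3; k++)
            out[p][k] = in[idx[p][k]];
        out[p][3] = '\0';
    }
}

/* Start one child per ordering (all running concurrently), then read what
 * each child wrote to its pipe. Returns the number of children started. */
static int run_children(char args[N_ORDERINGS][4],
                        char results[N_ORDERINGS][RESULT_CAP]) {
    int read_fd[N_ORDERINGS];
    pid_t pids[N_ORDERINGS];
    int started = 0;

    for (int i = 0; i < N_ORDERINGS; i++) {
        int fds[2];
        if (pipe(fds) < 0)
            break;
        pid_t pid = fork();
        if (pid < 0) {
            close(fds[0]);
            close(fds[1]);
            break;
        }
        if (pid == 0) {
            /* Child: drop read ends inherited from earlier iterations. */
            for (int j = 0; j < i; j++)
                close(read_fd[j]);
            close(fds[0]);
            dup2(fds[1], STDOUT_FILENO);   /* child's stdout now feeds the pipe */
            close(fds[1]);
            /* ASSUMPTION: echo is a stand-in for the real child executable. */
            execlp("echo", "echo", args[i], (char *)NULL);
            _exit(127);                    /* reached only if exec failed */
        }
        close(fds[1]);   /* parent must close its write end to ever see EOF */
        read_fd[i] = fds[0];
        pids[i] = pid;
        started++;
    }

    const size_t cap = RESULT_CAP - 1;     /* leave room for the terminator */
    for (int i = 0; i < started; i++) {
        size_t total = 0;
        ssize_t n;
        while (total < cap &&
               (n = read(read_fd[i], results[i] + total, cap - total)) > 0)
            total += (size_t)n;
        results[i][total] = '\0';
        close(read_fd[i]);
        waitpid(pids[i], NULL, 0);
    }
    return started;
}
```

All six children are forked before the parent reads anything, which is what lets them run concurrently. A real solution would need a larger or growable buffer, since a prefix can match many dictionary words.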

Your program should turn the entered set of characters and the words from /usr/share/dict/words into the same case so that your results are case-insensitive.
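One way to get case-insensitive matching is to fold both sides to lowercase while comparing, rather than rewriting the strings up front. The sketch below assumes plain ASCII words; starts_with_ci is a made-up name and is not part of the supplied helper.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return 1 if `word` begins with `prefix`, ignoring ASCII case; otherwise 0. */
static int starts_with_ci(const char *word, const char *prefix) {
    assert(word != NULL && prefix != NULL);
    for (size_t i = 0; prefix[i] != '\0'; i++) {
        /* Cast to unsigned char: tolower() is undefined for negative values.
         * If word ends early, its '\0' cannot equal prefix[i], so we stop
         * before reading past the end of word. */
        if (tolower((unsigned char)word[i]) != tolower((unsigned char)prefix[i]))
            return 0;
    }
    return 1;
}
```

With this, starts_with_ci("Apple", "aP") is true, while starts_with_ci("ap", "APP") is false because the word is shorter than the prefix.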

The program should be invoked with just the name "cheat". Please supply a Makefile for compiling your program. Feel free to modify the sample Makefile in your gitlab repository if necessary. Note that it's likely you'll create a separate code file for the child process that will need to be compiled. If you choose to use a FIFO, you may also need to create a C file that creates the FIFO; please indicate in the usage of cheat if a FIFO needs to be generated and how to do so. Provide a README file that indicates what compilation and execution commands might need to be run other than cheat.
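If you do go the FIFO route, the creation step might look something like the sketch below. It assumes owner-only permissions (0600) are acceptable and treats an already-existing FIFO as success; ensure_fifo is a hypothetical name, and the path is whatever you document in your README.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create a FIFO at `path` if one is not already there.
 * Returns 0 if a FIFO exists at `path` afterward, -1 otherwise. */
static int ensure_fifo(const char *path) {
    assert(path != NULL);

    if (mkfifo(path, 0600) == 0)
        return 0;                   /* freshly created */

    if (errno == EEXIST) {
        /* Something is already at this path; accept it only if it is a FIFO. */
        struct stat st;
        if (stat(path, &st) == 0 && S_ISFIFO(st.st_mode))
            return 0;
        fprintf(stderr, "%s exists but is not a FIFO\n", path);
        return -1;
    }

    perror("mkfifo");
    return -1;
}
```

Making the call safe to repeat means a user who runs the setup step twice does not get a confusing error.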

To enable you to focus on the process part of this code, I have supplied you with a beginner helper.c file that contains code for reading in the dictionary file and comparing the beginning of the words in the file to a substring. You are welcome to use this file or create your own code from scratch.
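If you decide to write the search yourself, a rough sketch of the idea is below. It is not the contents of the supplied helper.c (which may be organized quite differently); count_prefix_matches is a made-up name, and the sketch assumes one word per line with lines shorter than 256 bytes. It uses the POSIX strncasecmp function for the case-insensitive comparison.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Read one word per line from `in`; copy each line that starts with `prefix`
 * (ignoring case) to `out` when `out` is non-NULL. Returns the match count. */
static int count_prefix_matches(FILE *in, const char *prefix, FILE *out) {
    assert(in != NULL && prefix != NULL);

    char line[256];                 /* ASSUMPTION: every line fits in 255 bytes */
    size_t plen = strlen(prefix);
    int matches = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        /* A line shorter than the prefix fails here because its newline or
         * terminator differs from the corresponding prefix character. */
        if (strncasecmp(line, prefix, plen) == 0) {
            matches++;
            if (out != NULL)
                fputs(line, out);   /* line still carries its trailing newline */
        }
    }
    return matches;
}
```

In a child process, `in` would come from fopen("/usr/share/dict/words", "r") and `out` would be whatever stream leads back to the parent.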

You can obtain the code for this portion of the assignment from gitlab. Please check in all files needed to run your cheat program, including makefiles and other C files (e.g., for creating a FIFO or for the child processes) and a README. This portion will count for half of this assignment, with style/documentation counting for 5 percent of the points and correctness for 95 percent.

Part Two: Profiling Your Graph Algorithms

In this part of the assignment, we'll begin the process of learning about the graph algorithms we're going to use this term. The graph algorithms we will use for this term are from publicly available computer architecture graph algorithm benchmark suites, namely GAP and GMS. These benchmark suites are written in C++, so we're going to start slow and have each group of 1 to 2 people learning about one algorithm and groups of 3 learning about 2 algorithms. Specific algorithms will be assigned to groups during class.

To obtain the code needed for this part of the assignment, you will need to copy the code from my Public/ folder on the MathCS Linux machines into your gitlab part2/ subdirectory. Specifically, you will need to copy the file ~kshaw/Public/cs338/program1/part2/gapbs-1.5.tar.gz and then unzip and untar it. There is also a ~kshaw/Public/cs338/graphs/ directory; do not copy it, as the files are large.

The goal of this part of the assignment is to get a basic understanding of your assigned algorithm, including what it's doing at a high level, where the code spends most of its time, and what the program control flow looks like for the portion of code where the main computation is being performed. You will likely find these resources helpful:

* "The GAP Benchmark Suite" paper found on the GAP website.
* gprof (GNU Profiler) which has a useful Quick-Start Guide and full manual.

To get started, you'll want to compile the code. You can do this with the make command, which creates a version of the code that uses parallel threads. You can then run make test to verify the correctness of the compilation. For your exploration of the code, I highly recommend using the Makefile.serial file located in the gitlab part2/ directory. The default Makefile is written using a parallel thread package that we will talk about soon, but it will make it harder for you to understand the control flow of the basic program. In Makefile.serial, you can change the CXX-FLAGS. You'll need to change the -O3 flag to -g if you'd like to use gdb. To use gprof, you'll need to add the -pg flag alongside the -O3 and other existing flags. DO NOT RUN make bench-graphs or make bench-run.

To run the code, I encourage you to use two approaches.

* Run the code with generated graphs. You can use the "-gX" or "-uX" flags to do this. To run the internal verifier, you can use "-v"; I recommend just doing this after your first compilation and not worrying about understanding the verification process for this assignment. You can also specify the number of iterations with the "-nX" flag; I recommend just using 1 iteration for your exploration.
* Run the code with a graph file using the "-f" flag. I have supplied graphs for the twitter dataset in my shared directory mentioned above. Specify the path to that directory rather than copying the file to your personal directory, as it is large.

Your process should include first running the code with gprof to get a feeling for where time is being spent and the call graph. You should then explore the code itself (including using gdb when appropriate) to get a sense of how the work is organized in the code.

The final product for this part of the assignment is a brief presentation in class and submission of your slides from that presentation as a PDF to gitlab (in the part2/ directory). Your slides should provide an overview of what problem your graph algorithm is solving. They should also provide a call graph where you indicate where most of the time is being spent. For pieces of the code where time is being spent, you'll want to explain what is happening in that code.

This portion of the assignment will count for half of the points. It will be based on your presentation and slides.

Submitting Your Work

Please submit your code, README, and Makefile for part 1 and the PDF of your slides for part 2 via gitlab.