CSCI 338
Parallel Processing
Program 2: GraphMineSuite Benchmarks
Assigned | February 19, 2025 |
---|---|
Final Due Date | In-class presentation and slides due Feb. 25. |
Overview
The purpose of this assignment is for you to continue your exploration of the graph algorithms you will be working with this term.
Working with the GMS benchmark suite is going to be confusing at first. Please don't hesitate to reach out to me with questions early and often!
How the Honor Code Applies to This Assignment
This is a group-eligible assignment, meaning you should work with your class-approved team of students to produce your assignments, evaluation, and final reports. You are also encouraged to ask non-team members questions of clarification, language syntax, and error message interpretation. You are welcome to share code with one another for part 2. Please see the syllabus for more details.
Profiling Your Graph Algorithms
As in part 2 of program 1, we'll become acquainted with two more graph algorithms you're going to use this term. The graph algorithms we will use for this assignment are from GMS (GraphMineSuite), a publicly available computer architecture graph algorithm benchmark suite.
To obtain the code needed for this part of the assignment, you will need to copy the code from my Public/ folder on the MathCS Linux machines into your gitlab subdirectory. Specifically, you will need to copy, unzip, and untar the following file:
~kshaw/Public/cs338/program2/gms-cs338.tar.gz
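The copy, unzip, and untar steps can be done with a few commands. This is a minimal sketch: the archive path is the one given above, but the destination directory DEST is a hypothetical placeholder for your own gitlab subdirectory, so adjust it before running.

```sh
# Path given in the assignment (the tilde expands to kshaw's home on the lab machines).
ARCHIVE=~kshaw/Public/cs338/program2/gms-cs338.tar.gz
# Hypothetical destination: replace with your own gitlab working directory.
DEST="${DEST:-$HOME/cs338/program2}"

if [ -f "$ARCHIVE" ]; then
    mkdir -p "$DEST"
    cp "$ARCHIVE" "$DEST/"
    # -x extract, -z gunzip, -f archive file: unzips and untars in one step.
    tar -xzf "$DEST/gms-cs338.tar.gz" -C "$DEST"
else
    echo "Archive not found at $ARCHIVE; run this on the MathCS Linux machines." >&2
fi
```

Using tar with -z avoids a separate gunzip step, and -C keeps the extracted tree inside the destination directory regardless of where you run the command.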
The goal of this part of the assignment is to get a basic understanding of your assigned algorithm, including what it's doing at a high level, where the code spends most of its time, and what the program control flow looks like for the portion of code where the main computation is being performed. You will likely find these resources helpful:
* GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra.
* GraphMineSuite documentation
* gprof (GNU Profiler), which has a useful Quick-Start Guide and full manual.
To get started, you'll want to compile the code. The Getting Started page provides documentation on how to compile the code, but I'm going to provide a few particulars here.
* When you compile, pass the flag -DCMAKE_BUILD_TYPE=Debug to cmake so that the -O0 compile flag is used.
* When you run your code, it will try to use multiple processors because it uses OpenMP. To make debugging easier, set the OMP_NUM_THREADS environment variable to 1. Depending on the shell you use, you'll want one of these commands: "export OMP_NUM_THREADS=1" (bash, zsh) or "setenv OMP_NUM_THREADS 1" (csh, tcsh).
* When you want to use gprof, you'll need to modify the CMakeLists.txt file in the top-level directory and then rebuild your code. Specifically, add -pg to the flags in the following line:
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -g -O0 -DMINEBENCH_TEST")
Then run "rm -r *" in the build directory (double-check that you are in the build directory first) and recompile the code.
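The three bullets above can be sketched as a pair of small shell helpers. This is an illustration under a few assumptions. The function names add_pg and rebuild_debug are made up for this example. The build is assumed to use cmake's default Makefile generator, so "make" does the compile. Finally, rebuild_debug deletes and recreates the whole build directory, which is a variation on running "rm -r *" inside it.

```sh
# Run single-threaded so the debugger and profiler output are easier to follow (bash/zsh syntax).
export OMP_NUM_THREADS=1

# add_pg FILE: insert -pg into the debug flags line of the given CMakeLists.txt.
# A backup is kept in FILE.bak. Running it twice is harmless: once -pg is
# present the pattern no longer matches.
add_pg() {
    sed -i.bak 's/-g -O0 -DMINEBENCH_TEST/-g -O0 -pg -DMINEBENCH_TEST/' "$1"
}

# rebuild_debug SRC_DIR: wipe SRC_DIR/build and do a fresh Debug build.
rebuild_debug() {
    src="$1"
    # Refuse to run unless SRC_DIR really looks like the top-level source directory.
    [ -n "$src" ] && [ -f "$src/CMakeLists.txt" ] || return 1
    rm -rf "$src/build" && mkdir -p "$src/build" &&
        (cd "$src/build" && cmake -DCMAKE_BUILD_TYPE=Debug .. && make)
}
```

For example, if the top-level directory is named gms, you might run add_pg gms/CMakeLists.txt followed by rebuild_debug gms. Open CMakeLists.txt afterward to confirm that -pg landed on the intended line.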
As in program 1, I encourage you to use two approaches to running the code.
* Run the code with generated graphs. You can use the "-gX" or "-uX" flags to do this. To run the internal verifier, you can use "-v"; I recommend just doing this after your first compilation and not worrying about understanding the verification process for this assignment. You can also specify the number of iterations with the "-nX" flag; I recommend just using 1 iteration for your exploration.
* Run the code with a graph file using the "-f" flag. I have supplied graphs for the Twitter dataset in my shared directory mentioned above. Because the file is large, specify the path to that directory rather than copying the file to your personal directory.
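The two styles of invocation above might look like the following sketch. The binary path and the graph file path are hypothetical placeholders: substitute the benchmark you were assigned and the actual file in the shared directory. The value 10 is an arbitrary small choice for X. In the GAP command line, X is a scale (the generated graph has 2^X vertices), and this sketch assumes GMS behaves the same way. The helpers only print the command lines so you can check them before running anything.

```sh
# Hypothetical placeholders: substitute your assigned benchmark and graph file.
BIN="${BIN:-./your_benchmark}"
GRAPH="${GRAPH:-/path/to/shared/graph_file}"

# Each helper prints (rather than runs) a command line.
gen_cmd()    { echo "$BIN -g$1 -n1"; }       # generated graph, 1 iteration
verify_cmd() { echo "$BIN -g$1 -n1 -v"; }    # same, plus the internal verifier
file_cmd()   { echo "$BIN -f $GRAPH -n1"; }  # graph read from a file, 1 iteration

gen_cmd 10
verify_cmd 10
file_cmd
```

Swapping -g for -u in gen_cmd gives the other generated-graph variant mentioned above.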
Your process should include first running the code with gprof to get a feeling for where time is being spent and the call graph. You should then explore the code itself (including using gdb when appropriate) to get a sense of how the work is organized in the code.
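A profiling pass might look like the sketch below. It relies on standard gprof behavior: a program built with -pg writes a gmon.out file in the current directory when it exits, and gprof combines that file with the binary to produce a flat profile and a call graph. The helper name profile_run and the output file name profile.txt are made up for this example.

```sh
# profile_run BINARY [ARGS...]: run a -pg build once, then write a gprof report.
profile_run() {
    bin="$1"
    if [ ! -x "$bin" ]; then
        echo "profile_run: '$bin' is not an executable file" >&2
        return 1
    fi
    shift
    # Single-threaded run; on exit the program writes gmon.out here.
    OMP_NUM_THREADS=1 "$bin" "$@" || return 1
    # -b (brief) omits the long explanatory text around the tables.
    gprof -b "$bin" gmon.out > profile.txt
}
```

A call such as profile_run ./your_benchmark -g10 -n1 (with a hypothetical binary name) leaves the report in profile.txt. The flat profile at the top shows which functions take the most time, and the call graph below it shows who calls them.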
The final product for this assignment is a brief presentation in class and submission of your slides from that presentation as a PDF to gitlab. Your slides should provide an overview of what problem your graph algorithms are solving. They should also provide call graphs where you indicate where most of the time is being spent. For pieces of the code where time is being spent, you'll want to explain what is happening in that code.
Grades will be based on your presentation and slides.
Helpful Details
GMS creates a variety of different versions of some of the algorithms. It provides serial and parallel versions of some algorithms. It also provides different versions of the application where the data structures (i.e., RoaringSet, RobinHood, SortedSet) used to store the graph vary. (Please see the paper for more details.) In some of the applications, it implements a variety of different approaches and runs each of those different versions.
For this foray into the code, explore a serial version if possible. If not, use the parallel version and set the number of threads to 1 to make it easier for you to use the debugger. For the data structures, choose whichever implementation you like. Similarly, when a variety of versions of the algorithm are implemented, you are welcome to decide which to study. Just make sure to document your choices in your slides.
GMS incorporates and compiles the GAP benchmark applications into its framework. Looking at the GAP application you examined in program1 in the GMS framework might help you understand what additional features GMS adds. The source code for the GAP benchmarks in the GMS code can be found at:
gms/representations/graphs/log_graph
You'll also see that the binaries generated appear at:
gms/build/gapbs/
Notice that GMS generates a variety of versions for each of the GAP benchmarks, using different data structures and graph organizations.
When trying to use gdb, you might run into some wonkiness due to the OMP pragmas. In particular, you might find it hard to step into code inside a block that is within an OMP pragma. Never fear. All you need to do is put a breakpoint on a line inside the pragma's block, and you'll be able to walk through the code inside the pragma as usual.
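One way to set such a breakpoint from the command line is sketched below. The helper name debug_at is made up for this example, and the source file, line number, and binary in the usage note are hypothetical. The gdb options used are standard: -ex runs a gdb command at startup, and --args passes the remaining words to the program being debugged.

```sh
# debug_at FILE:LINE BINARY [ARGS...]: start gdb single-threaded, break at
# FILE:LINE (a line inside the pragma's block), and run the program.
debug_at() {
    loc="$1"
    shift
    OMP_NUM_THREADS=1 gdb -ex "break $loc" -ex run --args "$@"
}
```

For example, debug_at some_kernel.cc:42 ./your_benchmark -g10 -n1 stops at line 42 of some_kernel.cc, after which "next" and "step" walk through the code inside the block as usual.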
Submitting Your Work
Please submit a PDF of your slides via gitlab.