Lab 6 Pre-Lab Exercise: Performance Profiling

Objective

Measure and validate the efficiency and scalability of your implementation on large graphs.

Overview
Project Set Up
Profiling and Performance Tuning

Overview

Before beginning the SocialNetworks lab, I’d like you to use a profiler to identify and remove any performance bottlenecks in your Graph ADT. You can do this using the profiler in XCode or the perf Linux profiler, as I demonstrated in the Performance Lecture.

Project Set Up

Here are a few specific instructions to follow, depending on how you’re working, to ensure you’re able to complete this exercise:

If you are using XCode: The MutableCollections.swift file is in your repository, but I forgot to include it in the XCode project. To add it, select the GraphADT/Sources/GraphADT folder in the Project navigator. Then select “File -> Add Files to GraphADT” from the menubar, and add MutableCollections.swift. It should then appear alongside GraphADT.swift.
If you are using the Linux VM: Please run the command
```
$ sudo apt-get install linux-tools-4.15.0-45-generic
```
to install the proper version of perf. Use the same password you used to log in to gain sudo privileges. If this results in an error about being unable to acquire a lock, it is because the OS is trying to download software updates. This is a pesky nuisance, but waiting a few minutes resolves the issue. If the problem persists, let me know. Once perf is installed, you can use the following commands to record profiling data for the unit tests and then view it. (Ctrl-C will stop the unit tests early if they are taking a long time.)
```
$ sudo perf record --call-graph fp .build/x86_64-unknown-linux/debug/GraphADTPackageTests.xctest
$ sudo perf report --call-graph
```
Also, as I showed in the lecture, you can demangle a Swift method name with swift-demangle <...>, as in
```
$ swift-demangle S8GraphADT0A0C7addEdge4from2to8labelledySS_S2StF
```
If you are ssh’ing to our lab machines: Get in touch with me. Since you cannot use sudo on our lab machines, we’ll do it a different way, but I’ll have to work with you to set it up.

Profiling and Performance Tuning

PerformanceTests.swift contains unit tests that create large graphs with characteristics similar to the Marvel Comics data set from this week’s lab. Copy the contents of this file into the GraphADT/Tests/OtherTests.swift file. Follow the techniques from the profiling discussion to profile your code in the Instruments Time Profiler (or with perf on Linux), identify any performance bottlenecks in your Graph ADT implementation, and fix them. Take notes as you make measurements and modify your Graph; you’ll need them to answer the question below.
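For quick before-and-after notes outside of XCTest, a rough timing harness can help. The sketch below is not PerformanceTests.swift: `SketchGraph`, `buildWorkload`, and the edge pattern are all hypothetical, and the `addEdge(from:to:labelled:)` signature only mirrors the method name in the demangling example above. Storing adjacency in dictionaries of sets is one assumed design that gives average O(1) duplicate checks per edge.

```swift
import Dispatch

// Hypothetical stand-in for a Graph ADT; the storage layout is an assumption.
final class SketchGraph {
    // node -> (neighbor -> set of edge labels)
    private var adjacency: [String: [String: Set<String>]] = [:]
    private(set) var edgeCount = 0

    func addNode(_ node: String) {
        if adjacency[node] == nil { adjacency[node] = [:] }
    }

    func addEdge(from source: String, to target: String, labelled label: String) {
        addNode(source)
        addNode(target)
        // Mutating through the default-value subscripts updates the nested
        // collections in place rather than copying them out and back.
        if adjacency[source, default: [:]][target, default: []].insert(label).inserted {
            edgeCount += 1
        }
    }
}

/// Hypothetical deterministic workload: each node gets `degree` outgoing
/// edges, so repeated runs are directly comparable.
func buildWorkload(nodeCount: Int, degree: Int) -> SketchGraph {
    let graph = SketchGraph()
    for i in 0..<nodeCount {
        for k in 1...degree {
            let j = (i * 7919 + k * 104729) % nodeCount
            graph.addEdge(from: "n\(i)", to: "n\(j)", labelled: "book\(k % 50)")
        }
    }
    return graph
}

let start = DispatchTime.now().uptimeNanoseconds
let graph = buildWorkload(nodeCount: 2_000, degree: 20)
let elapsed = Double(DispatchTime.now().uptimeNanoseconds - start) / 1e9
print("built \(graph.edgeCount) edges in \(elapsed) s")
```

Calling `buildWorkload(nodeCount: 10_000, degree: 63)` performs 630,000 edge insertions, roughly the scale of testPerformanceOnLargeGraph(), although its structure likely differs from the Marvel-like data. Debug and release builds time very differently, so compare measurements taken with the same build configuration.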

There are three tests that build “small”, “medium”, and “large” graphs. Ideally, you should be able to complete testPerformanceOnLargeGraph(), which creates a 10,000-node graph with about 630,000 edges, in a couple of seconds, and hopefully in no more than about 10-15 seconds. If your program takes an excessively long time to construct the graph, first go back and verify that it properly handles a very small dataset and passes all of your unit tests. If it does, here are a few questions to consider:

What data structures are you using in your graph? What is their "big-O" runtime? Are there others that are better suited to the purpose?
What is the "big-O" runtime of your checkRep method? If your Graph ADT has a particularly thorough or expensive checkRep function it is possible that it could become a performance bottleneck. However, you don’t want to simply remove those tests. Instead, a good checkRep function should include some way to disable particularly expensive checks by setting a compile-time or run-time variable. You are free to disable the expensive checkRep tests in the version you submit if that makes a large difference in performance. Be sure to make it easy to turn the checks on and off though, because you will want them on most of the time. (The slides on Testing/Debugging have an example of how to do this.)
Also, keep in mind that sets, arrays, and dictionaries are values. Swift implements them with “copy-on-write”: modifying one whose storage is shared with another variable, such as a value you just returned from a function, copies the entire structure. Does your graph add or remove values from large sets/arrays, or create many temporary arrays? You may be spending a lot of time making copies of those structures, and you may find it useful to use mutable versions of Set, Dictionary, and Array in place of Swift’s “copy-on-write” versions. I have provided basic implementations of such classes in the MutableCollections.swift file already present in your GraphADT/Sources/GraphADT directory. They should support most of the common Collection and Sequence operations you’d expect. Their documentation is here:

You are free to copy and use them however you like, but keep in mind the tradeoffs between mutable and immutable structures!
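One way to follow the checkRep advice in the list above is to split checkRep into cheap checks that always run and expensive ones gated by a switch. This is a minimal sketch with hypothetical names (`CheckedGraph`, `expensiveChecksEnabled`, and an `EXPENSIVE_CHECKS` compilation condition); the example in the Testing/Debugging slides may be structured differently.

```swift
// Hypothetical graph showing a compile-time default plus a run-time switch.
final class CheckedGraph {
    #if EXPENSIVE_CHECKS
    static let expensiveChecksDefault = true
    #else
    static let expensiveChecksDefault = false
    #endif

    /// Run-time switch; starts at the compile-time default and can be flipped
    /// per instance (for example, on in unit tests, off in performance tests).
    var expensiveChecksEnabled = CheckedGraph.expensiveChecksDefault

    private var adjacency: [String: Set<String>] = [:]
    private(set) var edgeCount = 0

    var nodeCount: Int { adjacency.count }

    func addEdge(from source: String, to target: String) {
        if adjacency[target] == nil { adjacency[target] = [] }
        if adjacency[source, default: []].insert(target).inserted { edgeCount += 1 }
        checkRep()
    }

    private func checkRep() {
        // Cheap O(1) sanity check: always evaluated when asserts are active.
        assert(edgeCount == 0 || !adjacency.isEmpty, "edges without nodes")
        guard expensiveChecksEnabled else { return }
        // Expensive O(V + E) check: recount edges and confirm every target is a node.
        var counted = 0
        for (_, targets) in adjacency {
            counted += targets.count
            for target in targets {
                assert(adjacency[target] != nil, "edge points to unknown node \(target)")
            }
        }
        assert(counted == edgeCount, "edgeCount out of sync")
    }
}

let g = CheckedGraph()
g.expensiveChecksEnabled = true
g.addEdge(from: "a", to: "b")
g.addEdge(from: "b", to: "c")
g.addEdge(from: "a", to: "b")   // duplicate edge, ignored
print(g.nodeCount, g.edgeCount) // prints "3 2"
```

The compile-time default could be turned on with `swift build -Xswiftc -DEXPENSIVE_CHECKS`, or with `swiftSettings: [.define("EXPENSIVE_CHECKS", .when(configuration: .debug))]` on the target in Package.swift; in XCode, the equivalent is the Active Compilation Conditions build setting. With the switch on, every `addEdge` pays O(V + E), so building a 630,000-edge graph becomes quadratic, which is exactly the kind of cost worth disabling for the performance tests.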

If you read the code for those ADTs, you may notice that my implementations of those mutable collections use the standard Swift sets, arrays, and dictionaries internally. So, why is this any better than just using the standard collections directly? The Swift compiler tries very hard to identify places where a structure can be safely updated without making a copy. Keeping the structures internal to simple wrapper classes with no rep exposure makes it easier for the compiler to recognize that there is no need to keep the old value of a structure around after it is updated. Of course, there is no guarantee that the compiler will figure out all the places where copying can be eliminated. So if you can’t remove structure-related bottlenecks, let me know and we can look at it together.
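The copy-on-write tradeoff described above can be seen in a few lines. `MutableSetSketch` is a hypothetical wrapper in the spirit of the provided MutableCollections.swift, not its actual code; check that file’s documentation for the real API.

```swift
/// Hypothetical wrapper class; the real MutableCollections.swift API may differ.
final class MutableSetSketch<Element: Hashable>: Sequence {
    // The only reference to this storage lives inside the class (no rep
    // exposure), so in-place mutation normally does not need to copy it.
    private var storage = Set<Element>()

    var count: Int { storage.count }

    func contains(_ element: Element) -> Bool { storage.contains(element) }

    @discardableResult
    func insert(_ element: Element) -> Bool { storage.insert(element).inserted }

    @discardableResult
    func remove(_ element: Element) -> Element? { storage.remove(element) }

    func makeIterator() -> Set<Element>.Iterator { storage.makeIterator() }
}

// Plain Set: `snapshot` shares storage with `neighbors`, so the next mutation
// of `neighbors` has to copy the whole set to keep `snapshot` unchanged.
var neighbors: Set<String> = ["a", "b", "c"]
let snapshot = neighbors
neighbors.insert("d")

// Wrapper: both names refer to the same object, so nothing is copied,
// but the change is visible through every alias (the tradeoff).
let shared = MutableSetSketch<String>()
let alias = shared
shared.insert("x")
print(snapshot.count, neighbors.count, alias.count) // prints "3 4 1"
```

The plain `Set` keeps value semantics at the price of a copy when shared storage is mutated; the wrapper avoids that copy but lets every alias observe the change, which is why handing one out from a public graph method can expose the rep.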

Question

Create a section titled “Performance Tuning” at the bottom of your GraphADT/README.md.

As you tune performance, you may need to modify the implementation and perhaps the public interface of your Graph ADT. Briefly document any changes you made and why in GraphADT/README.md (with no more than a couple of sentences per change).

When documenting a modification you made for performance reasons, you should always include the name of the workload you are using and the timing measurements before and after making the change. If you have profiling data, it is also worth including what you learned from that. You can even use a somewhat stylized template like the following:
```
### First Iteration

* Timing on testPerformanceOnLargeGraph(): ...
* Profile revealed: ...
* From this we concluded: ...
* Changed: ...
* Post-change timing on testPerformanceOnLargeGraph(): ...
```
If you made no changes, state that explicitly.

You don’t need to track and document cosmetic and other minor changes, such as renaming a variable; we are interested in substantial changes to your API or implementation, such as adding a public method or using a different data structure. Describe logical changes rather than precisely how each line of your code was altered. For example, “I switched the data structure for storing all the nodes from a ___ to a ___ because ___" is more helpful than, “I changed line 27 from nodes = ___(); to nodes = ____();.”
