CSCI 237 :: Computer Organization

Lab 5: Understanding Cache Memories

Assigned	Apr 16/17, 2025
Prelim Due Date	Apr 22/23, 2025 at 11:00pm. You should have the basic infrastructure for your code in place (reading from the trace file, parsing command-line arguments, etc.) for this preliminary submission.
Final Due Date	Apr 29/30, 2025 at 11:00pm. Final version of `csim.c`.
Files	lab5.tar [Slides]
Submissions	Submit your solutions using `submit237 5 csim.c`. If you work with a partner, only one submission per pair is required.

Overview

This lab will help you understand the operation of cache memories and the impact they can have on the performance of your C programs. In this lab you will write a small C program (about 200-300 lines) that simulates the behavior of a cache memory.

You may work with a partner on this lab. Groups of three or more are not permitted. If you choose to work with a partner, make sure that both members of the group are contributing/typing equally.

Instructions

To fetch the source files for this lab, right-click on lab5.tar and choose "Save As" to download the tarball to your local directory. You may want to move this file to a more desirable location using the Unix mv command. Alternatively, if you are using SSH to work remotely, you may want to use wget to fetch the file. Once you have used cd to navigate to the desired directory on a lab machine, you can use the following command to fetch the tarball:

         $ wget http://dept.cs.williams.edu/~jeannie/cs237/labs/lab5/lab5.tar

Extract the contents using the following command:

         $ tar xvf lab5.tar

This will create the directory cachelab, which contains a number of files. You will be modifying one file inside this directory: csim.c. To compile these files, type:

         $ cd cachelab
         $ make clean; make

Reference Trace Files

The traces subdirectory contains a collection of reference trace files that you will use to evaluate the correctness of your cache simulator. The trace files are generated by a Linux program called valgrind. For example, typing

         $ valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes ls -l

on the command line runs the executable program "ls -l", captures a trace of each of its memory accesses in the order they occur, and prints them on stdout.

Valgrind memory traces have the following form:

         I  0400d7d4,8
          M 0421c7f0,4
          L 04f6b868,8
          S 7ff0005c8,8

Each line denotes one or two memory accesses. The format of each line is

         [space]operation address,size

The operation field denotes the type of memory access: "I" denotes an instruction load, "L" a data load, "S" a data store, and "M" a data modify (i.e., a data load followed by a data store). There is never a space before an "I". There is always a space before an "M", "L", and "S". The address field specifies a 64-bit hexadecimal memory address. The size field specifies the number of bytes accessed by the operation.
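One way to read a line in this format is with sscanf. The sketch below is only an illustration: the helper name parse_trace_line is hypothetical and is not part of the starter code. It relies on the fact stated above that data accesses start with a space and instruction loads do not.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the starter code): parse one line of a
 * valgrind trace. Returns 1 and fills in op/addr/size for a data access
 * (L, S, or M); returns 0 for instruction loads and malformed lines. */
int parse_trace_line(const char *line, char *op,
                     unsigned long long *addr, unsigned *size)
{
    /* Data accesses always start with a space; "I" lines never do. */
    if (line[0] != ' ')
        return 0;

    /* " %c" skips the leading space and reads the operation, "%llx" reads
     * the hexadecimal address, and "%u" reads the size after the comma. */
    if (sscanf(line, " %c %llx,%u", op, addr, size) != 3)
        return 0;

    return *op == 'L' || *op == 'S' || *op == 'M';
}
```

A typical use would be to call this helper on each line returned by fgets and skip any line for which it returns 0.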

Writing a Cache Simulator

In this lab you will write a cache simulator in csim.c that takes a valgrind memory trace as input, simulates the hit/miss behavior of a cache memory on this trace, and outputs the total number of hits, misses, and evictions.

I have provided you with the binary executable of a reference cache simulator, called csim-ref, that simulates the behavior of a cache with arbitrary size and associativity on a valgrind trace file. It uses the LRU (least-recently used) replacement policy when choosing which cache line to evict.

The reference simulator takes the following command-line arguments:

         Usage: ./csim-ref [-hv] -s <s> -E <E> -b <b> -t <tracefile>

-h: Optional help flag that prints usage info
-v: Optional verbose flag that displays trace info
-s <s>: Number of set index bits (S = 2^s is the number of sets)
-E <E>: Associativity (number of lines per set)
-b <b>: Number of block bits (B = 2^b is the block size)
-t <tracefile>: Name of the valgrind trace to replay

The command-line arguments are based on the notation (s, E, and b) from CSAPP (see page 617). Figure 6.25 on page 616 is also helpful. For example:

         $ ./csim-ref -s 4 -E 1 -b 4 -t traces/yi.trace
         hits:4 misses:5 evictions:3

The same example in verbose mode:

         $ ./csim-ref -v -s 4 -E 1 -b 4 -t traces/yi.trace
         L 10,1 miss
         M 20,1 miss hit
         L 22,1 hit
         S 18,1 hit
         L 110,1 miss eviction
         L 210,1 miss eviction
         M 12,1 miss eviction hit
         hits:4 misses:5 evictions:3
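To see why the verbose output above looks the way it does, it helps to split each address into its tag, set index, and block offset. The following is a minimal sketch; the function names set_index and tag_bits are hypothetical, and it assumes s + b is less than 64.

```c
#include <stdint.h>

/* Hypothetical helpers: split a 64-bit address into the fields used by a
 * cache with s set-index bits and b block-offset bits (CSAPP notation).
 * Both assume s + b < 64. */

/* Set index: drop the b offset bits, then keep the low s bits. */
uint64_t set_index(uint64_t addr, int s, int b)
{
    uint64_t mask = (1ULL << s) - 1;
    return (addr >> b) & mask;
}

/* Tag: everything above the set-index and block-offset bits. */
uint64_t tag_bits(uint64_t addr, int s, int b)
{
    return addr >> (s + b);
}
```

With s = 4 and b = 4, the addresses 0x10, 0x110, and 0x210 all map to set 1 with tags 0, 1, and 2. Since E = 1, each of these accesses replaces the line left by the previous one, which is consistent with the "miss eviction" lines in the trace above.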

Your job is to fill in the csim.c file so that it takes the same command-line arguments and produces the same output as the reference simulator. There is some starter code included in cachelab.c, cachelab.h, and csim.c that you should review before writing your own code. (Note: You must be on campus or on the VPN to see the code.)

Programming Rules

Include your name(s) in the header comment for csim.c.
Your csim.c file must compile without warnings in order to receive credit. You can compile with make.
Your simulator must work correctly for arbitrary s, E, and b. This means that you will need to allocate storage for your simulator's data structures using the malloc function. Type man malloc for information about this function.
For this lab, I am interested only in data cache performance, so your simulator should ignore all instruction cache accesses (lines starting with "I"). Recall that valgrind always puts "I" in the first column (with no preceding space), and "M", "L", and "S" in the second column (with a preceding space). This may help you parse the trace.
To receive credit for this lab, you must call the function printSummary, with the total number of hits, misses, and evictions, at the end of your main function:
```
printSummary(hit_count, miss_count, eviction_count);
```
For this lab, you should assume that memory accesses are aligned properly, such that a single memory access never crosses block boundaries. By making this assumption, you can ignore the request sizes in the valgrind traces.
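Because s, E, and b are only known at run time, the cache has to be allocated dynamically. The sketch below shows one possible layout, not the required one. The type and function names (line_t, cache_t, cache_create, cache_destroy) are hypothetical, and the last_used field assumes you track LRU order with a timestamp.

```c
#include <stdlib.h>
#include <stdint.h>

/* Hypothetical types: one cache line holds a valid bit, a tag, and a
 * timestamp for LRU. No data bytes are stored, since only hit/miss
 * behavior is simulated. */
typedef struct {
    int valid;
    uint64_t tag;
    uint64_t last_used;
} line_t;

typedef struct {
    int s, E, b;
    line_t *lines;   /* (2^s * E) lines; set i starts at lines[i * E] */
} cache_t;

/* Allocate a zeroed cache; returns NULL on allocation failure. */
cache_t *cache_create(int s, int E, int b)
{
    cache_t *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->s = s;
    c->E = E;
    c->b = b;

    size_t num_lines = ((size_t)1 << s) * (size_t)E;
    c->lines = calloc(num_lines, sizeof *c->lines);  /* all valid bits 0 */
    if (c->lines == NULL) {
        free(c);
        return NULL;
    }
    return c;
}

/* Release everything cache_create allocated. */
void cache_destroy(cache_t *c)
{
    if (c != NULL) {
        free(c->lines);
        free(c);
    }
}
```

A single flat array keeps allocation and cleanup simple. An array of per-set arrays would work equally well.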

Evaluation

I will run your cache simulator using different cache parameters and traces. There are eight test cases, each worth 3 points, except for the last case, which is worth 6 points:

         $ ./csim -s 1 -E 1 -b 1 -t traces/yi2.trace
         $ ./csim -s 4 -E 2 -b 4 -t traces/yi.trace
         $ ./csim -s 2 -E 1 -b 4 -t traces/dave.trace
         $ ./csim -s 2 -E 1 -b 3 -t traces/trans.trace
         $ ./csim -s 2 -E 2 -b 3 -t traces/trans.trace
         $ ./csim -s 2 -E 4 -b 3 -t traces/trans.trace
         $ ./csim -s 5 -E 1 -b 5 -t traces/trans.trace
         $ ./csim -s 5 -E 1 -b 5 -t traces/long.trace

You can use the reference simulator csim-ref to obtain the correct answer for each of these test cases. During debugging, use the -v option for a detailed record of each hit and miss.

For each test case, outputting the correct number of cache hits, misses, and evictions will give you full credit for that test case. Each of the three reported counts (hits, misses, and evictions) is worth 1/3 of the credit for that test case. For example, if a test case is worth 3 points and your simulator outputs the correct number of hits and misses but the wrong number of evictions, you will earn 2 points.

Hints and Details

I have provided you with a simplified autograding program, called test-csim, that tests the correctness of your cache simulator on the reference traces. Be sure to compile your simulator before running the test:


         $ make
         $ ./test-csim
    
                                 Your simulator     Reference simulator
         Points (s,E,b)    Hits  Misses  Evicts    Hits  Misses  Evicts
              3 (1,1,1)       9       8       6       9       8       6  traces/yi2.trace
              3 (4,2,4)       4       5       2       4       5       2  traces/yi.trace
              3 (2,1,4)       2       3       1       2       3       1  traces/dave.trace
              3 (2,1,3)     167      71      67     167      71      67  traces/trans.trace
              3 (2,2,3)     201      37      29     201      37      29  traces/trans.trace
              3 (2,4,3)     212      26      10     212      26      10  traces/trans.trace
              3 (5,1,5)     231       7       0     231       7       0  traces/trans.trace
              6 (5,1,5)  265189   21775   21743  265189   21775   21743  traces/long.trace
             27

For each test, it shows the number of points you earned, the cache parameters, the input trace file, and a comparison of the results from your simulator and the reference simulator.

Getting Started

Here are some suggestions for getting started:

Do your initial debugging on the small traces, such as traces/dave.trace.
The reference simulator takes an optional -v argument that enables verbose output, displaying the hits, misses, and evictions that occur as a result of each memory access. You are not required to implement this feature in your csim.c code, but I strongly recommend that you do so. It will help you debug by allowing you to directly compare the behavior of your simulator with the reference simulator on the reference trace files.
The starter code uses the getopt function to parse your command line arguments. Use "man getopt" for details if you are curious.
Each data load (L) or store (S) operation can cause at most one cache miss. The data modify operation (M) is treated as a load followed by a store to the same address. Thus, an M operation can result in two cache hits, or a miss and a hit plus a possible eviction.
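The hit, miss, and eviction rules above can be sketched for a single set as follows. This is one possible way to implement LRU, using a counter as a timestamp, and is not necessarily how csim-ref does it. The names lru_line_t, access_set, HIT, MISS, and MISS_EVICT are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-line state for one cache set. */
typedef struct {
    int valid;
    uint64_t tag;
    uint64_t last_used;   /* value of the clock at the most recent access */
} lru_line_t;

enum { HIT = 0, MISS = 1, MISS_EVICT = 2 };

/* Simulate one access to a set of E lines. `clock` must increase on every
 * call so that a smaller last_used means "used longer ago". */
int access_set(lru_line_t *set, int E, uint64_t tag, uint64_t clock)
{
    /* Hit: a valid line already holds this tag. */
    for (int i = 0; i < E; i++) {
        if (set[i].valid && set[i].tag == tag) {
            set[i].last_used = clock;
            return HIT;
        }
    }

    /* Miss: prefer an empty line, else the least recently used one. */
    int victim = 0;
    for (int i = 0; i < E; i++) {
        if (!set[i].valid) {
            victim = i;
            break;
        }
        if (set[i].last_used < set[victim].last_used)
            victim = i;
    }

    int result = set[victim].valid ? MISS_EVICT : MISS;
    set[victim].valid = 1;
    set[victim].tag = tag;
    set[victim].last_used = clock;
    return result;
}
```

Under this sketch, an M operation is simply two calls with the same tag. The first call may hit, miss, or miss with an eviction, and the second call always hits because the first one left the block in the set.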

Getting Started (with GitLab)

If you are working with a partner, I encourage you to use a version control system (like GitHub or GitLab). To use the departmental GitLab server, you should have received an email from Lida regarding your 237 repository on evolene. Assuming this is true, you should be able to:

Create an empty repo and add your partner as a maintainer.
Clone your new repo using git clone. Use the HTTPS URL from evolene.
Add files to your (initially empty) git repo using git add.
Commit and push files to the server using git commit and git push.

Recall that you must be on campus or using the VPN to access evolene. If you get a certificate error when trying to git clone from the Linux lab machines, try this:

        git config --global http.sslCAInfo /home/cs-local-linux/certs/cacert.pem

and then try to git clone again. If you need help using git or adding your group members to your repository, let me know. This cheat sheet may also be helpful.

If you are trying to ssh into one of our machines and getting an error that says "REMOTE HOST IDENTIFICATION HAS CHANGED", try the following command:

        ssh-keygen -R {machine name}

You should only need to do this once per machine.

Resources

Very old notes from CMU (but still relevant!)
Notes from C Bootcamp at CMU