JACCL Compiler Implementation Project
Phase 3.2: Optimization Techniques -- Value Numbering
Due: May 7, 1999
I would like you to implement one additional optimization technique in your compilers -- elimination of common subexpressions within basic blocks. Before you can eliminate common subexpressions, however, you must first identify them. That will be the goal of this phase.
To accomplish this you should implement the "value numbering" scheme I described in class. For each basic block, you will traverse the statement and expression subtrees. As you do this, you will build a hash table. Each key in the table describes an expression in terms of the value numbers of its subexpressions, and each key maps to an operand descriptor preallocated for use with all instances of that expression. These operand descriptors will contain several fields beyond those used in previous phases. In particular, each operand descriptor will contain the value number used to identify the group of common subexpressions with which it is associated, a count of the number of instances of the expression that will be evaluated during execution of the containing basic block, and a pointer to the root of the first instance found. The .h files in the "phase3" subdirectory of my "pub/434" directory contain definitions for this extended operand descriptor type.
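To make the shape of these structures concrete, here is a rough sketch in C of a hash key and an extended operand descriptor. It is not a copy of the definitions in the phase3 .h files. Every name in it (expr_key, operand_desc, key_hash, key_equal, NO_VN, TABLE_SIZE) is hypothetical, and storing the key inside the descriptor is an assumption made so that a lookup can compare keys along a bucket chain.

```c
#include <assert.h>
#include <stddef.h>

enum { NO_VN = 0, TABLE_SIZE = 64 };   /* hypothetical constants */

struct tree_node;   /* syntax tree node type from earlier phases (opaque here) */

/* Key: an operator plus the value numbers of its operands. */
struct expr_key {
    int op;
    int left_vn;
    int right_vn;   /* NO_VN for a unary operator */
};

/* Only the fields added for value numbering are shown. */
struct operand_desc {
    struct expr_key key;               /* assumption: key kept with the descriptor */
    int value_number;                  /* identifies the group of common subexpressions */
    int ref_count;                     /* instances evaluated in the basic block */
    struct tree_node *first_instance;  /* root of the first instance found */
    struct operand_desc *next;         /* next descriptor on the same bucket chain */
};

/* Map a key to a bucket index in the range 0..TABLE_SIZE-1. */
unsigned key_hash(const struct expr_key *k) {
    unsigned h;
    assert(k != NULL);
    h = (unsigned)k->op;
    h = h * 31u + (unsigned)k->left_vn;
    h = h * 31u + (unsigned)k->right_vn;
    return h % TABLE_SIZE;
}

/* Two keys match only if the operator and both operand value numbers match. */
int key_equal(const struct expr_key *a, const struct expr_key *b) {
    assert(a != NULL && b != NULL);
    return a->op == b->op &&
           a->left_vn == b->left_vn &&
           a->right_vn == b->right_vn;
}
```

Because the key is built from value numbers rather than from the text of the operands, two expressions get the same key exactly when their operators agree and their operands have already been found to carry the same values.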
For each expression, you will first process all subexpressions to determine their value numbers. Then, you will build the key describing the current expression and look it up in the hash table. If a match is found, you will store a pointer to the associated operand descriptor in the expression's root node in the syntax tree and return the value number found in the operand descriptor as the identifier for the expression. If no match is found, you must allocate a new operand descriptor and assign it a new value number. Also, since you will clearly need the values of this expression's subexpressions when this expression is evaluated at run-time, you should increment the reference counts stored in the operand descriptors of the subexpressions. Then you can proceed as if a match had been found.
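The procedure just described might be sketched as follows. This is an illustration under simplifying assumptions, not the required design: the tree node and descriptor types are stand-ins for the real ones, leaves are limited to integer constants (keyed by their value), every operator is binary, descriptors come from a fixed pool, and all names (number_expr, hash_key, pool, buckets) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_CONST, OP_ADD, OP_MUL };            /* hypothetical operator codes */
enum { TABLE_SIZE = 64, MAX_DESCS = 256 };

struct operand_desc;

struct tree_node {
    int op;
    int value;                       /* used only by OP_CONST leaves */
    struct tree_node *left, *right;
    struct operand_desc *desc;       /* filled in by number_expr */
};

struct operand_desc {
    int op, left_vn, right_vn;       /* the key */
    int value_number;
    int ref_count;
    struct tree_node *first_instance;
    struct operand_desc *next;       /* bucket chain */
};

static struct operand_desc pool[MAX_DESCS];
static int pool_used = 0;
static struct operand_desc *buckets[TABLE_SIZE];
static int next_vn = 1;

static unsigned hash_key(int op, int l, int r) {
    return (((unsigned)op * 31u + (unsigned)l) * 31u + (unsigned)r) % TABLE_SIZE;
}

/* Returns the value number of e and records its descriptor in e->desc. */
int number_expr(struct tree_node *e) {
    int l, r;
    unsigned h;
    struct operand_desc *d;

    assert(e != NULL);
    if (e->op == OP_CONST) {         /* a constant is keyed by its value */
        l = e->value;
        r = 0;
    } else {                         /* subexpressions are numbered first */
        l = number_expr(e->left);
        r = number_expr(e->right);
    }

    h = hash_key(e->op, l, r);
    for (d = buckets[h]; d != NULL; d = d->next)
        if (d->op == e->op && d->left_vn == l && d->right_vn == r)
            break;

    if (d == NULL) {                 /* no match: create a new descriptor */
        assert(pool_used < MAX_DESCS);
        d = &pool[pool_used++];
        d->op = e->op;
        d->left_vn = l;
        d->right_vn = r;
        d->value_number = next_vn++;
        d->ref_count = 0;
        d->first_instance = e;
        d->next = buckets[h];        /* push on the front of the chain */
        buckets[h] = d;
        if (e->op != OP_CONST) {     /* this evaluation needs its operands */
            e->left->desc->ref_count++;
            e->right->desc->ref_count++;
        }
    }
    e->desc = d;                     /* "proceed as if a match had been found" */
    return d->value_number;
}
```

In this sketch a descriptor's count is raised only by the expressions that use it as an operand. A count for the root of a statement's expression would have to be added by whatever code processes the statement, which is left out here.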
Much of the tricky work comes in the handling of variables. The variable declaration descriptor type will contain an extra field for storing a variable's current value number. You will have to assign a new value number to a variable the first time it is referenced in a block and any time a new value is assigned to it. You should keep a list of all variables to which you have assigned value numbers so that they can be easily reset to the "no value number assigned" state at the end of a block.
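One possible way to organize this bookkeeping is sketched below. The declaration descriptor shown is a stand-in for the real one, and the names (var_decl, var_reference, var_assign, end_block_reset, numbered_vars) are hypothetical. The on_list flag is an extra assumption: it keeps a variable from being linked into the list twice if its value number is cleared in the middle of a block and later assigned again.

```c
#include <assert.h>
#include <stddef.h>

enum { NO_VN = 0 };                    /* "no value number assigned" */

struct var_decl {
    const char *name;
    int value_number;                  /* NO_VN until numbered in this block */
    struct var_decl *next_numbered;    /* links variables numbered in this block */
    int on_list;                       /* nonzero while linked into numbered_vars */
};

static struct var_decl *numbered_vars = NULL;
static int next_vn = 1;

/* Give v a fresh value number and remember it for the end-of-block reset. */
static void give_new_vn(struct var_decl *v) {
    v->value_number = next_vn++;
    if (!v->on_list) {
        v->on_list = 1;
        v->next_numbered = numbered_vars;
        numbered_vars = v;
    }
}

/* A use of v: number it on the first reference, otherwise reuse its number. */
int var_reference(struct var_decl *v) {
    assert(v != NULL);
    if (v->value_number == NO_VN)
        give_new_vn(v);
    return v->value_number;
}

/* An assignment to v: its old value number no longer describes it. */
int var_assign(struct var_decl *v) {
    assert(v != NULL);
    give_new_vn(v);
    return v->value_number;
}

/* At the end of a block, return every numbered variable to the NO_VN state. */
void end_block_reset(void) {
    while (numbered_vars != NULL) {
        struct var_decl *v = numbered_vars;
        numbered_vars = v->next_numbered;
        v->value_number = NO_VN;
        v->on_list = 0;
        v->next_numbered = NULL;
    }
}
```

With this arrangement, two references to x with no intervening assignment return the same value number, so expressions built from them hash to the same key, while a reference after an assignment returns a different number.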
In your handling of assignments, you must address the problem of aliases. If the target of an assignment is a var parameter, you must clear any value numbers you have assigned to non-local variables or other var parameters. Similar precautions must be taken for assignments made to non-local variables, array elements and record components. Finally, if a call to a function or procedure is made, value numbers associated with non-local variables should be cleared.
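The clearing rules for var parameters, non-local variables and calls could be sketched along these lines. The sketch reflects one reading of the rules above and is not a complete treatment: array elements and record components are omitted, the decision to clear var parameters when a non-local variable is assigned is my interpretation of "similar precautions", and all names (var_kind, clear_kinds, note_assignment, note_call) are hypothetical. The list passed in is assumed to be the list of variables that currently hold value numbers.

```c
#include <assert.h>
#include <stddef.h>

enum { NO_VN = 0 };
enum var_kind { VK_LOCAL, VK_NONLOCAL, VK_VAR_PARAM };   /* hypothetical */

struct var_decl {
    enum var_kind kind;
    int value_number;
    struct var_decl *next_numbered;    /* list of variables numbered in this block */
};

/* Clear the value number of every listed variable whose kind is selected. */
static void clear_kinds(struct var_decl *list, int nonlocals, int var_params) {
    struct var_decl *v;
    for (v = list; v != NULL; v = v->next_numbered) {
        if ((v->kind == VK_NONLOCAL && nonlocals) ||
            (v->kind == VK_VAR_PARAM && var_params))
            v->value_number = NO_VN;
    }
}

/* Record an assignment to target, which receives value number new_vn. */
void note_assignment(struct var_decl *list, struct var_decl *target, int new_vn) {
    assert(target != NULL);
    if (target->kind == VK_VAR_PARAM)
        clear_kinds(list, 1, 1);   /* may alias a non-local or another var parameter */
    else if (target->kind == VK_NONLOCAL)
        clear_kinds(list, 0, 1);   /* assumption: some var parameter may alias it */
    target->value_number = new_vn;
}

/* Record a call: the callee may change any non-local variable. */
void note_call(struct var_decl *list) {
    clear_kinds(list, 1, 0);
}
```

Local variables that are not var parameters keep their value numbers in every case shown, since nothing else in the block can name the same storage.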
At the end of each block processed, you will also need to clear the hash table used to find the operand descriptors associated with expressions. As you do this, you will have to "pop" each operand descriptor created while processing the block off some hash bucket chain. To give you some check on the correctness of your code, you should print the relevant information in these descriptors. In particular, print the reference count and print the tree pointed to as the "first instance" of the common subexpression.
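The end-of-block cleanup might look something like the sketch below. It assumes that new descriptors are always pushed on the front of their bucket chain and are also threaded onto a creation stack, so that undoing them in reverse order of creation always finds each one at the head of its chain. The types are stand-ins, the output format is only a suggestion, and all names (enter_desc, end_block_dump, print_tree, last_created) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

enum { TABLE_SIZE = 8 };

struct tree_node {
    const char *text;                  /* spelling of the operator or leaf */
    struct tree_node *left, *right;
};

struct operand_desc {
    int value_number;
    int ref_count;
    struct tree_node *first_instance;
    unsigned bucket;                   /* index of the chain holding this descriptor */
    struct operand_desc *next;         /* bucket chain */
    struct operand_desc *prev_created; /* stack of descriptors made in this block */
};

static struct operand_desc *buckets[TABLE_SIZE];
static struct operand_desc *last_created = NULL;

/* Push d on the front of its bucket chain and on the creation stack. */
void enter_desc(struct operand_desc *d, unsigned bucket) {
    assert(d != NULL && bucket < TABLE_SIZE);
    d->bucket = bucket;
    d->next = buckets[bucket];
    buckets[bucket] = d;
    d->prev_created = last_created;
    last_created = d;
}

/* Print an expression tree in fully parenthesized infix form. */
static void print_tree(FILE *out, const struct tree_node *t) {
    if (t == NULL)
        return;
    if (t->left == NULL && t->right == NULL) {
        fputs(t->text, out);
        return;
    }
    fputc('(', out);
    print_tree(out, t->left);
    fprintf(out, " %s ", t->text);
    print_tree(out, t->right);
    fputc(')', out);
}

/* Pop every descriptor created in the block, printing each one.
   Returns the number of descriptors popped. */
int end_block_dump(FILE *out) {
    int popped = 0;
    while (last_created != NULL) {
        struct operand_desc *d = last_created;
        assert(buckets[d->bucket] == d);   /* newest entry heads its chain */
        buckets[d->bucket] = d->next;
        last_created = d->prev_created;
        fprintf(out, "vn %d: refs %d, first instance ",
                d->value_number, d->ref_count);
        print_tree(out, d->first_instance);
        fputc('\n', out);
        popped++;
    }
    return popped;
}
```

Popping in reverse order of creation means the table never has to be searched or rehashed during cleanup, and the printed lines appear with the most recently created descriptor first.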