Syntax Tree Organization

As discussed in class, there is a significant difference between the internal nodes of a syntax tree and its leaves. An internal node holds a phrase type and pointers to its sub-trees. The leaves, on the other hand, hold information about identifiers and constants. In fact, in class I suggested that, rather than having separate nodes for the leaves, one could use symbol table entries as leaf nodes.

We will not actually do this in the compilers you build. The reason is a simple, practical one. To generate good error messages, one needs to keep track of where in the source program the text corresponding to each sub-tree of the syntax tree can be found. We will do this by storing in each node the number of the line on which the first token of the phrase represented by that node was found. This cannot be done for identifiers if all occurrences of an identifier are represented by a single symbol table entry, since that one entry has no way to record a separate line number for each occurrence. Instead, we represent each occurrence of an identifier by a node that contains the line number on which it was found and a pointer to the appropriate symbol table entry. Similar nodes will be used for constants.
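A minimal C sketch of this layout follows. The node kinds, field names, and constructor functions (make_ident, make_const, make_binary) are hypothetical and are not taken from the course's compiler code. The point is only that every occurrence of an identifier gets its own leaf node with its own line number, while all occurrences share one symbol table entry.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical symbol table entry: exactly one per distinct identifier. */
typedef struct SymEntry {
    const char *name;
} SymEntry;

/* Hypothetical node kinds; a real compiler has many more phrase types. */
typedef enum { NODE_ASSIGN, NODE_PLUS, NODE_IDENT, NODE_CONST } NodeKind;

typedef struct Node {
    NodeKind kind;
    int line;                      /* line of the first token of the phrase */
    union {
        struct {                   /* internal node: pointers to sub-trees */
            struct Node *left, *right;
        } kids;
        SymEntry *sym;             /* identifier leaf: shared table entry */
        int value;                 /* constant leaf */
    } u;
} Node;

static Node *new_node(NodeKind kind, int line) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL)
        abort();                   /* out of memory */
    n->kind = kind;
    n->line = line;
    return n;
}

/* Each occurrence of an identifier gets its own leaf node. */
Node *make_ident(SymEntry *sym, int line) {
    Node *n = new_node(NODE_IDENT, line);
    n->u.sym = sym;
    return n;
}

Node *make_const(int value, int line) {
    Node *n = new_node(NODE_CONST, line);
    n->u.value = value;
    return n;
}

/* For the infix binary phrases sketched here, the first token of the
   phrase is the first token of the left operand, so the line number is
   copied from the left child. */
Node *make_binary(NodeKind kind, Node *left, Node *right) {
    assert(kind == NODE_ASSIGN || kind == NODE_PLUS);
    Node *n = new_node(kind, left->line);
    n->u.kids.left = left;
    n->u.kids.right = right;
    return n;
}
```

For example, if `x :=` appears on line 3 and `x + 1` on line 4, the two identifier nodes for `x` record lines 3 and 4 but point to the same SymEntry, so an error in either occurrence can be reported at the right line.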

As part of semantic processing, you will rewrite the trees that the parser produces for variable references. While the parser creates trees based on the syntactic structure of the source code, the code generator would prefer trees that correspond closely to the capabilities of the underlying hardware. Variable references, particularly subscripted variables and component selections, can be restructured by the semantic processing routines so that they explicitly describe much of the addressing arithmetic required by the variable references they represent.
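As an illustration only, the following C sketch rewrites a parser-produced subscript node `a[i]` into the explicit arithmetic `a + (i - low) * elem_size`. The node kinds, the ArrayInfo record (standing in for information the semantic routines would find in the symbol table), and the function names are assumptions; the rewrite required in the course's compilers may differ, for instance in how the array's base address is represented.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node kinds for this sketch. */
typedef enum { N_SUBSCRIPT, N_IDENT, N_CONST, N_PLUS, N_MINUS, N_TIMES } Kind;

/* Hypothetical array description, as semantic processing might find it
   in the symbol table entry for the array's name. */
typedef struct {
    int low;        /* lower bound of the subscript range */
    int elem_size;  /* size of one element in bytes */
} ArrayInfo;

typedef struct Node {
    Kind kind;
    int line;
    int value;                  /* N_CONST only */
    ArrayInfo *array;           /* N_IDENT naming an array */
    struct Node *left, *right;  /* sub-trees (operands, or array and index) */
} Node;

static Node *mk(Kind kind, int line, Node *left, Node *right) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL)
        abort();
    n->kind = kind;
    n->line = line;
    n->left = left;
    n->right = right;
    return n;
}

static Node *mk_const(int value, int line) {
    Node *n = mk(N_CONST, line, NULL, NULL);
    n->value = value;
    return n;
}

/* Rewrite SUBSCRIPT(a, i) into PLUS(a, TIMES(MINUS(i, low), elem_size)),
   where the leaf for `a` now stands for the array's base address. */
Node *rewrite_subscript(Node *n) {
    assert(n->kind == N_SUBSCRIPT && n->left->kind == N_IDENT);
    const ArrayInfo *info = n->left->array;
    Node *index = mk(N_MINUS, n->line, n->right, mk_const(info->low, n->line));
    Node *offset = mk(N_TIMES, n->line, index, mk_const(info->elem_size, n->line));
    return mk(N_PLUS, n->line, n->left, offset);
}

/* Evaluate a rewritten tree, treating the identifier leaf as `base`. */
long eval(const Node *n, long base) {
    switch (n->kind) {
    case N_CONST: return n->value;
    case N_IDENT: return base;
    case N_PLUS:  return eval(n->left, base) + eval(n->right, base);
    case N_MINUS: return eval(n->left, base) - eval(n->right, base);
    case N_TIMES: return eval(n->left, base) * eval(n->right, base);
    default:      abort();      /* N_SUBSCRIPT should have been rewritten */
    }
}
```

With `low = 1` and `elem_size = 4`, the reference `a[5]` becomes `a + (5 - 1) * 4`, that is, 16 bytes past the array's base address. The code generator can then treat the result as ordinary arithmetic rather than as a special subscript construct.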

Two special node types are used to support this translation of variable reference subtrees. The first is an internal node type used to represent the root of a variable reference subtree; each such node holds a single pointer to the subtree that describes the variable reference. The second is a node type used to represent references to pointers to function activation records. Such nodes do not appear in the trees produced by the parser, but they are needed to translate variable reference subtrees into a form that explicitly describes the required address arithmetic. These nodes always appear as leaves in the tree.
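A minimal C sketch of the two special node types follows. The names N_VARREF and N_ARPTR, the `level` field identifying which activation record a leaf refers to, and the `frames` array used by the evaluator are all hypothetical. They are meant only to show the shape of a translated reference to a variable stored at a fixed offset within an activation record: a variable reference root whose single sub-tree adds an offset to an activation record pointer leaf.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node kinds: the two special types plus what they need. */
typedef enum { N_VARREF, N_ARPTR, N_CONST, N_PLUS } Kind;

typedef struct Node {
    Kind kind;
    int line;
    union {
        struct Node *ref;   /* N_VARREF: single pointer to the reference subtree */
        int level;          /* N_ARPTR (always a leaf): which activation record (assumed) */
        int value;          /* N_CONST */
        struct { struct Node *left, *right; } op;   /* N_PLUS */
    } u;
} Node;

static Node *new_node(Kind kind, int line) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL)
        abort();
    n->kind = kind;
    n->line = line;
    return n;
}

/* Build VARREF(PLUS(ARPTR(level), CONST(offset))) for a variable stored
   `offset` bytes into the activation record identified by `level`. */
Node *translate_simple_var(int level, int offset, int line) {
    Node *ar = new_node(N_ARPTR, line);
    ar->u.level = level;
    Node *off = new_node(N_CONST, line);
    off->u.value = offset;
    Node *sum = new_node(N_PLUS, line);
    sum->u.op.left = ar;
    sum->u.op.right = off;
    Node *root = new_node(N_VARREF, line);
    root->u.ref = sum;
    return root;
}

/* Compute the address a translated reference denotes; frames[level] holds
   the (hypothetical) address of each activation record. */
long address_of(const Node *n, const long *frames) {
    switch (n->kind) {
    case N_VARREF: return address_of(n->u.ref, frames);
    case N_ARPTR:  return frames[n->u.level];
    case N_CONST:  return n->u.value;
    case N_PLUS:   return address_of(n->u.op.left, frames)
                        + address_of(n->u.op.right, frames);
    }
    abort();  /* unknown node kind */
}
```

Keeping the activation record pointer as a separate leaf means the code generator can decide for itself how to load that pointer, while the rest of the address computation is ordinary tree arithmetic.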

Computer Science 434
Department of Computer Science
Williams College