An Intermediate Form for C^o Programs -- Representing Syntax Tree Nodes

Representing Syntax Tree Nodes


This leads to a syntax tree with five distinct node types. As a result, to specify the general "type" of a syntax tree node, we use the C union type node described below.¹

      /* Union type that combines the 5 structure types used to describe */
      /* tree nodes, plus the 'unknode' template for their common fields */
typedef union nodeunion {
  struct unknode unk;
  struct internalnode internal;
  struct identnode ident;
  struct constnode constant;
  struct displaynode display;
  struct refvarnode var;
} node;

Each of the five node types present in the tree includes two common fields: the field specifying the node's phrase type² (type) and the field specifying the line of the source code on which the first token of the phrase represented by the node's subtree occurred (line). The type unknode, whose definition is shown below,

          /* The type 'unknode' provides a template that can be used to */
          /* access the common components found in all node types when  */
          /* the actual type of the node is unknown.                    */
struct unknode {
  nodetype type;
  int line;
};

allows one to reference these fields in situations where the actual type of the node is not yet known. For example, if `root' is a pointer to a node of unknown type, one can use the expression:

    root->unk.type

to determine its phrase type. One could also use the expression `root->internal.type' or `root->ident.type', but these expressions misleadingly suggest that the type of the node is already known. The type unknode is provided to support clear coding.
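The following is a minimal sketch of this idiom: inspect the common fields through unk first, and only then touch a member specific to one node type. The helper names const_value_or and line_of, the phrase type Nplus, and the cut-down union are assumptions made for the example; they are not part of the code I will provide.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: the real nodetype enumeration is defined elsewhere.
   Nconst is named in this handout; Nplus is invented for the example. */
typedef enum { Nconst, Nplus } nodetype;

struct unknode   { nodetype type; int line; };
struct constnode { nodetype type; int line; int value; int ischar; };

/* Cut-down version of the union, just large enough for this example. */
typedef union nodeunion {
  struct unknode unk;
  struct constnode constant;
} node;

/* Hypothetical helper: source line of the phrase rooted at 'root'.
   Uses only the common fields, so the node's real type is irrelevant. */
int line_of(const node *root) {
  assert(root != NULL);
  return root->unk.line;
}

/* Hypothetical helper: value of a constant node, or 'fallback' if the
   node turns out to be something else. */
int const_value_or(const node *root, int fallback) {
  assert(root != NULL);
  if (root->unk.type == Nconst)      /* type not yet known: go through unk */
    return root->constant.value;     /* type now known: use the real member */
  return fallback;
}
```

Reading type and line through unk is legitimate C because every structure in the union begins with the same two fields in the same order.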

The structure type internalnode describes the nodes used to represent almost all of the internal nodes of the tree. In addition to the common

          /* Tree nodes of type 'internal' are used for all nodes that */
          /* are internal to the tree produced by the parser except    */
          /* for the roots of variable reference subtrees.             */
struct internalnode {
  nodetype type;
  int line;
  union nodeunion *child[MAXKIDS]; /* pointers to the node's sub-trees */
};

type and line components found in all nodes, a node of this type includes a component child which is an array of pointers to its children. The number of children of a given node can be determined from its node type. The syntactic analysis routines I will provide conserve memory by allocating space only for the child pointers actually used by a given internal node. Thus, if a node should have only 2 children, its third child pointer must not be used for any purpose, since no space may have been allocated for it.
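To make the memory-saving scheme concrete, here is one plausible way such an allocation could be written. It is a sketch under stated assumptions, not the code of the routines I will provide: the value of MAXKIDS, the phrase types Nneg, Nplus and Nif, and the helpers kidcount, internal_size and make_internal are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAXKIDS 3   /* ASSUMPTION: the real limit is defined elsewhere */

/* ASSUMPTION: these phrase types and their child counts are invented. */
typedef enum { Nneg, Nplus, Nif } nodetype;

union nodeunion;
struct internalnode {
  nodetype type;
  int line;
  union nodeunion *child[MAXKIDS];
};
typedef union nodeunion {
  struct internalnode internal;
} node;

/* Hypothetical lookup: number of children implied by a phrase type. */
int kidcount(nodetype t) {
  switch (t) {
    case Nneg:  return 1;
    case Nplus: return 2;
    case Nif:   return 3;
  }
  return 0;
}

/* Bytes needed for the common fields plus only the child slots in use. */
size_t internal_size(nodetype t) {
  return offsetof(struct internalnode, child)
       + (size_t)kidcount(t) * sizeof(union nodeunion *);
}

/* Allocate a truncated internal node; slots beyond kidcount(t) do not exist. */
node *make_internal(nodetype t, int line) {
  int i;
  node *n = malloc(internal_size(t));
  if (n == NULL)
    return NULL;
  n->internal.type = t;
  n->internal.line = line;
  for (i = 0; i < kidcount(t); i++)
    n->internal.child[i] = NULL;
  return n;
}
```

With an allocation like this, a two-child node occupies less storage than a full struct internalnode, which is why child[2] of such a node must never be read or written, and why copying a whole node by structure assignment would be unsafe.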

The structure types identnode and constnode are used to represent the leaves of the syntax trees produced by the parser. Declarations of the structure types are shown below:

         /* Nodes of type 'ident' are used for leaf nodes corresponding */
         /* to identifiers in the source code.  The value in the        */
         /* 'type' component of such a node will always be 'Nident'.    */
struct identnode {
  nodetype type;
  int line;
  identdesc *ident;   /* Pointer to associated identifier descriptor */
  decldesc *decl;     /* Pointer to associated declaration descriptor */
};

         /* Nodes of type 'constant' are used for leaf nodes corresponding */
         /* to constants in the source code.  The value in the 'type'      */
         /* component of such a node will always be 'Nconst'.              */
struct constnode {
  nodetype type;
  int line;
  int value;               /* Integer value of the constant  */
  int ischar;              /* True if this was a character constant */
};

Identifiers are represented by nodes of type identnode. The type component of such nodes will always be Nident. The ident and decl components of an identnode are pointers to the appropriate identifier descriptor and declaration descriptor for the identifier being referenced. The decl components of identnode nodes are set to NULL (the value 0) by the syntactic analyzer. During semantic analysis, the correct values should be stored in these fields.

There is one special group of identnodes produced by the syntactic analyzer. These are identnodes for the keyword integer. Technically, integer is a keyword rather than an identifier in C^o. Treating it as an identifier that has been declared as a type, however, will simplify various parts of the compiler. Accordingly, occurrences of integer will be represented by special identnodes in the syntax tree.

Each constnode contains two fields beyond the common type and line fields. One is named value. It holds the integer value of the constant. The second is a field named ischar which is used as a boolean flag indicating whether the constant found in the source code was a character or an integer. The type component of all such nodes will be Nconst.
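As an illustration of the state in which leaf nodes leave the syntactic analyzer, the sketch below builds one node of each kind. The constructor names make_const and make_ident are hypothetical, and the descriptor types are left opaque because their definitions are not part of this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { Nident, Nconst } nodetype;  /* the two leaf types named above */

/* ASSUMPTION: the descriptor types are defined elsewhere; opaque here. */
typedef struct identdesc identdesc;
typedef struct decldesc decldesc;

struct identnode { nodetype type; int line; identdesc *ident; decldesc *decl; };
struct constnode { nodetype type; int line; int value; int ischar; };

typedef union nodeunion {
  struct identnode ident;
  struct constnode constant;
} node;

/* Hypothetical constructor: leaf for an integer or character constant. */
node *make_const(int value, int ischar, int line) {
  node *n = malloc(sizeof(node));
  if (n == NULL)
    return NULL;
  n->constant.type = Nconst;
  n->constant.line = line;
  n->constant.value = value;    /* a character constant is stored as its code */
  n->constant.ischar = ischar;
  return n;
}

/* Hypothetical constructor: leaf for an identifier.  The decl field is
   left NULL, to be filled in during semantic analysis. */
node *make_ident(identdesc *id, int line) {
  node *n = malloc(sizeof(node));
  if (n == NULL)
    return NULL;
  n->ident.type = Nident;
  n->ident.line = line;
  n->ident.ident = id;
  n->ident.decl = NULL;
  return n;
}
```

For example, make_const('a', 1, 3) would describe a character constant found on line 3, while make_const(97, 0, 3) would describe an integer constant; ischar is the only field that tells the two apart when the values coincide.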

There are two additional node types related to subtrees representing references to variables. Their declarations are shown in the two figures below. Tree nodes of type displaynode are used to refer to the address of a function's stack frame.

        /* Nodes of type 'display' are used for leaf nodes corresponding  */
        /* to points where the address of the activation record of a      */
        /* function is needed.  Such nodes are not included in the tree  */
        /* produced by the parser.  They are inserted during semantic     */
        /* processing. */
struct displaynode {
  nodetype type;
  int line;
  int level;          /* The nesting level of the function whose activation */
                      /* record address should be used.                     */
};

The type displaynode

Other than the standard type and line fields, the only member of a display reference node is a level field used to store the nesting level of the function whose frame address is to be used.

Finally, nodes of type refvarnode are used to designate places where a value should be loaded from a calculated memory address.

        /* Refvar nodes are included by the parser as the roots of all     */
        /* variable reference subtrees.  When created by the parser, the   */
        /* "baseaddr" field will either point to an Nselect, Nsubs or      */
        /* Nident node.  During semantic analysis the "baseaddr" subtree   */
        /* will be converted into a subtree describing the calculation of  */
        /* the memory address for the variable.  A "displacement" field is */
        /* included to hold a constant offset from the base address to the */
        /* variable.  Finally, to preserve information about the symbolic  */
        /* variable being used, the semantic analyzer should set the       */
        /* "vardesc" field to point to the declaration descriptor of the   */
        /* variable being used.     */
struct refvarnode {
  nodetype type;
  int line;
  union nodeunion *baseaddr; /* Subtree describing base address calculation */
  int displacement;     /* Displacement to variable relative to base addr  */
  decldesc *vardesc;    /* Declaration descriptor for referenced variable  */
};

The refvarnode type

The baseaddr field of a refvarnode points to a subtree that describes the computation of the base address. The value of the displacement field gives a constant value to be added to the base address before accessing memory. This field is initialized to 0 in trees created by the parser. The vardesc field is intended to point to a declaration descriptor for the variable referenced. This field is set to NULL by the parser. As the semantic processor translates variable reference subtrees into a form that more explicitly describes addressing arithmetic, it should set each refvarnode's vardesc field to point to the appropriate declaration descriptor.
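The sketch below shows what this translation might look like in the simplest case, a reference to a plain variable whose baseaddr is an Nident node. Much of it is assumed rather than taken from the handout: the node type names Ndisplay and Nrefvar, the level and offset fields of decldesc, the helper resolve_simple_var, and the decision to reuse the identifier node's storage for the display node are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: Ndisplay and Nrefvar are invented names; only Nident is
   named in this handout. */
typedef enum { Nident, Ndisplay, Nrefvar } nodetype;

/* ASSUMPTION: the real decldesc is defined elsewhere.  These two fields
   stand in for whatever it records about where a variable lives. */
typedef struct decldesc { int level; int offset; } decldesc;
typedef struct identdesc identdesc;

union nodeunion;
struct unknode     { nodetype type; int line; };
struct identnode   { nodetype type; int line; identdesc *ident; decldesc *decl; };
struct displaynode { nodetype type; int line; int level; };
struct refvarnode  { nodetype type; int line; union nodeunion *baseaddr;
                     int displacement; decldesc *vardesc; };
typedef union nodeunion {
  struct unknode unk;
  struct identnode ident;
  struct displaynode display;
  struct refvarnode var;
} node;

/* Hypothetical rewrite of a simple variable reference.  Returns 0 if the
   subtree was rewritten and -1 if it was left untouched. */
int resolve_simple_var(node *ref) {
  node *base;
  decldesc *d;
  assert(ref != NULL && ref->unk.type == Nrefvar);
  base = ref->var.baseaddr;
  if (base == NULL || base->unk.type != Nident || base->ident.decl == NULL)
    return -1;
  d = base->ident.decl;             /* save before the node is overwritten */
  base->display.type = Ndisplay;    /* reuse the identifier node's storage; */
  base->display.level = d->level;   /* the line field is left unchanged     */
  ref->var.displacement += d->offset;
  ref->var.vardesc = d;             /* keep the symbolic information */
  return 0;
}
```

After the rewrite, the address of the variable would be the frame address selected by the display node plus the displacement, and vardesc still identifies which declared variable the reference came from.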

Computer Science 434
Department of Computer Science
Williams College
