Representing Syntax Tree Nodes

This leads to a syntax tree with three distinct node types. To specify the general "type" of a syntax tree node, we therefore use the C union type node shown in the figure below. The figure also contains the definitions of the structure types used for the three node types found in syntax trees, plus a fourth "generic" node type named unknode that can be used to access the common fields of all three node types when dealing with a node whose type is not yet known.

          /* The type 'unknode' provides a template that can be used to */
          /* access the common components found in all node types when  */
          /* the actual type of the node is unknown.                    */
struct unknode {
  nodetype type;
  int line;
};

          /* Tree nodes of type 'internal' are used for all nodes that */
          /* are internal to the tree. */
struct internalnode {
  nodetype type;
  int line;
  union nodeunion *child[MAXKIDS]; /* pointers to the node's sub-trees */
};

         /* Nodes of type 'ident' are used for leaf nodes corresponding */
         /* to identifiers in the source code.  The value in the        */
         /* 'type' component of such a node will always be 'Nident'.    */
struct identnode {
  nodetype type;
  int line;
  identdesc *ident;   /* Pointer to associated identifier descriptor */
  decldesc *decl;     /* Pointer to associated declaration descriptor */
};

         /* Nodes of type 'constant' are used for leaf nodes corresponding */
         /* to constants in the source code.  The value in the 'type'      */
         /* component of such a node will always be 'Nconst'.              */
struct constnode {
  nodetype type;
  int line;
  int value;               /* Integer value of the constant  */
  int ischar;              /* True if this was a character constant */
};

      /* Union type that combines the 4 structure types described above */
typedef union nodeunion {
  struct unknode unk;
  struct internalnode internal;
  struct identnode ident;
  struct constnode constant;
} node;
Figure: Definition of `node' and its sub-types
 

Each of the three node types present in the tree includes two common fields: type, which specifies the node's phrase type, and line, which specifies the line of the source code on which the first token of the phrase represented by the node's subtree occurred. The type unknode allows one to reference these fields in situations where the actual type of the node is not yet known. For example, if `root' is a pointer to a node of unknown type, one can use the expression:

    root->unk.type
to determine its phrase type. One could also use the expression `root->internal.type' or `root->ident.type', but these expressions misleadingly suggest that the type of the node is already known. The type unknode is provided to support clear coding.
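
As a minimal sketch of this idiom, the function below inspects a node through its unk member before committing to a more specific view. The value of MAXKIDS, the members of the nodetype enumeration (in particular Nassign) and the function isleaf are assumptions made only so that the example compiles on its own; the real definitions appear elsewhere and may differ. Reading the common leading fields through unk is well defined in C because all four structures begin with the same members and the union declaration is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: the real nodetype, MAXKIDS, identdesc and */
/* decldesc are defined elsewhere and may differ.                     */
#define MAXKIDS 3
typedef enum { Nident, Nconst, Nassign } nodetype;
typedef struct identdesc identdesc;
typedef struct decldesc decldesc;

struct unknode      { nodetype type; int line; };
struct internalnode { nodetype type; int line; union nodeunion *child[MAXKIDS]; };
struct identnode    { nodetype type; int line; identdesc *ident; decldesc *decl; };
struct constnode    { nodetype type; int line; int value; int ischar; };

typedef union nodeunion {
  struct unknode unk;
  struct internalnode internal;
  struct identnode ident;
  struct constnode constant;
} node;

/* Returns 1 for leaves (identifiers and constants), 0 for internal nodes. */
/* The node's actual type is unknown on entry, so only 'unk' is used.      */
int isleaf(const node *root) {
  assert(root != NULL);
  switch (root->unk.type) {
  case Nident:
  case Nconst:
    return 1;
  default:
    return 0;
  }
}
```

Once the switch has identified the phrase type, code in each case can safely move on to root->ident, root->constant or root->internal.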

The structure type internalnode describes the nodes used to represent the internal nodes of the tree. In addition to the common type and line components found in all nodes, a node of this type includes a component child, which is an array of pointers to its children. The number of children of a given node can be determined from its node type. The syntactic analysis routines I will provide conserve memory by allocating space only for the child pointers actually used by a given internal node. Thus, if a node should have only 2 children, its third child pointer must not be read or written for any purpose, since the memory for it may not have been allocated.
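
The following sketch shows one way such a trimmed allocation could be written; it is not the code of the provided routines. The names internalsize and mkinternal, the value of MAXKIDS and the enumeration members are all hypothetical. Under-allocating a union in this way is a common C practice that works with mainstream compilers, but it depends on never touching the missing child slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for definitions given elsewhere. */
#define MAXKIDS 3
typedef enum { Nident, Nconst, Nassign } nodetype;
typedef struct identdesc identdesc;
typedef struct decldesc decldesc;

struct unknode      { nodetype type; int line; };
struct internalnode { nodetype type; int line; union nodeunion *child[MAXKIDS]; };
struct identnode    { nodetype type; int line; identdesc *ident; decldesc *decl; };
struct constnode    { nodetype type; int line; int value; int ischar; };

typedef union nodeunion {
  struct unknode unk;
  struct internalnode internal;
  struct identnode ident;
  struct constnode constant;
} node;

/* Bytes needed for an internal node that has only 'nkids' child slots. */
size_t internalsize(int nkids) {
  assert(nkids >= 0 && nkids <= MAXKIDS);
  return offsetof(struct internalnode, child)
         + (size_t)nkids * sizeof(union nodeunion *);
}

/* Allocate an internal node with room for exactly 'nkids' children.   */
/* child[nkids] and beyond lie outside the allocation and must never   */
/* be accessed. Returns NULL if malloc fails.                          */
node *mkinternal(nodetype type, int line, int nkids) {
  node *n = malloc(internalsize(nkids));
  if (n == NULL)
    return NULL;
  n->internal.type = type;
  n->internal.line = line;
  for (int i = 0; i < nkids; i++)
    n->internal.child[i] = NULL;
  return n;
}
```

With this layout, a two-child node saves the space of one pointer compared with a full struct internalnode, which is why the third pointer of such a node is off limits.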

The structure types identnode and constnode are used to represent the leaves of the syntax tree. Identifiers are represented by nodes of type identnode. The type component of such nodes will always be Nident. The ident and decl components of an identnode are pointers to the appropriate identifier descriptor and declaration descriptor for the identifier being referenced. The decl components of identnode nodes are set to NULL (the value 0) by the syntactic analyzer. During semantic analysis, the correct values should be stored in these fields.
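
The two-phase life of the decl field can be sketched as follows. Everything beyond the node structures themselves is an assumption for illustration: the contents of identdesc and decldesc, the value of MAXKIDS, and the helper names mkident and binddecl are hypothetical, and the real syntactic analyzer may build its nodes differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real descriptors hold more information. */
#define MAXKIDS 3
typedef enum { Nident, Nconst, Nassign } nodetype;
typedef struct identdesc { const char *name; } identdesc;
typedef struct decldesc  { int level; } decldesc;

struct unknode      { nodetype type; int line; };
struct internalnode { nodetype type; int line; union nodeunion *child[MAXKIDS]; };
struct identnode    { nodetype type; int line; identdesc *ident; decldesc *decl; };
struct constnode    { nodetype type; int line; int value; int ischar; };

typedef union nodeunion {
  struct unknode unk;
  struct internalnode internal;
  struct identnode ident;
  struct constnode constant;
} node;

/* Syntactic analysis: build an identifier leaf with decl still unset. */
node *mkident(identdesc *id, int line) {
  node *n = malloc(sizeof(struct identnode));
  if (n == NULL)
    return NULL;
  n->ident.type = Nident;
  n->ident.line = line;
  n->ident.ident = id;
  n->ident.decl = NULL;   /* filled in later by semantic analysis */
  return n;
}

/* Semantic analysis: record the declaration this use refers to. */
void binddecl(node *n, decldesc *d) {
  assert(n != NULL && n->unk.type == Nident);
  n->ident.decl = d;
}
```

A NULL decl therefore marks an identifier leaf that semantic analysis has not yet visited.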

There is one special group of identnodes produced by the syntactic analyzer. These are identnodes for the keyword integer. Technically, integer is a keyword rather than an identifier in STApL. Treating it as an identifier that has been declared as a type, however, will simplify various parts of the compiler. Accordingly, occurrences of integer will be represented by special identnodes in the syntax tree.

Each constnode contains two fields beyond the common type and line fields. One is named value. It holds the integer value of the constant. The second is a field named ischar which is used as a boolean flag indicating whether the constant found in the source code was a character or an integer. The type component of all such nodes will be Nconst.
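
To illustrate how ischar might be consulted, the sketch below formats a constant leaf either as a quoted character or as a decimal integer. The function showconst is hypothetical, as are the value of MAXKIDS and the enumeration members; the quoting style is chosen only for the example and is not taken from the STApL definition.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for definitions given elsewhere. */
#define MAXKIDS 3
typedef enum { Nident, Nconst, Nassign } nodetype;
typedef struct identdesc identdesc;
typedef struct decldesc decldesc;

struct unknode      { nodetype type; int line; };
struct internalnode { nodetype type; int line; union nodeunion *child[MAXKIDS]; };
struct identnode    { nodetype type; int line; identdesc *ident; decldesc *decl; };
struct constnode    { nodetype type; int line; int value; int ischar; };

typedef union nodeunion {
  struct unknode unk;
  struct internalnode internal;
  struct identnode ident;
  struct constnode constant;
} node;

/* Write a printable form of a constant leaf into buf: a quoted       */
/* character when ischar is true, decimal digits otherwise. Returns   */
/* the value returned by snprintf.                                     */
int showconst(const node *n, char *buf, size_t size) {
  assert(n != NULL && n->unk.type == Nconst);
  if (n->constant.ischar)
    return snprintf(buf, size, "'%c'", n->constant.value);
  return snprintf(buf, size, "%d", n->constant.value);
}
```

In both cases the value field holds an integer; ischar only records how the constant was written in the source.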


Computer Science 434
Department of Computer Science
Williams College
