![]() | ![]() | ![]() | Representing Syntax Tree Nodes |
This leads to a syntax tree with five distinct node types. As a
result, to specify the general "type" of a syntax tree node, we
use the C union type node described below.1
/* Union type that combines the 5 structure types used to describe */
/* tree nodes */
typedef union nodeunion {
    struct unknode unk;
    struct internalnode internal;
    struct identnode ident;
    struct constnode constant;
    struct displaynode display;
    struct refvarnode var;
} node;
Each of the five node types present in the tree includes two common
fields: the field specifying the node's phrase type2 (type) and the
field specifying the line of the source code on which the first token
of the phrase represented by the node's subtree occurred (line). The
type unknode, whose definition is shown below,
/* The type 'unknode' provides a template that can be used to */
/* access the common components found in all node types when */
/* the actual type of the node is unknown. */
struct unknode {
    nodetype type;
    int line;
};
allows one to reference these fields in situations
where the actual type of the node is not yet known. For example, if
`root' is a pointer to a node of unknown type, one can use the
expression `root->unk.type' to determine its phrase type. One could
also use the expression `root->internal.type' or `root->ident.type',
but these expressions misleadingly suggest that the type of the node
is already known. The type unknode is provided to support clear coding.
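As a minimal sketch of this convention, the function below inspects a node of unknown type through the unk member first and only then selects a more specific member. The Nother enumerator, the opaque identdesc and decldesc declarations, the reduced union, and the helper name describe_node are assumptions made for illustration; they are not taken from the course's header files.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed enumeration: only Nident and Nconst are named in the text. */
typedef enum { Nident, Nconst, Nother /* hypothetical */ } nodetype;

typedef struct identdesc identdesc;   /* opaque; real definition not shown */
typedef struct decldesc decldesc;     /* opaque; real definition not shown */

struct unknode   { nodetype type; int line; };
struct identnode { nodetype type; int line; identdesc *ident; decldesc *decl; };
struct constnode { nodetype type; int line; int value; int ischar; };

/* Reduced union: just the members this sketch needs. */
typedef union nodeunion {
    struct unknode unk;
    struct identnode ident;
    struct constnode constant;
} node;

/* Hypothetical helper: name the kind of node that 'root' points to. */
const char *describe_node(const node *root)
{
    assert(root != NULL);
    switch (root->unk.type) {   /* type not yet known: go through unk */
    case Nident:
        return "identifier";
    case Nconst:                /* type now known: use the specific member */
        return root->constant.ischar ? "character constant"
                                     : "integer constant";
    default:
        return "other";
    }
}
```

Because every structure in the union begins with the same type and line members, reading them through unk is well defined in C regardless of which member was last written.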
The structure type internalnode describes the nodes used to
represent almost all of the internal nodes of the tree.
/* Tree nodes of type 'internal' are used for all nodes that */
/* are internal to the tree produced by the parser except */
/* for the roots of variable reference subtrees. */
struct internalnode {
    nodetype type;
    int line;
    union nodeunion *child[MAXKIDS]; /* pointers to the node's sub-trees */
};
In addition to the common type and line components found in all nodes,
a node of this type includes a component child, which is an array of
pointers to its children. The number of children of a given node can
be determined from its node type. The syntactic analysis routines I will
provide conserve memory by allocating space only for the child
pointers actually used by a given internal node. Thus, if a node
should have only 2 children, its third child pointer must not be
used for any purpose: it lies outside the memory allocated for the node.
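The allocation strategy described above can be sketched as follows. This is an illustration only, not the code of the provided syntactic analysis routines: the helper name make_internal, the value chosen for MAXKIDS, and the Nother enumerator are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAXKIDS 4   /* assumed value; the real limit is not given here */

typedef enum { Nident, Nconst, Nother /* hypothetical */ } nodetype;

union nodeunion;    /* forward declaration for the child pointers */

struct internalnode {
    nodetype type;
    int line;
    union nodeunion *child[MAXKIDS];
};

/* Reduced union: just the member this sketch needs. */
typedef union nodeunion {
    struct internalnode internal;
} node;

/* Hypothetical helper: allocate an internal node with room for only
   'nkids' child pointers. child[nkids] and beyond lie outside the
   allocated block and must never be read or written. */
node *make_internal(nodetype type, int line, int nkids)
{
    assert(nkids >= 0 && nkids <= MAXKIDS);
    size_t size = offsetof(struct internalnode, child)
                + (size_t)nkids * sizeof(union nodeunion *);
    node *n = malloc(size);
    if (n == NULL)
        return NULL;
    n->internal.type = type;
    n->internal.line = line;
    for (int i = 0; i < nkids; i++)
        n->internal.child[i] = NULL;
    return n;
}
```

Any code that copies such a node needs the same size computation, since sizeof(struct internalnode) overstates the space actually allocated.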
The structure types identnode and constnode are used to
represent the leaves of the syntax trees produced by the parser.
Declarations of the structure types are shown below:
/* Nodes of type 'ident' are used for leaf nodes corresponding */
/* to identifiers in the source code. The value in the */
/* 'type' component of such a node will always be 'Nident'. */
struct identnode {
    nodetype type;
    int line;
    identdesc *ident; /* Pointer to associated identifier descriptor */
    decldesc *decl;   /* Pointer to associated declaration descriptor */
};
/* Nodes of type 'constant' are used for leaf nodes corresponding */
/* to constants in the source code. The value in the 'type' */
/* component of such a node will always be 'Nconst'. */
struct constnode {
    nodetype type;
    int line;
    int value;  /* Integer value of the constant */
    int ischar; /* True if this was a character constant */
};
Identifiers are represented by nodes of type identnode. The type
component of such nodes will always be Nident. The ident and decl
components of an identnode are pointers to the appropriate
identifier descriptor and declaration descriptor for the identifier
being referenced. The decl components of identnode nodes
are set to NULL (the value 0) by the syntactic analyzer. During
semantic analysis, the correct values should be stored in these
fields.
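A minimal sketch of this two-stage initialization is given below. The helper names make_ident and bind_decl are hypothetical, and identdesc and decldesc are left as opaque types because their definitions are not shown in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { Nident, Nconst } nodetype;  /* reduced, assumed enumeration */

typedef struct identdesc identdesc;   /* opaque; real definition not shown */
typedef struct decldesc decldesc;     /* opaque; real definition not shown */

struct identnode {
    nodetype type;
    int line;
    identdesc *ident;
    decldesc *decl;
};

/* Reduced union: just the member this sketch needs. */
typedef union nodeunion {
    struct identnode ident;
} node;

/* Stage 1 (syntactic analysis): build the leaf with decl set to NULL. */
node *make_ident(identdesc *id, int line)
{
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->ident.type = Nident;
    n->ident.line = line;
    n->ident.ident = id;
    n->ident.decl = NULL;
    return n;
}

/* Stage 2 (semantic analysis): record the declaration being referenced. */
void bind_decl(node *n, decldesc *d)
{
    assert(n != NULL && n->ident.type == Nident);
    n->ident.decl = d;
}
```

A NULL decl therefore marks an identifier leaf that semantic analysis has not yet resolved.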
There is one special group of identnodes produced by the
syntactic analyzer. These are identnodes for the keyword integer.
Technically, integer is a keyword rather than an
identifier in Co. Treating it as an identifier that has been
declared as a type, however, will simplify various parts of the
compiler. Accordingly, occurrences of integer will be
represented by special identnodes in the syntax tree.
Each constnode contains two fields beyond the common type and line
fields. One is named value. It holds the integer value of the
constant. The second is a field named ischar, which is used as a
boolean flag indicating whether the constant found in the source code
was a character or an integer. The type component of all such nodes
will be Nconst.
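As an illustration of how the two fields work together, the sketch below prints a constant back in source-like form. It assumes that a character constant is stored in value as its character code, which this section does not state explicitly; the helper name format_const is also hypothetical.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { Nident, Nconst } nodetype;  /* reduced, assumed enumeration */

struct constnode {
    nodetype type;
    int line;
    int value;
    int ischar;
};

/* Reduced union: just the member this sketch needs. */
typedef union nodeunion {
    struct constnode constant;
} node;

/* Hypothetical helper: write the constant into 'buf' as it might have
   appeared in the source. ASSUMES value holds the character code when
   ischar is true. Returns what snprintf returns. */
int format_const(const node *n, char *buf, size_t len)
{
    assert(n != NULL && n->constant.type == Nconst);
    if (n->constant.ischar)
        return snprintf(buf, len, "'%c'", n->constant.value);
    return snprintf(buf, len, "%d", n->constant.value);
}
```

Under that assumption, two nodes with the same value can still be told apart: only the ischar flag records which spelling the programmer used.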
There are two additional node types related to subtrees
representing references to variables. Their declarations are shown
below.
Tree nodes of type displaynode are used to refer to the address of a
function's stack frame.
/* Nodes of type 'display' are used for leaf nodes corresponding */
/* to points where the address of the activation record of a */
/* function is needed. Such nodes are not included in the tree */
/* produced by the parser. They are inserted during semantic */
/* processing. */
struct displaynode {
    nodetype type;
    int line;
    int level; /* The nesting level of the function whose activation
                  record address should be used. */
};
Other than the standard type and line fields, the only
member of a display reference node is a level field used to store
the nesting level of the function whose frame address is to be used.
Finally, nodes of type refvarnode are used to designate places where
a value should be loaded from a calculated memory address.
/* Refvar nodes are included by the parser as the roots of all */
/* variable reference subtrees. When created by the parser, the */
/* "baseaddr" field will either point to an Nselect, Nsubs or */
/* Nident node. During semantic analysis the "baseaddr" subtree */
/* will be converted into a subtree describing the calculation of */
/* the memory address for the variable. A "displacement" field is */
/* included to hold a constant offset from the base address to the */
/* variable. Finally, to preserve information about the symbolic */
/* variable being used, the semantic analyzer should set the */
/* "vardesc" field to point to the declaration descriptor of the */
/* variable being used. */
struct refvarnode {
    nodetype type;
    int line;
    union nodeunion *baseaddr; /* Subtree describing base address calculation */
    int displacement;          /* Displacement to variable relative to base addr */
    decldesc *vardesc;         /* Declaration descriptor for referenced variable */
};
The baseaddr field of a refvarnode points to a subtree
that describes the computation of the base address. The value
of the displacement field gives a constant value to be added
to the base address before accessing memory. This field is initialized
to 0 in trees created by the parser. The vardesc field is
intended to point to a declaration descriptor for the variable
referenced. This field is set to NULL by the parser. As the semantic
processor translates variable reference subtrees into a form that
more explicitly describes addressing arithmetic, it should set
each refvarnode's vardesc field to point to the appropriate
declaration descriptor.
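To make the intended transformation more concrete, the sketch below handles only the simplest case: a refvarnode whose baseaddr is a plain Nident node. It replaces that node with a display reference and moves the variable's offset into displacement. Everything about decldesc here is assumed, since this section does not define it: the level and offset members, the Ndisplay and Nrefvar enumerators, and the helper resolve_simple_ref are hypothetical stand-ins, and the real semantic processor may organize this step differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Ndisplay and Nrefvar are assumed names; the text names only Nident,
   Nconst, Nselect and Nsubs. */
typedef enum { Nident, Nconst, Ndisplay, Nrefvar } nodetype;

typedef struct identdesc identdesc;   /* opaque; real definition not shown */
/* ASSUMED layout: the real decldesc is not shown in this section. */
typedef struct decldesc { int level; int offset; } decldesc;

union nodeunion;

struct unknode     { nodetype type; int line; };
struct identnode   { nodetype type; int line; identdesc *ident; decldesc *decl; };
struct displaynode { nodetype type; int line; int level; };
struct refvarnode {
    nodetype type;
    int line;
    union nodeunion *baseaddr;
    int displacement;
    decldesc *vardesc;
};

/* Reduced union: just the members this sketch needs. */
typedef union nodeunion {
    struct unknode unk;
    struct identnode ident;
    struct displaynode display;
    struct refvarnode var;
} node;

/* Hypothetical helper: resolve a reference to a simple variable whose
   identnode already has its decl field set. Returns 0 on success and
   -1 if the subtree is not the simple case or allocation fails. The
   replaced identnode is not freed here; that is left to the caller. */
int resolve_simple_ref(node *ref)
{
    assert(ref != NULL && ref->unk.type == Nrefvar);
    node *base = ref->var.baseaddr;
    if (base == NULL || base->unk.type != Nident || base->ident.decl == NULL)
        return -1;

    decldesc *d = base->ident.decl;
    node *disp = malloc(sizeof *disp);
    if (disp == NULL)
        return -1;
    disp->display.type = Ndisplay;
    disp->display.line = base->unk.line;
    disp->display.level = d->level;      /* frame that holds the variable */

    ref->var.baseaddr = disp;            /* base address: that frame */
    ref->var.displacement = d->offset;   /* constant offset within it */
    ref->var.vardesc = d;                /* keep the symbolic information */
    return 0;
}
```

After such a rewrite the subtree no longer mentions the identifier at all, which is why vardesc is needed to preserve the link back to the variable's declaration.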