JACCL Compiler Implementation Project
Phases 2.1 & 2.2: Code Generation for Expressions and Statements
Code for Arithmetic Expressions & Assignments due: March 12, 1999
Code for Control Structures due: March 19, 1999
Now it is time to actually start generating code for JACCL programs. For this phase, you should complete the generation of code for expressions, assignment statements and control structures (i.e. if and while statements).
I have broken this assignment into two sub-phases. For the first,
I want you to concentrate on generating code for assignment
statements and simple arithmetic expressions. By the phrase
"simple arithmetic expression" I mean those expressions
involving only the arithmetic operators +, -, *, and
%. In particular, during this first sub-phase you should not
attempt to generate code for relational operators or for the logical
operators.
In the second sub-phase, you should extend your routines so that they also generate code for relational operators, logical operators, if statements and while loops.
While working on both parts of this code-generation phase, you should try to structure your code in such a way that you maintain a separation between "high-level" and "low-level" code generation. The high-level routines are the ones that traverse the syntax tree and handle the code-generation issues that are directly related to handling JACCL programs. The low-level routines should perform functions like actually outputting instructions, allocating registers to hold temporaries and selecting the appropriate effective address to generate to reference a particular operand.
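The separation described above might be sketched as follows. This is only an illustration under stated assumptions: the `Node` type, the `ll_` routines, and the mnemonics LOAD, ADD and MUL are hypothetical placeholders rather than real MC34000 instructions or anything defined in the course files, and the "allocator" never frees a register. For brevity the sketch also loads every constant into a register, which is not the good naive code asked for later in this handout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical syntax-tree node; not the course's real node type. */
typedef enum { N_CONST, N_ADD, N_MUL } NodeKind;
typedef struct Node {
    NodeKind kind;
    int value;                  /* used only when kind == N_CONST */
    struct Node *left, *right;  /* used only by the binary operators */
} Node;

/* ---- low-level layer: registers and instruction text ---- */
static char out[1024];          /* emitted text is collected here */
static int next_reg = 0;        /* trivial allocator: never frees */

static int ll_alloc_reg(void) {
    assert(next_reg < 8);       /* assumed limit of eight data registers */
    return next_reg++;
}

static void ll_emit(const char *op, const char *src, int dst_reg) {
    char line[64];
    snprintf(line, sizeof line, "%s %s,D%d\n", op, src, dst_reg);
    strncat(out, line, sizeof out - strlen(out) - 1);
}

/* ---- high-level layer: walks the tree, never formats output ---- */
static int gen_expr(const Node *n) {
    char src[32];
    if (n->kind == N_CONST) {
        int r = ll_alloc_reg();
        snprintf(src, sizeof src, "#%d", n->value);
        ll_emit("LOAD", src, r);    /* placeholder mnemonic */
        return r;
    }
    int l = gen_expr(n->left);
    int r = gen_expr(n->right);
    snprintf(src, sizeof src, "D%d", r);
    ll_emit(n->kind == N_ADD ? "ADD" : "MUL", src, l);
    return l;                       /* result is left in the left operand's register */
}
```

Note that `gen_expr` knows nothing about how an instruction is spelled or how registers are chosen, so either layer can be changed without touching the other.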
The instructions output by your low-level code generator should be in the format accepted by the 34000 assembler. That is, you should be generating program text rather than a binary file. The instruction set of the MC34000 and its assembly language are described in the handouts "The MC34000 Computer" and "The MC34000 Assembler".
Even though your output will be text, it will not be very readable. Variables will be referenced using numerical displacements rather than symbolic names. One (required) way to make things easier to follow is to include lines in your output that indicate which lines of generated code correspond to which lines in the original input program. The best way to do this is to output assembler SOURCE directives. This will not only help you read your code in these phases; it will also allow you to use the 34000 debugger when you actually generate complete programs in phase 2.3.
While you will not be optimizing your code in any sense, you should generate good naive code. In particular, you should avoid early loads of display pointers and perform arithmetic operations on variables in memory rather than first loading them into registers when possible. Also, when handling control structures you should use the techniques discussed in class to avoid generating branches to unconditional branches.
The code you generate for variable references will need to know how to access the "display" of pointers to activation records. You will not need to know how to maintain the correct values in the "display" array at this point. It will not be until phase 2.3 that you actually produce the code that puts the correct values in the display. The only thing you need to know about the display at this point is where it will be. The low-level code generator will need to know this in order to generate instructions to load display pointers into address registers when needed.
The simplest answer to the question of where the display will be is that it will be an array placed in memory with the program's global variables. This really just puts off the question, because you don't know exactly where the globals will be at this point. In generating code, however, you can simply assume that the base address for all global variables will be loaded into address register 5. So, all you need to know is the displacement of the display relative to address register 5. If you followed my suggestion and have assigned displacements to global variables starting at -1 and using increasingly negative displacements (just as you have to do for variables local to procedures), there is a very natural place to put the display. Assign the display to displacement 0 relative to A5. This will mean that the display is the last thing in the global variable area (and that the last global variable processed will be the first thing).
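As a rough sketch of this layout, the displacement bookkeeping could look like the following. The names `GlobalArea`, `alloc_global` and `display_entry_disp` are hypothetical, and several details are assumptions: that sizes and displacements are counted in the same addressing unit, that a variable's displacement is that of its lowest-addressed unit, that display entries are indexed from level 0, and that the size of a display entry is supplied by the caller.

```c
#include <assert.h>

#define DISPLAY_DISPLACEMENT 0   /* the display starts at displacement 0 from A5 */

/* Hypothetical allocator state for the global variable area. */
typedef struct {
    int next;                    /* highest displacement still unassigned */
} GlobalArea;

static void global_area_init(GlobalArea *g) {
    g->next = -1;                /* first global goes at displacement -1 */
}

/* Reserve `size` units below everything allocated so far and return the
   displacement (relative to A5) of the variable's lowest-addressed unit. */
static int alloc_global(GlobalArea *g, int size) {
    assert(size > 0);
    int disp = g->next - (size - 1);
    g->next = disp - 1;
    return disp;
}

/* Displacement from A5 of the display entry for a given nesting level,
   assuming entries of `entry_size` units indexed from level 0. */
static int display_entry_disp(int level, int entry_size) {
    assert(level >= 0 && entry_size > 0);
    return DISPLAY_DISPLACEMENT + level * entry_size;
}
```

With this arrangement the globals occupy negative displacements and the display occupies non-negative ones, so the two never collide no matter how many globals are declared.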
I do request that you use my type definitions
for the operand descriptors your code for this phase manipulates.
The definitions for this type can be found in the file opdesc.h
in the phase2 subdirectory of my CS 434 pub directory. If you
update the PHASE variable in your Makefile to "2", you will
be able to simply write #include "opdesc.h".
To help you get a good start, I will give you some additional
C code which you are free
to use as is, modify or ignore as you see fit. This code can be found
in the two files codegen.h and stmtgen.c in the directory
~tom/pub/434/phase2. If you want to use either of these files, you
should copy them to your own directories.
The codegen.h file contains descriptions
of some of the main data structures I used in my own solution to this exercise.
In particular, it includes definitions for the
tables used to keep track of data temporaries
(registers and memory) and address registers. The file itself contains
comments that (completely?) describe these data structures.
codegen.h also contains headers for the most important routines
provided by my low-level code generator. Note: I am not giving you
the code for these routines. You have to write your own low-level
code generator. In particular, don't be fooled by comments that say a
particular variable or array will contain some value. Such comments
will only be true if you write code that makes them true! Again, these
files are offered to help you get a good start
on your own design.
The other file I am providing is stmtgen.c. For the first part
of this assignment, you do not need to generate code for statements,
but you will still need routines that traverse the statement portions
of the syntax tree to find all the expressions contained in statements
so that you can generate code for them. You will throw out much of this
traversal code once you start working on the second part of the assignment.
So, what I have given you is my version of the code to traverse the syntax
tree, calling "genExpr" whenever it finds an
expression. As I indicated above, you are free to use, modify
or ignore this file, depending on how much the "hints" it
contains influence your own final design.
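To make the idea of such a traversal concrete, here is a minimal sketch. It is not the contents of stmtgen.c: the `Stmt` and `Expr` types are hypothetical, and this `genExpr` is a stand-in that merely records which expressions it was handed, where a real one would generate code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree types; the real syntax tree will differ. */
typedef struct Expr { int id; } Expr;

typedef enum { S_ASSIGN, S_IF, S_WHILE } StmtKind;
typedef struct Stmt {
    StmtKind kind;
    Expr *target;            /* S_ASSIGN: left-hand side, else NULL */
    Expr *expr;              /* S_ASSIGN: right-hand side; S_IF/S_WHILE: condition */
    struct Stmt *body;       /* S_IF: then-part; S_WHILE: loop body */
    struct Stmt *else_body;  /* S_IF only; may be NULL */
    struct Stmt *next;       /* next statement in the enclosing list */
} Stmt;

/* Stand-in for the real expression code generator. */
static int visited[16];
static int nvisited = 0;

static void genExpr(Expr *e) {
    assert(nvisited < 16);
    visited[nvisited++] = e->id;
}

/* Walk a statement list, handing every expression to genExpr. */
static void gen_stmt_list(const Stmt *s) {
    for (; s != NULL; s = s->next) {
        switch (s->kind) {
        case S_ASSIGN:
            genExpr(s->target);
            genExpr(s->expr);
            break;
        case S_IF:
            genExpr(s->expr);
            gen_stmt_list(s->body);
            gen_stmt_list(s->else_body);   /* a NULL list is simply skipped */
            break;
        case S_WHILE:
            genExpr(s->expr);
            gen_stmt_list(s->body);
            break;
        }
    }
}
```

In the second sub-phase, the S_IF and S_WHILE cases are where label and branch generation would replace the plain calls on the condition.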