JACCL Compiler Implementation Project
Phases 2.1 & 2.2: Code Generation for Expressions and Statements
Code for Arithmetic Expressions & Assignments due: March 12, 1999
Code for Control Structures due: March 19, 1999

Now it is time to actually start generating code for JACCL programs. For this phase, you should complete the generation of code for expressions, assignment statements and control structures (i.e. if and while statements).

I have broken this assignment into two sub-phases. For the first, I want you to concentrate on generating code for assignment statements and simple arithmetic expressions. By the phrase "simple arithmetic expression" I mean those expressions involving only the arithmetic operators +, -, *, and %. In particular, during this first sub-phase you should not attempt to generate code for relational operators or for the logical operators.
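For the first sub-phase, the core is a recursive walk over an expression tree. Below is a minimal sketch of one way to do it. It assumes a hypothetical `Expr` node type (not JACCL's real syntax-tree types), all variables addressed off A5, and 68000-style mnemonics. `MUL` and `MOD` are placeholders; check "The MC34000 Assembler" for the real opcodes. A leaf right operand is used directly from memory or as an immediate instead of first being loaded into a register.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical expression node; JACCL's real syntax-tree types differ. */
typedef enum { E_CONST, E_VAR, E_BINOP } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int value;                 /* E_CONST: literal value             */
    int disp;                  /* E_VAR: displacement relative to A5 */
    char op;                   /* E_BINOP: one of + - * %            */
    struct Expr *left, *right;
} Expr;

static char out[1024];         /* stands in for the assembler text file  */
static int nextReg = 0;        /* naive stack-like allocator: D0, D1, ... */

static void emit(const char *line) {
    strncat(out, line, sizeof out - strlen(out) - 1);
    strncat(out, "\n", sizeof out - strlen(out) - 1);
}

/* Effective address of a leaf: immediate constant or A5-relative variable. */
static void leafEA(const Expr *e, char *ea, size_t n) {
    if (e->kind == E_CONST)
        snprintf(ea, n, "#%d", e->value);
    else
        snprintf(ea, n, "%d(A5)", e->disp);
}

static const char *mnemonic(char op) {
    switch (op) {
    case '+': return "ADD";
    case '-': return "SUB";
    case '*': return "MUL";    /* placeholder opcode */
    case '%': return "MOD";    /* placeholder opcode */
    }
    assert(0 && "unsupported operator");
    return NULL;
}

/* Generate code leaving e's value in a data register; return its number.
   Invariant: on return, nextReg == result + 1. */
static int genExpr(const Expr *e) {
    char ea[32], line[64];
    if (e->kind != E_BINOP) {
        int r = nextReg++;
        leafEA(e, ea, sizeof ea);
        snprintf(line, sizeof line, "\tMOVE\t%s,D%d", ea, r);
        emit(line);
        return r;
    }
    int r = genExpr(e->left);
    if (e->right->kind != E_BINOP) {
        /* Leaf right operand: use it in place, no extra load. */
        leafEA(e->right, ea, sizeof ea);
        snprintf(line, sizeof line, "\t%s\t%s,D%d", mnemonic(e->op), ea, r);
    } else {
        int r2 = genExpr(e->right);
        snprintf(line, sizeof line, "\t%s\tD%d,D%d", mnemonic(e->op), r2, r);
        nextReg--;             /* release r2, the most recently allocated */
    }
    emit(line);
    return r;
}
```

For `x + y*3` with `x` at -1 and `y` at -2, this emits a load of `x`, a load of `y`, `MUL #3,D1` and `ADD D1,D0`. This allocator never spills. A real one needs the temporary tables described in codegen.h.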

In the second sub-phase, you should extend your routines so that they also generate code for relational operators, logical operators, if statements and while loops.

While working on both parts of this code-generation phase, you should try to structure your code so that it maintains a separation between "high-level" and "low-level" code generation. The high-level routines are the ones that traverse the syntax tree and handle the code-generation issues that are directly related to handling JACCL programs. The low-level routines should perform functions like actually outputting instructions, allocating registers to hold temporaries and selecting the effective address used to reference a particular operand.
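One way to draw the high-level/low-level line is to have high-level routines describe operands with descriptors and leave all text formatting to the low level. The sketch below uses a hypothetical `OpDesc` and hypothetical `formatEA`/`emitInstr` helpers. It is not the layout in opdesc.h, which you are asked to use. The 68000-style effective-address syntax is also an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical operand descriptor; the required one lives in opdesc.h. */
typedef enum { OP_IMMEDIATE, OP_DREG, OP_AREG, OP_DISPLACED } OpMode;

typedef struct {
    OpMode mode;
    int value;   /* immediate value, or displacement for OP_DISPLACED  */
    int reg;     /* register number for OP_DREG, OP_AREG, OP_DISPLACED */
} OpDesc;

/* Low level: choose the effective-address text for one operand. */
static void formatEA(const OpDesc *op, char *buf, size_t n) {
    switch (op->mode) {
    case OP_IMMEDIATE: snprintf(buf, n, "#%d", op->value); break;
    case OP_DREG:      snprintf(buf, n, "D%d", op->reg); break;
    case OP_AREG:      snprintf(buf, n, "A%d", op->reg); break;
    case OP_DISPLACED: snprintf(buf, n, "%d(A%d)", op->value, op->reg); break;
    }
}

/* Low level: format one two-operand instruction as a line of text.
   High-level routines build OpDescs and call this; they never print. */
static void emitInstr(char *line, size_t n, const char *mnemonic,
                      const OpDesc *src, const OpDesc *dst) {
    char s[32], d[32];
    assert(dst->mode != OP_IMMEDIATE);   /* immediates cannot be destinations */
    formatEA(src, s, sizeof s);
    formatEA(dst, d, sizeof d);
    snprintf(line, n, "\t%s\t%s,%s\n", mnemonic, s, d);
}
```

Because only `formatEA` knows the assembler's operand syntax, a correction to that syntax touches one function rather than every high-level routine.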

The instructions output by your low-level code generator should be in the format accepted by the 34000 assembler. That is, you should be generating program text rather than a binary file. The instruction set of the MC34000 and its assembly language are described in the handouts "The MC34000 Computer" and "The MC34000 Assembler".

Even though your output will be text, it will not be very readable. Variables will be referenced using numerical displacements rather than symbolic names. One (required) way to make things easier to follow is to include lines in your output that indicate which lines of output correspond to which lines in the original input program. The best way to do this is to output assembler SOURCE directives. These will not only aid you in reading your code in these phases; they will also allow you to use the 34000 debugger when you actually generate complete programs in phase 2.3.
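A small helper can emit a SOURCE directive only when the source line changes, so several statements on one line do not produce duplicates. The sketch below assumes the directive is written as `SOURCE` followed by a line number. Check "The MC34000 Assembler" for its exact operand syntax. `emitLine`, `noteSourceLine` and the `asmText` buffer are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char asmText[4096];      /* stands in for the output assembler file */
static int lastSourceLine = 0;  /* 0 means no SOURCE directive emitted yet */

/* Append one line of assembler text. */
static void emitLine(const char *text) {
    size_t used = strlen(asmText);
    snprintf(asmText + used, sizeof asmText - used, "%s\n", text);
}

/* Call before generating code for a statement that starts on srcLine. */
static void noteSourceLine(int srcLine) {
    char line[32];
    assert(srcLine > 0);
    if (srcLine != lastSourceLine) {
        snprintf(line, sizeof line, "\tSOURCE\t%d", srcLine);
        emitLine(line);
        lastSourceLine = srcLine;
    }
}
```

Calling `noteSourceLine` from the statement traversal, rather than from each instruction emitter, keeps the directives aligned with statements.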

While you will not be optimizing your code in any sense, you should generate good naive code. In particular, you should avoid early loads of display pointers and perform arithmetic operations on variables in memory rather than first loading them into registers when possible. Also, when handling control structures you should use the techniques discussed in class to avoid generating branches to unconditional branches.
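The techniques discussed in class are not reproduced here. One common way to avoid branches to unconditional branches is to pass each statement the label where control must go when it finishes, plus a flag saying whether that label immediately follows. Nested statements then branch straight to the final target. The sketch assumes a hypothetical `Stmt` tree, conditions that are single variables tested for zero (false), and placeholder `TST`/`BEQ`/`BRA` mnemonics. `ASSIGN n` stands in for real assignment code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef enum { S_ASSIGN, S_IF, S_WHILE } StmtKind;

typedef struct Stmt {
    StmtKind kind;
    int id;                  /* S_ASSIGN: tag printed in place of real code */
    int condDisp;            /* S_IF/S_WHILE: condition variable, off A5    */
    struct Stmt *thenPart;   /* S_IF then-part, or S_WHILE body             */
    struct Stmt *elsePart;   /* S_IF else-part, NULL if absent              */
} Stmt;

static char code[2048];
static int labelCount = 0;

static void emitf(const char *fmt, ...) {
    size_t used = strlen(code);
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(code + used, sizeof code - used, fmt, ap);
    va_end(ap);
}

/* Generate s so that control ends up at label `next`.  If fallsToNext is
   true, the caller places `next` right after s, so no branch is needed. */
static void genStmt(const Stmt *s, int next, int fallsToNext) {
    assert(s != NULL);
    switch (s->kind) {
    case S_ASSIGN:
        emitf("\tASSIGN\t%d\n", s->id);
        if (!fallsToNext)
            emitf("\tBRA\tL%d\n", next);
        break;
    case S_IF: {
        /* With no else-part, a false condition goes straight to `next`. */
        int onFalse = s->elsePart ? ++labelCount : next;
        emitf("\tTST\t%d(A5)\n\tBEQ\tL%d\n", s->condDisp, onFalse);
        if (s->elsePart) {
            genStmt(s->thenPart, next, 0);      /* branches directly to next */
            emitf("L%d:\n", onFalse);
            genStmt(s->elsePart, next, fallsToNext);
        } else {
            genStmt(s->thenPart, next, fallsToNext);
        }
        break;
    }
    case S_WHILE: {
        int top = ++labelCount;
        emitf("L%d:\n\tTST\t%d(A5)\n\tBEQ\tL%d\n", top, s->condDisp, next);
        genStmt(s->thenPart, top, 0);           /* body jumps straight to top */
        break;
    }
    }
}
```

For `while c1 do if c2 then a1 else a2`, both arms of the `if` end with `BRA` to the loop's top label. They do not branch to an end-of-if label that would itself hold a `BRA` back to the top.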

The code you generate for variable references will need to know how to access the "display" of pointers to activation records. You will not need to know how to maintain the correct values in the "display" array at this point. It will not be until phase 2.3 that you actually produce the code that puts the correct values in the display. The only thing you need to know about the display at this point is where it will be. The low-level code generator will need to know this in order to generate instructions to load display pointers into address registers when needed.

The simplest answer to this question is that the display will be an array placed in memory with the program's global variables. This really just puts off the question, because you don't know exactly where the globals will be at this point. In generating code, however, you can simply assume that the base address for all global variables will be loaded into address register 5. So, all you need to know is the displacement to the display relative to address register 5. If you followed my suggestion and have assigned displacements to global variables starting at -1 and using increasingly negative displacements (just as you have to do for variables local to procedures), there is a very natural place to put the display. Assign the display to displacement 0 relative to A5. This will mean that the display is the last thing in the global variable area (and that the last global variable processed will be the first thing).
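With the display at displacement 0 relative to A5, the display entry for a nesting level can be computed directly. A display pointer can then be loaded into an address register only when a variable at that level is first referenced, which is one way to avoid early loads. The sketch assumes several things: entry `level` sits at `level * DISPLAY_ENTRY_SIZE` above A5, one pointer occupies one displacement unit, level 0 means a global addressed straight off A5, and A0 through A3 are free for display pointers. All names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DISPLAY_DISP       0  /* display starts at 0(A5), as the handout says */
#define DISPLAY_ENTRY_SIZE 1  /* ASSUMED size of one pointer                  */
#define NUM_AREGS          4  /* ASSUMED: A0-A3 available for display pointers */

static int aregLevel[NUM_AREGS];  /* level held by each An, -1 if none */
static int nextVictim = 0;        /* round-robin eviction when all are busy */
static char code[1024];

static void resetAregs(void) {
    for (int i = 0; i < NUM_AREGS; i++) aregLevel[i] = -1;
    nextVictim = 0;
}

/* Displacement, relative to A5, of the display entry for a nesting level. */
static int displayEntryDisp(int level) {
    return DISPLAY_DISP + level * DISPLAY_ENTRY_SIZE;
}

/* Return an address register holding display[level], loading it lazily. */
static int displayReg(int level) {
    int r = -1;
    for (int i = 0; i < NUM_AREGS; i++)
        if (aregLevel[i] == level)
            return i;                     /* already loaded: reuse it */
    for (int i = 0; i < NUM_AREGS && r < 0; i++)
        if (aregLevel[i] < 0)
            r = i;                        /* a free address register */
    if (r < 0) {                          /* all busy: evict round-robin */
        r = nextVictim;
        nextVictim = (nextVictim + 1) % NUM_AREGS;
    }
    size_t used = strlen(code);
    snprintf(code + used, sizeof code - used, "\tMOVE\t%d(A5),A%d\n",
             displayEntryDisp(level), r);
    aregLevel[r] = level;
    return r;
}

/* Effective address of a variable at (level, disp). */
static void varEA(int level, int disp, char *buf, size_t n) {
    assert(level >= 0);
    if (level == 0)                       /* ASSUMED: level 0 is global */
        snprintf(buf, n, "%d(A5)", disp);
    else
        snprintf(buf, n, "%d(A%d)", disp, displayReg(level));
}
```

A cache like this must be cleared with `resetAregs` at every label, since a branch target can be reached along paths that loaded different registers. It must also be cleared after any instruction that may change the address registers.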

I do request that you use my type definitions for the operand descriptors your code for this phase manipulates. The definitions for this type can be found in the file opdesc.h in the phase2 subdirectory of my CS 434 pub directory. If you update the PHASE variable in your Makefile to "2", you will be able to simply write #include "opdesc.h".

To help you get a good start, I will give you some additional C code which you are free to use as is, modify or ignore as you see fit. This code can be found in the two files codegen.h and stmtgen.c in the directory ~tom/pub/434/phase2. If you want to use either of these files, you should copy them to your own directories.

The codegen.h file contains descriptions of some of the main data structures I used in my own solution to this exercise. In particular, it includes definitions for the tables used to keep track of data temporaries (registers and memory) and address registers. The file itself contains comments that (completely?) describe these data structures.
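The tables in codegen.h are documented in that file and are not reproduced here. As an independent, hypothetical illustration of the idea, the sketch below hands out data registers first and falls back to memory temporaries in the activation record. It tracks a high-water mark so the frame can reserve enough space. The register count, the one-unit temporary size and every name here are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_DREGS 8   /* ASSUMED: D0-D7, as on the MC68000 */

typedef enum { TEMP_DREG, TEMP_MEM } TempKind;
typedef struct { TempKind kind; int where; } Temp;  /* reg number or displacement */

static bool dregBusy[NUM_DREGS];
static int memTempBase;     /* first displacement below the procedure's locals */
static int memTempsInUse;   /* memory temporaries currently allocated          */
static int memTempsMax;     /* high-water mark: frame space needed for temps   */

static void startProcedure(int lowestLocalDisp) {
    for (int i = 0; i < NUM_DREGS; i++) dregBusy[i] = false;
    memTempBase = lowestLocalDisp - 1;
    memTempsInUse = memTempsMax = 0;
}

/* Prefer a free data register; otherwise use a memory temporary. */
static Temp allocTemp(void) {
    for (int i = 0; i < NUM_DREGS; i++)
        if (!dregBusy[i]) {
            dregBusy[i] = true;
            return (Temp){TEMP_DREG, i};
        }
    Temp t = {TEMP_MEM, memTempBase - memTempsInUse};  /* ASSUMED 1 unit each */
    if (++memTempsInUse > memTempsMax) memTempsMax = memTempsInUse;
    return t;
}

/* Memory temporaries must be released in LIFO order, as in expression code. */
static void freeTemp(Temp t) {
    if (t.kind == TEMP_DREG) {
        assert(dregBusy[t.where]);
        dregBusy[t.where] = false;
    } else {
        assert(t.where == memTempBase - (memTempsInUse - 1));
        memTempsInUse--;
    }
}
```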

codegen.h also contains headers for the most important routines provided by my low-level code generator. Note: I am not giving you the code for these routines. You have to write your own low-level code generator. In particular, don't be fooled by comments that say a particular variable or array will contain some value. Such comments will only be true if you write code that makes them true! Again, these files are offered to help you get a good start on your own design.

The other file I am providing is stmtgen.c. For the first part of this assignment, you do not need to generate code for statements, but you will still need routines that traverse the statement portions of the syntax tree to find all the expressions contained in statements so that you can generate code for them. You will throw out much of this code once you are working on the second part of the assignment. So, what I have given you is my version of the code to traverse the syntax tree, calling "genExpr" whenever it finds an expression. As I indicated above, you are free to use, modify or ignore it, depending on how much the "hints" it contains influence your own final design.
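The shape of such a traversal can be sketched independently of stmtgen.c, which is not reproduced here. Assuming hypothetical `Stmt` and `Expr` types, with statements linked into lists through `next`, the walk visits statements in order and calls `genExpr` on each expression it finds. Here `genExpr` is a stub that only records which expression it saw. In the first sub-phase, conditions would still contain only arithmetic operators.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Expr { int id; } Expr;   /* placeholder expression node */

typedef enum { ST_ASSIGN, ST_IF, ST_WHILE } StKind;

typedef struct Stmt {
    StKind kind;
    Expr *expr;                 /* ST_ASSIGN right-hand side, or condition */
    struct Stmt *body;          /* ST_IF then-list, or ST_WHILE body       */
    struct Stmt *elseBody;      /* ST_IF else-list, NULL if absent         */
    struct Stmt *next;          /* next statement in the same list         */
} Stmt;

static int visited[32];
static int numVisited = 0;

/* Stub for the real expression generator: records the expression's id. */
static void genExpr(const Expr *e) {
    assert(e != NULL && numVisited < 32);
    visited[numVisited++] = e->id;
}

/* Walk a statement list, generating code for every expression in it. */
static void genStmtList(const Stmt *s) {
    for (; s != NULL; s = s->next) {
        switch (s->kind) {
        case ST_ASSIGN:
            genExpr(s->expr);
            break;
        case ST_IF:
            genExpr(s->expr);
            genStmtList(s->body);
            genStmtList(s->elseBody);
            break;
        case ST_WHILE:
            genExpr(s->expr);
            genStmtList(s->body);
            break;
        }
    }
}
```

In the second sub-phase, the `ST_IF` and `ST_WHILE` cases are where labels and branches would be added around these same calls.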


Computer Science 434
Department of Computer Science
Williams College