For the last few weeks, we have been talking about the synthesis phase of a compiler, which converts a semantically-valid abstract syntax tree into an equivalent program in the target language. This involves building a run-time system to support the execution of a program in target code, perhaps translating the AST into an intermediate representation such as three-address code along the way, and generating code in the target language. Most recently, we have been looking at how to generate target code for basic code generation, including register selection, and procedures.
In our last class session, let's consider briefly some important ideas in the area of code optimization, in particular the notion of tail call optimization. You may be surprised at how easily we could optimize our Klein-to-TM code generator using these ideas. Even if you do not have time to implement them in your compiler, you can appreciate the process.
As a reading bonus, I found some old lecture notes on how Louden's TINY compiler generates code. I doubt they are of much use to you now, but I've included them below in case.
Ideally, we would like for our compilers to generate code that is as good as or better than the code that can be written by hand. In many situations, we can achieve this only in a limited number of situations and after much difficulty.
A good optimization stage can help a compiler do as well as it can. Actually, the term optimization is something of a misnomer, as showing that a piece of code is optimal is undecidable. More accurately, an optimizer simply makes improvements to the code.
Optimizations can be be either machine-independent or machine-dependent. Machine-independent transformations can be done on the intermediate representation, prior to code generation. Machine-dependent transformations are best accomplished on target code.
An optimization...
A common approach to designing an optimizer is to find the most frequently executed code in typical target programs and focus efforts to improve the program there. In common usage, the "80:20 Rule" says that 80% of some cost follows from 20% of the resources. In a regular program, we would use a profiler to execute a program on an accepted suite of data to identify the hot spots in the program. Finding or creating a widely accepted suite of input programs for a compiler can be difficult, though it has been done for many languages, including C, Tex, and Scheme.
Another common approach to implementing an optimizer is to:
Common program transformations that improve the resulting target code include elimination of subexpressions, copy propagation, elimination of dead code, and the folding of constants into the code.
Common code elements to be optimized are inner loops, procedure calls, and address calculations.
As an example, let's take a look at a way to improve the efficiency of target code for procedure calls. This is especially germane for a language such as BigRig, in which function calls play such an important role. There is a long of history of compiler writers optimizing function calls of various sorts in languages where they are central. Consider the idea of inlining procedure calls in languages such as C++. This technique is especially useful for object-oriented accessors, which we usually write in order to make our code easier to maintain (a programming-time concern), not more efficient (a run-time concern).
Another example of how one can optimize function calls is to improve the behavior of the calls on the run-time stack. This is a timely issue these days, given the recent discussion of why Python does not optimize for recursion and why it might be worth the effort. We'll consider the issue in teh context of BigRig.
In our study of how programs typically work at run-time, a call to a procedure creates a new activation record for the procedure on the call stack. This record, also called a 'stack frame', contains the values of the arguments to the function, the state of the machine's registers at the time of the call, and any local variables. These local objects are the bindings created while executing the function, including any temporary objects not explicitly mentioned in the program. When the function returns, its return value is copied to the correct location in the caller, the state of of the machine's registers is restored, and its activation record is popped from the stack.
The existence of a stack frame makes it possible to implement recursive function calls, which conceptually requires a copy of its local variables, including arguments, for each activation. So each invocation of this factorial(x) function:
function factorial :=
function with number x and returns number
number result := 1.
if x >= 2 then,
assign result := x * call factorial( x-1 ).
!
return result.
!.
... will have its own stack frame, with its own copy of the variable x.
But some functions have a particular shape. Consider this BigRig function, from our Euclid's algorithm program:
function remainder :=
function with number a, number b and returns number:
number result := 0.
if a < b then,
assign result := a.
else
assign result := call remainder(a-b, b).
!
returns result.
!.
Unlike factorial, which stores the result of the recursive call in a temporary object for use in computing its answer, remainder simply returns the result of the recursive call as its own answer -- no need for temporary storage, and further computation to do. Think about what happens at run-time, when a call to remainder(10001,2) reaches its base case. 1 will be returned as the answer from a call to remainder(1,2) to an invocation of remainder(3,2), which will immediately return 1 as the answer to an invocation of remainder(5,2), which will immediately return 1 as the answer to an invocation of remainder(7,2), ... and so on, for 5000 stack frames -- with no new computation done, and no value added.
This is called tail recursion. With a tail-recursive function, the extra stack frames add no value. This is the insight: We can reuse the original stack frame for all of the calls! Instead of creating a new activation record, the new arguments can be written into the argument slots of the existing record, and the program can jump back to the top of the procedure. Conceptually, we can think of remainder as doing this:
function remainder :=
function with number a, number b and returns number:
begin:
if a >= b then,
assign a := a - b.
assign b := b.
goto begin
!
returns a.
!.
This optimization is called tail recursion elimination. It can be applied in cases where the recursive call is in the "tail position" of the function, where the value of the recursive call is the value of the calling function. A code generator, or a separate semantic analyzer running before code generation, can recognize tail-recursive calls without much effort and generate much more space-efficient code.
But it gets much better than this. Consider a typical BigRig function that is not tail-recursive:
function with number current, number max and returns boolean:
boolean result := true.
if max < current then,
assign result := true.
else
assign result := call doSieveAt(current, max).
!
returns result.
!
sieveAt is not recursive, but its tail position consists of a function call, the result of which will be returned as the value of the call to sieveAt. In such a case, the called function's activation record "adds value" to the computation, but... The calling function's activation record is no longer needed -- it's just a placeholder that will be used to pass through the result of the called function to the caller's caller.
In the above case, it may be possible to pop sieveAt's stack frame before the invocation of doSieveAt, rather than after sieveAt returns. This idea generalizes from the idea of eliminating tail recursion and is the essence of tail call optimization. sieveAt can:
When doSieveAt returns, control will pass directly to sieveAt's caller.
If a function invokes itself in a tail position, the tail-call optimization becomes a tail-recursion optimization.
This technique is relatively straightforward to implement, both the detection of tail calls (*) and the implementation of the improved code. It is also remarkably valuable for optimizing programs written in a functional language, or a functional style.
(*) In BigRig, the compiler would need to make another code transformation first: eliminating the novice-friendly but execution-expensive coding pattern of creating a result variable, assigning it a value in both branches of the choice, and then returning it. This transformation is also remarkably simple to make!
Even better, some functions that don't look like tail calls are in fact. Consider that:
function horner :=
function with number x, number n, number value and returns integer:
number result := 0.
if 0 <= n then,
assign result := call horner( x, n - 1,
x * value + call coefficient(n) ).
else
assign result := value.
!
return result.
!
is equivalent to:
function horner :=
function with number x, number n, number value and returns integer:
if n < 0 then, // use clause as default value
return value. // at 3AC or assembly level
!
return result := call horner( x, n - 1,
x * value + call coefficient(n) ).
!
Very nice.
But there are issues that complicate the use of tail-call optimization. Some of these make it difficult or impossible to implement the technique in languages such as Java and C++. These include the order in which arguments are evaluated (what happens if overwriting one argument might change the computation of another?) and the invocation of constructors and destructors of objects in dynamic memory. You can read more about these issues, and see one of my sources of examples for this session, at Lambda the Ultimate.
We learned about certain kinds of meaning-preserving transformations when we studied syntactic abstractions back in Programming Languages and Paradigms. For this reason, code optimization should remind you of refactoring, which a restructuring of a program done by a program to improve its design. Refactorings must also preserve the functional behavior of its input program. Code optimization differs from refactoring in at least one respect: we optimize generated code primarily to improve its speed or space usage, whereas programmers refactor their code for a broader variety of reasons, with an emphasis on human-centered features such as extensibility or flexibility.
Still, the techniques used to refactor code are similar or identical to the techniques used to optimize code, and as a result we are able to build tools to help programmers refactor safely.
... if time, demo IntelliJ IDEA, a commercial IDE with refactoring support, or Eclipse, a popular open-source IDE.
... is our final exam period. As the syllabus tells us, we won't have a final exam, but we will meet. We will discuss the structure of your systems, watch them run on some programs, and consider some of their performance characteristics -- especially the size and speed of the compiler and the size and speed of the Java executable your compiler generates. We will also debrief the experiences of building a compiler and working on a team.
The code module implements some details of Louden's run-time system and functions for emitting code. code.h defines a number of registers for aliases, including a gp, a pointer to the bottom of the data memory (where variables are allocated), and mp, a pointer to the top of the data memory (where temporaries are allocated).
This works because Tiny does not have procedures and so has no need for a stack of activation records. For your Klein compiler, you may want to allocate all variables at the bottom of the memory and let your stack grow down from the top:
Louden uses two of TM's registers (5 and 6) to hold these memory pointers. He then uses two more registers (0 and 1) as the working space for all of his code generation -- and never allocates registers 2, 3, and 4!
code.c defines methods for generating particular kinds of code, including RO instructions, RM instructions, and TM comments. These are called from the code generator itself, which now doesn't have to worry about the format of instructions, instruction memory, or issues related to debug code.
The public interface to the code generator is defined in cgen.h -- the procedure codeGen. This procedure is a wrapper: it produces standard code at the head and tail of the target file, wrapped around a call to the procedure that does the real work, cGen. That procedure itself is pretty simple, as it simply examines its tree node argument and calls the appropriate helper method.
... the two code examples from Section 8.8.2:
cGen(p1); /* code for left arg */
emitRM("ST",ac,tmpOffset--,mp,"op: push left"); /* push left operand */
cGen(p2); /* code for right arg */
emitRM("LD",ac1,++tmpOffset,mp,"op: load left"); /* reload left operand */
emitRO("SUB",ac,ac1,ac,"op <") ;
emitRM("JLT",ac,2,pc,"branch if true") ;
emitRM("LDC",ac,0,ac,"false case") ;
emitRM("LDA",pc,1,pc,"unconditional jmp") ;
emitRM("LDC",ac,1,ac,"true case") ;
You'll want to create code templates for all of the statement and expression types in Klein. Louden's templates can often serve as examples from which to work.