Session 22

Parameters Passing and Dynamic Allocation


810:155
Translation of Programming Languages


Opening Exercise

Write a BigRig program to write to standard output the greatest common divisor of two integers read from standard input. Use Euclid's Algorithm.

Here is my solution, modified with your suggestions in class that clarified the grammar and semantics of BigRig. (Be sure that this information makes it into your grammar and explanatory notes on the language!!).

I learned a lot by doing this exercise. I hope you did. Do it again among yourselves with other simple programs. You will learn a lot, and end up with several good test programs to boot!



Recap: Run-Time Systems

We are now considering the synthesis phase of the compiler. Our first topic is the run-time system, which consists of the routines that are loaded along with the generated target code at run-time, which enable the target code to execute in the fashion defined by the language. In Session 19, we considered the basic issues that underlie a run-time system: the activation of computational processing through procedure calls and the binding of names to storage locations and storage locations to values. In Session 20, we learned how to organize run-time storage, especially for objects allocated statically and for the control stack. We also looked at the code that creates and initializes the new stack frame. Last time, we looked at how the run-time system can be set up to support access to non-local objects.

Today we will wrap up our discussion of run-time systems with a quick look at how to pass parameters, how to implement a symbol table, and how to allocate memory for dynamic objects.



Passing Parameters

Consider the Exchange procedure in our Oberon-like sort program: from last time:

    PROCEDURE Exchange(i, j : INTEGER);
       VAR
          x : INTEGER;
       BEGIN
          x := a[i]; a[i] := a[j]; a[j] := x
       END Exchange;

Exchange affects the outside world by changing a non-local object, not by changing the value of one of its parameters. Its arguments are passed using a protocol named call-by-value. This means that i and j are assigned the values of the arguments passed to Exchange. i and j are objects accessible only within the block created by the procedure.

When arguments are passed by value, the formal parameters are like any other local object: their storage is located in the called procedure's activation record on the stack. The caller computes the value of its arguments and stores them in the appropriate location, usually the first slots in the activation record, which lies just on top of the caller's record. (It is, of course, possible that these objects will lie in the program's static objects region, if the language allows this and the compiler does so.)

Suppose that we wanted to swap the values of i and j themselves:

    MODULE PassByReference;                  (* ... not yet ... *)

    VAR
       x, y : INTEGER;

    PROCEDURE Swap(i, j : INTEGER);
       VAR
          x : INTEGER;
       BEGIN
          x := i; i := j; j := x
       END Exchange;

    BEGIN
       x := 1; y := 2;
       Swap(x, y);
       Write(x);
       Write(y)
    END Reference.

The call-by-value protocol means that changes to i and j will be visible only within Swap; the values of x and y remain unchanged.

Is BigRig pass-by-value? What about list arguments?

A call-by-reference mechanism allows a procedure to be called in such a way that operations on the arguments within the procedure are actually applied to the locations of the formal parameters. In Pascal and its descendants, we indicate that an argument is to be passed by reference using a modifier. In Pascal and Oberon, the modifier is VAR:

    PROCEDURE Swap(VAR i, j : INTEGER);

Note that VAR distributes over the whole declaration and so applies to both i and j.

When an argument is passed by reference, the caller intends to pass not the value of the argument but a pointer to its location in storage. Any reference to the formal parameter in the body of the called procedure is an indirect reference through the pointer to the object in the calling procedure.

A compiler must generate target code that implements this. If the target machine provides an indirect addressing mode, then the indirect reference of a call-by-reference parameter can be implemented directly using this mode. Otherwise, the target code must contain an extra instruction or more to implement the indirect reference.



Implementing Symbol Tables

The symbol table is an important part of the run-time system, as it tells the compiler what object a particular name refers to. As we've seen, this affects when and how new objects are allocated. Though we can generate a symbol table as early as the lexical analysis phase, the run-time system places particular demands on the symbol table. One example we saw in Session 21 was how the definition of a stack frame depends on knowing the nesting depth of a reference. Such changes mean that the symbol table must either be updated during semantic analysis or built at a later stage.

Because the symbol table may contain many objects and may be referred to by the compiler many, many times, a symbol table implementation must offer efficient means for storing and retrieving items. Not surprisingly, much work on efficient data structures and algorithms has been motivated by compiler construction, including:

This topic is too big for us to consider it any deeper this semester. For your compiler, choose a suitably efficient data structure and implementation in your source language.


  • Allocating Dynamic Objects

    The techniques we considered in Session 20 tell us how to allocate space for static objects, those whose addresses can be determined in at compile time. But what about arrays, variable-length records, or pointers to objects created at run-time?

    When a program is done using an object that was allocated dynamically, the space allocated to the object can be returned to the run-time system to be used for another object. We will consider deallocation and some of the problems it introduces in the next section.

    Because the compiler doesn't know the number or size of dynamically-allocated objects at compile-time, it must create a mechanism for allocating memory for the objects at run-time. We can use one of two basic techniques:

    Choosing among competing alternatives is often driven by the needs of the programmers that use the language, or the implementation in question.



    The Problems Associated with Deallocating Dynamic Objects

    Allocating dynamic objects introduces problems associated with deallocating objects. Consider this Java code:

        int[] grades;
        grades = new int[10];
        ...
        grades = new int[5];
        ...
        grades = new int[20];
    

    The variable grades first points to an array of ten ints allocated dynamically. Unless another variable has been set to refer to this array, the second assignment to grades results in the first array never again being used by the program. The memory associated with this array is now garbage. A program can create garbage in other ways, too, such as by an object going out of scope.

        {
            int[] grades = new int[10];
            ...
        }
        // no more grades!
    

    Garbage exists between the time of the last reference to the object and the time when deallocation occurs.

    The flip side of garbage is the dangling reference. Consider this simple little C program:

        main()
        {
            int *p;
            p = dangle();
        }
    
        int *dangle()
        {
            int i = 23;
            return &i;
        }
    

    The procedure dangle declares a temporary variable named i. When an activation of dangle terminates, the object bound to i disappears as soon as the corresponding activation record is popped from the control stack. Yet dangle has returned its address to main, which assigned the address to the variable p. Thus, p now points to a chunk of memory no longer allocated to the program!

    A program can create a dangling reference in other ways, too, such as when the user explicitly releases the memory associated with an object before all references to the object are gone. For example, in C++ a programmer can call dispose() to release the memory associated with an object at any time. (Java does not have a corresponding operator.)

    A dangling reference exists between the time when deallocation occurs and the time of the last reference to the object.

    Garbage and dangling references arise depending on how the run-time system handles the deallocation of memory. A compiler can use several different approaches to deallocation:

    In order for garbage collection to work, a program must cooperate with the run-time system, by letting it know the size of the object and letting it know when an object is out of use.

    Communicating size is straightforward. When surrendering an object, a program can store the size of the block at the top of the block. Communicating that a block is out of use is much more difficult. Two common approaches are:

    With the popularity of Java and now scripting languages over the last decade, garbage collection has become an even more active area of programming languages research. Up to a quarter of the technical papers presented at some recent OOPSLAs have been on the topic of garbage collection!



    Eugene Wallingford ..... wallingf@cs.uni.edu ..... April 16, 2009