Session 21

Accessing Non-Local Names in Dynamic Allocation


810:155
Translation of Programming Languages


NOTE: I still owe you the rest of the notes for Session 19. I will "fill in the blank" soon. In the meantime, I give you this gift: a cool Ajax-driven regex→NFA→DFA tool.



Recap: Run-Time Systems

We are now considering the synthesis phase of the compiler, which converts a semantically-valid abstract syntax tree into an equivalent program in the target language. Before looking at code generation, we are first studying the run-time system associated with a compiled program. This system consists of the routines that are loaded along with the generated target code at run-time, which enable the target code to execute in the fashion defined by the language.

In Session 19, we considered the basic issues that underlie a run-time system: the activation of computational processing through procedure calls and the binding of names to storage locations and storage locations to values. Last session, we learned how to organize run-time storage, especially for objects allocated statically and for the control stack. Today, we complete our study of procedure calls by learning about how to implement the code that creates and initializes the new stack frame. Then we learn how to implement references to non-local names.



Accessing Non-Local Names

Your programming experience in different languages has introduced you to different scope rules for determining what object is meant by a particular reference to a name. We can think of scope rules as falling into two broad categories:

Most languages also use a "most closely nested" rule, which says that a reference refers to the object declared in the enclosing context closest to the reference.

We will devote most of our attention today to lexical scoping, which is the more common way to determine name references these days.



Accessing Names Across Blocks

Some languages allow the programmer to create a block that declares its own names that are 'local' to its body, that is, that can only be referred to from that section of code. Consider this C example from the Dragon book:

    main()
    {
        int a = 0;
        int b = 0;
        {
            int b = 1;
            {
                int a = 2;
                printf( "%d %d\n", a, b );
            }
            {
                int b = 3;
                printf( "%d %d\n", a, b );
            }
            printf( "%d %d\n", a, b );
        }
        printf( "%d %d\n", a, b );
    }

Notice how the variables in Blocks 2 and 3 create 'holes' in the scope of the variables declared in Blocks 1 and 0, and how the b declared in Block 1 creates a hole in the variable declared in Block 0. In such cases, we say that the inner declaration shadows the object declared in the containing block.

Note, too, that other languages do not allow this. Consider this similar example in Java:

    public class BlockScopeDemo
    {
        public static void main()
        {
            int i = 0;
            char c = 'a';
            for (int i = 0; i < 5; i++)
            {
                System.out.println( i );
                System.out.println( c );

                {
                    char c = (char) ( (int) 'a' + i );
                    System.out.println( c );
                }
            }

            System.out.println( i );
            System.out.println( c );
        }
    }

    mac os x > javac BlockScopeDemo.java 
    BlockScopeDemo.java:7: i is already defined in main()
            for (int i = 0; i < 5; i++)
                     ^
    BlockScopeDemo.java:13: c is already defined in main()
                    char c = (char) ( (int) 'a' + i );
                         ^
    2 errors

We can implement block-structured name access in our compiler using the same control stack mechanism we use for procedure invocations. In this approach, a block resembles a procedure with no parameters that is called in one place (the spot just before it begins) and returns to one place (the spot just after it ends). With no parameters and transparent control flow, this sort of stack frame is straightforward to create and use.

When we execute blocks and procedure calls, we can see that the symbol table is a dynamic structure, not a static catalog of names and information. Consider the AST for the C program above:

the AST for our C program

At these points in the execution...

... the symbol table is in the following states:

the symbol table grows and shrinks

Louden describes several ways to implement symbol tables and discusses some ideas of how to implement them efficiently. We'll talk more about symbol tables later. But notice how easily the ideas from your Programming Languages course transfer. Scopes can be implemented by pushing declarations (in that course, bindings) onto a stack, which is implemented as a linked list!

An optimizing compiler can also implement nested blocks by allocating all objects, even those in nested blocks, at one time. For the C example program, the compiler could allocate memory for the a's and b's as follows:

allocating dynamic objects optimally

Some variables can share a single physical location because they do not exist at the same time. The maximum storage can be computed via static analysis at compile time, though the run-time system may need to be conservative with respect to potential control flow (for example, assume that both the then block and the else block of a selection statement will execute, and assume that a while statement will execute at least once).



Accessing Non-Local Names without Nested Procedures

Consider this example from C, which does not allow the programmer to define one procedure inside another:

    int a[11];
    readarray() { ... a ... }
    int partition(int y, int z) { ... a ... }
    quicksort(int m, int n) { ... }
    main() { ... a ... }

In such a language, any non-local reference is to a name declared outside of any procedure! This means that our standard stack allocation mechanism works as-is. All objects that are declared outside the scope of any procedure can be allocated statically and thus referenced using a fixed address (by offset, of course).

One side effect of this language feature is that passing procedures into and out of procedures becomes straightforward. Any non-local reference is non-local to all procedures, so the same static address for the reference works everywhere. This is why a language like C -- which is low level in so many other ways -- accommodates the abstraction of passing of function pointers so easily.

Consider this example from Oberon, a descendant of Pascal, which has a somewhat more readable syntax for passing procedures:

    MODULE PassProcedures;

    VAR
       m : INTEGER;

    PROCEDURE f(VAR n : INTEGER);
       BEGIN
          n := m + n
       END f;

    PROCEDURE g(VAR n : INTEGER);
       BEGIN
          n := m * n
       END g;

    PROCEDURE b(PROCEDURE h(n : INTEGER));
       VAR
          n : INTEGER;
       BEGIN
          n := 2;
          h(n);
          Write(n)
       END b;

    BEGIN
       m := 0;
       b(f);
       b(g)
    END PassProcedures.

All references to m are non-local, whether in the main block or in a user-defined procedure. So the compiler can alocate storage for it statically, and executions of f and g can use the same static address used by the main program.



Accessing Non-Local Names in the Presence of Nested Procedures

Many languages, including those in the Pascal family, allow the programmer to define one procedure inside another, a so-called nested procedure. This feature violates the assumption we made above that any non-local reference must be to a statically-allocated object, because now a reference could be to an object declared in the containing procedure. Consider this Oberon-like example:

    MODULE Sort;

    VAR
       a : ARRAY 10 OF INTEGER;

    PROCEDURE ReadArray();
       BEGIN
          ...  a  ...
       END ReadArray;

    PROCEDURE Exchange(i, j : INTEGER);
       VAR
          x : INTEGER;                    (* Should we make x global? *)
       BEGIN
          x := a[i]; a[i] := a[j]; a[j] := x
       END Exchange;

    PROCEDURE Quicksort(m, n : INTEGER);
       VAR
          k, v : INTEGER;

       PROCEDURE Partition(y, z : INTEGER; VAR result : INTEGER);
          VAR
             i, j : INTEGER;
          BEGIN
             ...  a  ...                  (* reference to "global"  a *)
             ...  v  ...                  (* reference to non-local v *)
             ...  Exchange(i, j);  ...    (* reference to non-local proc *)
          END Partition;

       BEGIN
          ...  a  ...
          ...  k  ...
          ...  v  ...
       END Quicksort;

    BEGIN
       ...  a  ...
    END Sort.

In order to reason about scope in such a nested structure, we can use the idea of nesting depth. The depth of a program or module, which we consider the outermost procedure, is 1. For every level of nesting, we add 1 to a procedure's depth. So, in this example, the depth of Sort is 1, the depth of Quicksort is 2, and the depth of Partition is 3.

We then associate with each reference to a name two pieces of information:

The compiler knows both of these pieces of information at compile time.

If this all sounds familar to some of you, it should. This is the concept of a lexical address that you learned about in your Programming Languages, but here the depths are counted outside-in.

In order for an activation of a nested procedure to be able to access an activation of its nesting procedure, the compiled program must use an access link in its stack frames. For procedures that are not nested, the access links will all point to the same place: to the local data region in the activation record for the program itself. This provides access to global variables. The access link in the activation record for the program can be left blank. It will never be used anyway, because the outermost procedure has no non-local references that are not already statically allocated!

Thus, as procedures are called, the access link in the activation record of the calling procedure can be stored in the activation record of the called procedure. But for nested procedures, things must work differently.

Suppose that procedure p is nested inside procedure q. An activation of p needs to be able to access variables defined in the calling activation of q. When the compiler creates an activation record for p, it must set p's access link to point to the local data region in the most recent activation record for q. This is the address immediately following the access link in the stack frame.

Consider an execution of our Sort program. Suppose that the main module calls Quicksort(0,7), which calls Quicksort(0,3). In quicksorting the subrange 0..3, Quicksort(0,3) calls Partition(0,3), which calls Exchange(0,2). This is how the run-time would grow in the course of these procedure calls, with the access links set:

use of access links in quicksort example

Notice that the access link for both activations of Quicksort point to the access link for Sort, which is the immediately containing procedure. The access link for Partition points to the most recent record for Quicksort, which is its containing procedure. Finally, the access link for Exchange point to the access link for Sort, not Partition nor Quicksort, because Exchange is nested inside of the main module. In this way, the activation of Exchange(0,2) has access to the global array a, even if Partition or Quicksort declares a local identifier a.

How can the run-time system use access links to look up the targets of non-local references? Consider a non-local reference x (nesting level nx) in procedure p (nesting level np). Because the reference is non-local, we know that nx < np. The compiled program can find the storage for x in two steps:

  1. Follow np-nx links back from the activation record for p to the activation record of the procedure that defines x. The compiler can compute this value at compile time.

  2. Within the activation record of the procedure that defines x, the location of x is at some fixed value in the procedure's temporary data object section. The simplest thing to do is to make this value an offset from the activation link. Again, this offset is known at compile time.

Store these two numbers, np-nx and the offset within the activation record that contains x, in the symbol table with the non-local reference in the called procedure.

Again, consider our Sort example. The Partition procedure refers to a and v, both of which are non-local. a and v have nesting depths of 1 and 2, repectively, and Partition has a nesting depth of 3. So, a can be found 3-1=2 hops away, and v can be found in 3-2=1 hops. The offset for a in its record is 0, while the offset for v in its record is size(INTEGER).

The code to set up access links can be placed in the calling sequence. Suppose that procedure p (at nesting level np) calls procedure q (at nesting level nq). If nq > np, then q is nested within the p. Otherwise, the rules of block structure would make q inaccessible to p. So, set the called procedure's access link to point to the caller's access link. We see two examples of this in our Sort diagram above, first when Sort calls Quicksort (column 1) and then again when Quicksort calls Partition (column 3).

If instead nqnp, then the containing procedures at depths 1 through nq-1 must be the same. Either the two procedures are defined at the same depth, or p is nested inside a procedure defined at the same depth as q. We see examples of both of these sub-cases in our Sort program, when Quicksort calls itself (column 2) and when Partition calls Exchange (column 4), respectively. When nqnp, the called procedure needs access to the most recent activation record of the procedure that encloses both p and q, which can be found np-nq+1 hops away.

A procedure that is passed as an argument must bring its access link with it! Consider this Pascal example:

    PROGRAM PassAccessLink;

       PROCEDURE WriteValueOf( FUNCTION f(n : INTEGER): INTEGER );
          BEGIN
             writeln( f(2000) )
          END;

       PROCEDURE ForwardingProcedure;
          VAR
             x : INTEGER;

          FUNCTION LocalFunction(n: INTEGER) : INTEGER;
             BEGIN
                LocalFunction := x + n
             END;

          BEGIN
             x := 9;
             WriteValueOf( LocalFunction )
          END;

       BEGIN
          ForwardingProcedure
       END.

The passing procedure, ForwardingProcedure, computes the necessary access link and sends it along with the procedure LocalFunction. Then, when the argument procedure f is invoked, the caller WriteValueOf sets f's access link to the value that was passed with LocalFunction.

The process used by the compiler to compute and set access links may seem complicated at first read, but it follows directly from what you know about nested procedures. This description may help you understand just how a language such as Scheme, with block structure and higher-order procedures, is able to work.



Accessing Non-Local Names Under Dynamic Scope

We don't have time to consider all possibilities this semester. Dynamic scope is rare enough these days that we can safely skip over it. But you can begin to imagine how the compiler might implement it... Instead of using the address of the defining context as the access link, the compiler can use the address of the calling context -- which means that the access link can be the same as the control link! So dynamic scope is actually easier for the compiler writer to implement, but it is usually considered harder for most programmers to use effectively and safely. This is an example where by advancing the state of the art of compilers computer scientists were able to design and implement better languages for programmers.



Eugene Wallingford ..... wallingf@cs.uni.edu ..... April 14, 2009