Session 17

Types and Type Checking


810:155
Translation of Programming Languages


Recap

We are now considering of semantic analysis. Static analysis of semantic conventions can report errors that are undetectable in earlier stages of the compiler. Much like syntactic analysis, semantic analysis has two goals: to ensure semantic correctness and to prepare the AST for code generation. In lecture, we will focus on two semantic analysis tasks, type checking and building a symbol table. Most of the other static checks performed by a compiler are routine by comparison.

Last time, we considered the notion that every expression has a type associated with it, or at least can in principle. The job of a type checker is to verify that the type of a program component matches what the program expects where the component occurs and, possibly, to coerce an expression to take another (usually, more general) type.

Then we began to consider the variety of type expressions that can be used to describe the type of a program value. Today, we will give a more formal definition of type expressions and then look at how to use type expressions in an algorithm that checks types.



Type Expressions

We learned last time that a type can be basic or constructed, and thus that types can have structure. Because a type can now be more than just a name, we need to think more generally about type expressions. We will associate a type expression with each language construct.

We can specify type expressions more explicitly using this inductive definition:

Finally, when we implement a compiler, we often use a special basic type expression error to indicate mismatches that arise in type checking.

What now?

This definition gives us the tools we need to do type checking. Of course, we can use structural recursion to process types when necessary. As we will see, most complications in type checking result from constructed and named types.

If we think of a type expression as an expression in a little language, we can imagine the "abstract syntax" of these expressions. As we've seen in this course and in Programming Languages, trees can be a useful representation for abstract syntax. Consider these trees to represent some of the type expressions above:

basic, tuple, pointer, function



Quick Exercise

Write type expressions for foo in these expressions:

  1. int foo;

  2. int[] foo = new int[26];

  3. Ball[][] foo = new Ball[10][20];

  4. char* foo;

  5. class Foo { int tag; HashMap table }

  6. int foo(String word, char target) { ... }

  7. (define foo
      (lambda (x y z)
        (* x (+ y z))))
    

As we consider solutions to these exercises, at least a couple of important ideas arise:

Now that I have loosened your mind up a bit with a Scheme example, tell me this: What is the type of the primitive Scheme function map? For example:

    (map square '(1 2 3 4 5)) → (1 4 9 16 25)

Recall this definition of map:

    (define map
      (lambda (f lst)
         (if (null? lst)
             '()
             (cons (f (car lst)) (map f (cdr lst))))))

This, too, requires a type variable, and has a new complication -- one of the arguments is a function! But that's okay. A function type works just like any other when it comes to constructing another type. The key here is in determining the individual components. How is this?

type of a higher-order procedure

Okay, but how about this one:

(define foo
  (lambda (f)
    (lambda (x)
      (f (f x)))))

How much can we infer about this foo?

Because Scheme is dynamically typed, it is difficult for a refactoring tool or any other semantic analyzer to provide as much help on types as semantic tools that work on statically-typed languages such as Ada, Java, or C. For instance, this affects the detail that can be provided in messages on semantic errors. Dr. Scheme addresses this problem in part through its use of language levels for teaching.

However, a semantic analyzer can infer useful type information about a program, in much the same way that we were able to infer type information about the Scheme examples above. Haskell is purely functional programming language with a strong type system that uses type inference to assign types to any expression for which the programmer does not explicitly state a type.

We won't be working with type expressions for higher-order procedures any more this semester, but they do stretch our minds a bit -- and they help us to see how type constructors can build new types out of any other type, even a complex function.



Type Systems

A type checker assigns a type expression to each expression in a program. The set of rules it uses for assigning these types is called a type system.

A compiler or other language-processing tool can implement any type system, even one different from the one specified by the language definition itself. Consider:

Of course, we know that compilers do not have to do type checking. The fact that the programs in a language are not type-checked at compile time does not mean that the language does not have a type system. Any feature that can be verified statically can also be verified dynamically, at run time -- so long as the target code carries with it the information needed to perform the check. So, to do type checking at run-time, each program element must know its type. This is how languages like Scheme and Smalltalk work. They are strongly typed, but dynamically typed.

(Side note: This is one reason that a famous computer scientist once defined a virtual machine as "a tool for giving good error messages". He might well have said the same thing about a compiler!)

The converse is not true. There are some checks that can be done only dynamically in many language. Consider this Java-like example:

    char[] table = new char[256];
    int    i;

    ... later:

    foo( table[i] );

The compiler cannot guarantee that the array access table[i] will succeed at run-time, because it cannot guarantee that i will lie in the range [0..255]. A similar problem arises if we fix the range of i but accept a dynamically-allocated array into table.

In a language such as Ada, we can specify our types much more strictly. When working with objects that wrap containers in a language such as Java, we can also provide greater assurance of run-time success. The compiler may be able to provide some help by doing data-flow analysis -- a form of semantic analysis -- to infer more about the values that a variable might take, but data-flow analysis is not able to check all cases of this sort.



A Look at the acdc Type Checker

Even simple languages benefit from this sort of static analysis. Let's take a look back at acdc's type checker to see how it works and what issues it encounters.

acdc's type checker is given a symbol table that records each user-created name and its type. This is possible in ac because all variables must be declared at the top of the program. A separate semantic analyzer builds the symbol table before the type checker is invoked. This frees the type checker to focus on the statements in the program. Of course, the type checker could have built the symbol table, too.

You will notice that this type checker gives types even to statements. ac doesn't that statements have types, and dc doesn't use them in any way, so this step is optional. As an alternative, the type checker could assign a type of void to every statement. However, it costs almost nothing to store useful type information about the statements. Why almost nothing? The compiler uses a small amount of space to store the type, and a small amount of time to process the statement after processing all of its constituent expressions.) The benefit gained is that we can use this information if we ever seek to optimize the program or to target a machine that uses statement types.

checkExpression is the work horse. It assigns types to..

Compound expressions introduce what little complexity there is in ac's type system, because arithmetic expressions combine two values. If their types are the same, then it is easy: assign the same type to the result. If they aren't, we have to generalize the the types, that is, find a type with which both are compatible. ac has only integers and floats, so generalizing means casting integers to floats -- but not vice versa.

To cast integers up to floats, the type checker wraps them in a new kind of AST node, an IntToFloatConversion expression. IntToFloatConversion is an adapter that converts the wrapped int to a float. This wrapper retains the initial information about the expression from the parser while generalizing its data type for processing downstream.



Eugene Wallingford ..... wallingf@cs.uni.edu ..... March 26, 2009