Session 14

Adding Semantic Actions to the ac Parser


810:155
Translation of Programming Languages


Recap: Building ASTs with Semantic Actions

We are discussing the syntax analysis phase of a compiler. In particular, we are learning how to implement our first intermediate representation of a source program, the abstract syntax tree (AST) that serves as a primary input to all later phases in the compiler. Session 12 taught us, among other things, that the later phases of a compiler must read and modify the abstract syntax of a program in several different ways, some perhaps not anticipated by the compiler writer. The way we represent an AST must support downstream processing and future behavior. Then, we began to learn how to augment our table-driven parsing technique to create ASTs as a side effect, using the idea of a semantic action. Finally, in Session 13, we applied those ideas to our small expression grammar and parser. We saw that the process works -- like magic, almost, if we didn't understand the process.

... We were left with one little puzzle from our last session: how would our parser associate operators at the same level of precedence, say, for x * y * z ? What did you find?

Alas, the choice we made for adding the make-plus-exp and make-times-exp semantic actions to the grammar -- at the end of the production rule, after all of its non-terminals have been recognized -- results in right-to-left associativity. That's probably not what we expect or want, given the way most programming languages work.

How can we achieve left-to-right associativity? Make another choice! Tell the parser to build the compound expression before matching and building the rest of the expression:

    expression  := term expression'
    expression' := + term make-plus-exp expression'
                 | Ø
    term        := factor term'
    term'       := * factor make-times-exp term'
                 | Ø
    factor      := ( expression )
                 | identifier make-identifier

You may begin to notice some patterns in how we add semantic actions to a grammar, depending on our goals.


Setting Up Our Work Today

Back in Session 2, I showed you an acdc compiler written in an ad hoc way. Then, in Session 7, we converted our scanner to use regular expressions and finite automata that model the ac language. Now it is time to take the next step: convert our parser to a table-driven implementation.

We will proceed in two big steps. First, we will create a table-driven parser that recognizes legal programs, then we will augment it to generate the abstract syntax tree required by the later phases of the compiler. We will break each of these big steps into the smaller steps we have learned along the way.

This series of exercises gives you some practice building a table-driven top-down parser and exposes you to some of the pragmatics of writing a parsing program.


Task 1: Build the FIRST and FOLLOW Sets

Build the FIRST and FOLLOW sets for the ac grammar:

    program         → declarations statements
    declarations    → declaration declarations
                    |  Ø
    declaration     → floatdecl identifier
                    |  intdecl identifier
    statements      → statement statements
                    |  Ø
    statement       → identifier assign value expressionTail
                    |  print identifier
    expressionTail  → plus value expressionTail
                    |  minus value expressionTail
                    |  Ø
    value           → identifier
                    |  number

    floatdecl       → f
    intdecl         → i
    assign          → =
    print           → p
    plus            → +
    minus           → -
    identifier      → all lowercase letters except {f,i,p}
    number          → integers, or reals with both whole and fractional parts

Here are the algorithms for generating FIRST and FOLLOW sets.

Solution

Here are the FIRST sets, built bottom-up:

    FIRST(any terminal)   = { /itself/ }
    FIRST(value)          = { identifier, number }
    FIRST(expressionTail) = { +, -, ø }
    FIRST(statement)      = { identifier, p }
    FIRST(statements)     = { identifier, p, ø }
    FIRST(declaration)    = { f, i }
    FIRST(declarations)   = { f, i, ø }
    FIRST(program)        = { f, i, identifier, p, ø }

Then we use the FIRST sets to help create the FOLLOW sets:

    FOLLOW(program)        = { $ }
    FOLLOW(declarations)   = { identifier, p, $ }
    FOLLOW(declaration)    = { f, i, identifier, p, $ }
    FOLLOW(statements)     = { $ }
    FOLLOW(statement)      = { identifier, p, $ }
    FOLLOW(expressionTail) = { identifier, p, $ }
    FOLLOW(value)          = { +, -, identifier, p, $ }


Task 2: Build the Parsing Table

Use the FIRST and FOLLOW sets to construct a parsing table for the ac grammar.

Here is the algorithm for constructing a parsing table from FIRST and FOLLOW sets.

Solution

The parsing table for this grammar is:

. floatdecl intdecl assign print plus minus identifier number $
program pgm := ds ss pgm := ds ss . pgm := ds ss . . pgm := ds ss . pgm := ds ss
declarations ds := d ds ds := d ds . ds := ø . . ds := ø . ds := ø
declaration d := f id d := i id . . . . . . .
statements . . . ss := s ss . . ss := s ss . ss := ø
statement . . . s := p id . . s := id = value tail . .
tail . . . tail := ø tail := + value tail tail := - value tail tail := ø . tail := ø
value . . . . . . value := id value := num .

In the token set for ac, I distinguished between floating-point literals and integer literals, so the number column is really two columns, one for each kind of number. This implementation detail affects only the value row, as no other non-terminal has number in its FIRST set.


Task 3: Implement the Table in Code

The final task in converting the parser to be table-driven is to write a program that implements the table-driven algorithm. The algorithm itself is straightforward enough, but it requires that we implement the parsing table as a data structure in the program. Implementing the table requires some thought.

We don't have time today for you to implement these programs in class. Instead, let me demonstrate for you how I implemented my new parser.

Demonstrate:

Here is the code for the new parser:

(As a part of the refactoring, I also converted our old ac parser into a RecursiveDescentParser class that also extends the new Parser class.)

Note that this version of the parser doesn't generate an abstract syntax tree, like the old RecursiveDescentParser did. The table-driven algorithm tells us only how to recognize a legal token sequence, but it does leave a hook for adding "other actions".


Task 4: Add Semantic Actions to the ac Grammar

Now let's augment the language grammar with semantic actions. I have already designed the abstract syntax for ac. Here is the type hierarchy:

ac AST as gif

You may even wish to refer to the Java code for these classes.

Solution: We need to build nodes for each of the leaves of the tree, as well as for the two internal nodes that can act as containers.

    program         → declarations make-declarations statements make-statements make-program
    declarations    → declaration declarations
                    |  Ø
    declaration     → floatdecl identifier make-id make-float-decl
                    |  intdecl identifier make-id make-int-decl
    statements      → statement statements
                    |  Ø
    statement       → identifier make-id assign value expressionTail make-assignment
                    |  print identifier make-id make-print-stmt
    expressionTail  → plus value make-plus-op expressionTail
                    |  minus value make-minus-op expressionTail
                    |  Ø
    value           → identifier make-id
                    |  floatNumber make-float
                    |  intNumber make-int

Notice:

The rest of the additions seem straightforward.


Task 5: Build Semantic Action Classes

If we want these actions placed on the parse stack, then the rules we retrieve from the parse table must contain suitable parse actions. And these parse actions need to push semantic actions -- not terminals and non-terminals -- onto the stack.

Write classes for the new ParseAction and the associated Semantic Actions.

Refer also to the ParseAction hierarchy from our parser:

parse actions as gif

and especially the Java code for these classes:
[ ParseAction | PushTerminal | PushNothing | PushSequence ]

Solution

We need a new kind of ParseAction to push SemanticActions onto the parse stack:

    public class PushSemanticAction implements ParseAction
    {
        private SemanticAction action;

        public PushSemanticAction( SemanticAction action )
        {
            this.action = action;
        }

        public void execute( Stack stack ) { stack.push( action ); }
        public String toString() { return " " +  action; }
     }

But what are SemanticActions? They take a semantic stack and a token and construct nodes for the abstract syntax tree. Writing this interface and the classes that implement are the real work at this step.

The SemanticAction interface resembles the ParseAction, with a second argument to its execute() method:

    public interface SemanticAction
    {
        public void   execute( Stack semanticStack, Token lastToken );
        public String toString();
    }

Then we need one class for each leaf in the AST hierarchy, along with classes for the composites Declaration, Statement, and Program, that create the corresponding AST node. Remember, these objects will get the information they need from the semantic stack and from the last token matched. Here are implementations:
[ MakeProgram | MakeDeclarations | MakeStatements | MakeFloatDeclaration | MakeIntDeclaration ]
[ MakePrintStatement | MakeAssignmentStatement | MakePlusOperation | MakeMinusOperation ]
[ MakeIdentifier | MakeFloat | MakeInt ]

Note: I also had to make some changes to the AbstractSyntaxTree classes from my original implementation. I added two-place constructors to the PlusOperation and MinusOperation classes, because my ad hoc recursive descent parser built expression 'tails' right-to-left and didn't know the left operand at the time it created the node. I also added constructors that take an Identifier argument to the declaration and statement classes, because my ad hoc parser could work straight from the token in the input stream. These new methods improve the design of the AST hierarchy.


Task 6: Add Semantic Actions to the Parsing Table

Next, we add instances of these SemanticActions to the rules in our parsing table. This follows the augmented grammar generated in Task 1 above.

To do this, modify the ParseTable factory method in the TableDrivenParser class.

Solution: This turns out to be pretty easy. Wherever we added a semantic action MakeFoo to the grammar, we add the clause

new PushSemanticAction( new MakeFoo() )

to the corresponding rule in our code. The only place this causes any trouble at all is for the rules with a single parse action on the right side, which now must become PushSequence actions.

Here is an updated TableDrivenParser class.


Task 7: Modify the Parser

Finally, we get to extend our parser to handle the semantic actions that will be on its parse stack.

Refer to the updated algorithm if you want...

Solution: This is also quite straightforward, given the effort already exerted to create SemanticAction command objects. The changes to our parseProgram() follow the new algorithm directly:

    protected AbstractSyntaxTree parseProgram() throws ParseException,
                                                       LexicalException,
                                                       IOException
    {
        Stack semanticStack = new Stack();             // NEW
        Token lastToken     = null;                    // NEW

        Stack parseStack = new Stack();
        parseStack.push( NonTerminal.Program );

        do {
           Object nextSymbol = parseStack.pop();
           Token  nextToken  = source().peek();

           if ( nextSymbol instanceof Token )
           {
              if ( ((Token) nextSymbol).type() == nextToken.type() )
                 lastToken = source().nextToken();     // CHANGED
              else
                 throw new ParseException( "Expected " + (Token) nextSymbol
                                         + " saw "     + nextToken );
           }
           else if ( nextSymbol instanceof NonTerminal )
           {
              NonTerminal symbol = (NonTerminal) nextSymbol;
              ParseAction action = AcTable.lookup( symbol, nextToken );
              action.execute( parseStack );
           }
           else // ( nextSymbol instanceof SemanticAction )     // NEW
           {
              SemanticAction action = (SemanticAction) nextSymbol;
              action.execute( semanticStack, lastToken );
           }
        }
        while ( !parseStack.empty());

        return (AbstractSyntaxTree) semanticStack.pop();        // NEW
    }

... and we have a complete table-driven parser that generates abstract syntax for consumption downstream!


Abstract Syntax that Accepts Visitors

The last thing we need to do to prepare for the compiler phases that follow parsing is to modify our abstract syntax objects to accept visitors. This requires us to add an acceptVisitor() method to each AbstractSyntaxTree class. They all look the same:

    public void acceptVisitor( AstVisitor v )
    {
        v.visit( this );
    }

Then, we create an AstVisitor interface for our abstract syntax.

So that you can see how a visitor might traverse the abstract syntax tree of an ac program, I wrote a simple pretty printer as an AstVisitor, as well as a test app to drive it. Notice how each visit method in PrettyPrinter simply visits sub-expressions that are also abstract syntax trees. All visitors work this way. The visitor needs only accessors for the subexpressions and can use them to implement behavior on the AST that the implementor of the AST classes could not anticipate. And we do not need to edit or even re-compile the AST classes in order to use any new visitor that we write!


See Spot Run

The new parser plugs into the table-driven compiler framework as before.

Here is the updated acdc compiler as a tarball.

But now we can actually run the compiler and run the dc program that it creates. Try it out!

Try sample3.ac in particular. You will notice that its output dc program is different from the corresponding dc program produced by our original acdc compiler. Why?

As some of you noticed, the original recursive descent parser was right→left associative! Changing that parser to be left→right associative would take a couple of careful changes, because the knowledge of associativity is embedded in our code. Changing the associativity of our table-driven parser is much simpler, a matter of changing the placement of a semantic action in each of two rules. Expressing knowledge explicitly, whether in data or in code, almost always provides gains in terms of readability and modifiability. (This is part of the big idea we began discussing way back in Session 10.)


Midterm Exam

Yes, we have an exam next week. :-)


Eugene Wallingford ..... wallingf@cs.uni.edu ..... October 4, 2007