We are studying the implementation of an acdc compiler. The source language, ac, is a simple arithmetic language with two data types, three operators, three reserved words, and twenty-three one-character identifiers. All variables must be declared before the statements that use them. The target language, dc, is the language of a standard arithmetic calculator of the same name. It is a stack language, similar to the intermediate representations used by many compilers.
Last time, we studied the compiler's scanner and parser, along with the data structures produced by these phases, tokens and abstract syntax trees. This time, we will review the parser and then study in detail the rest of the compiler. This consists of the rest of the analysis phase -- semantic analysis -- and the simple synthesis phase, code generation.
Consider this ac program:
f a
i b
a = 9.4
b = 3
b = b + 5
a = 1 - b + a
p a
Use the grammar of the language to drive your work, or perhaps the code of the Parser class!
Question: Show the abstract syntax tree for the statement a = 1 - b + a.
Question: Show the abstract syntax tree for the rest of the program.
By the way, here is the corresponding dc program.
... recursive descent: one procedure per grammar rule, using lookahead in the token stream to select the correct production when a choice exists.
The analysis part of a compiler ensures that the program conforms to the language's definition. The various stages of analysis approach this problem at different levels.
Scanning verifies the program at the lexical level, in terms of characters and meaningful "words". Parsing verifies the program at the syntactic level, in terms of the program structures defined in the grammar. Each of these steps works locally, examining only adjacent elements as defined by the definition of tokens and the grammar of the language.
Semantic analysis verifies any element of the language definition that applies across structures, for example, consistency. We can think of it as a global validation of the program. Semantic analysis typically deals with issues of declarations, scope, types, and type-specific operations, including issues with operator overloading.
In ac, we perform semantic analysis by verifying the declaration of identifiers and verifying the restrictions placed on operations by the types of values involved. These two tasks are done by two different objects, the SymbolTableFactory and the TypeChecker.
... the DataType class
... the SymbolTable class
... the SymbolTableFactory class
... work bottom-up from leaves (actual values and identifiers with declared types) to larger structures in the program: compound operations, assignments, and the like.
... in ac, we must ensure that types are consistent or can be made so with a type conversion. What ac allows to be implicit in the source code must be made explicit in the AST. This ensures that the program "follows the rules", so that we can be sure the program's meaning exists and is unambiguous. It also makes possible the generation of target code, or improves the quality of code that can be generated.
... implementation:
... transforms an abstract syntax syntax tree, augmented by semantic analysis and with register allocations, into code in the target language.
... in acdc, we do not need to do a separate preparation step prior to code generation. There is only the dc stack, with no other memory, and even register allocation can be done locally while processing individual statements in the AST.
Public interface: ... provides generate(AbstractSyntaxTree) as its only method. It generates code for each statement in the tree.
(dc does not require variable declarations.)
The rest of the class is a set of protected methods, one for each element of the grammar. generateStatement() writes code for statements, and generateExpression() writes code for expressions. Implementation notes:
The generateStatement() and generateExpression() methods provide templates for the various kinds of target code that we need to generate. For a more complex target language, we might want to have a more sophisticated way to select and instantiate code templates.