This assignment makes use of the files contained in this zip file. This assignment is due Tuesday, April 18.
In this assignment you will write a static analyzer for Language_4. The purpose of static analysis is to find program errors that cannot be detected by a parser.
Language_1, Language_2, Language_3, and Language_4 are all subsets of the tree-language. We write code in those languages using a tree notation, like this,
(+ 3 4 (* 5 6) (* 7 8))
which looks, more or less, like Lisp code. The grammar for the tree language is very simple. It has only one production.
Tree ::= '(' String Tree+ ')' | String
Writing a parser for the tree language is also very simple; the recursive descent parser has only one method. The grammars for Language_1, Language_2, Language_3, and Language_4 are not so simple. They have many productions, so writing parsers for those languages would be more complicated. As our languages grow more complex, so would the parsers. To avoid having to write a sequence of progressively more complicated parsers, we will instead parse all of our languages using the very simple parser for the tree language. But that means that we are not parsing, say Language_4, we are always parsing the tree language. So the following expressions will all parse to a parse tree, even though, as far as Language_4 is concerned, they all have the wrong number of operands for their operators.
(var x 1 2 3 4)
(- 1 2 3)
(** 2 5 6)
(! true true)
(<= (+ 1 2) (+ 2 3) (+ 4 5))
(if (== u v w) (var a b) (var c d) (var x y))
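To make the "one method" claim concrete, here is a minimal sketch of a recursive descent parser for the tree language. The class names SketchTree and SketchTreeParser, and the crude whitespace tokenizer, are assumptions made for this illustration; they are not the classes in the course's tree, tokenizer, or treelanguage packages.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tree node; this is not the API of the course's tree package.
class SketchTree {
    final String label;
    final List<SketchTree> children = new ArrayList<>();

    SketchTree(String label) { this.label = label; }

    @Override
    public String toString() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (SketchTree child : children) sb.append(' ').append(child);
        return sb.append(')').toString();
    }
}

class SketchTreeParser {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    SketchTreeParser(String source) {
        // Crude tokenizer: surround parentheses with spaces, then split on whitespace.
        String spaced = source.replace("(", " ( ").replace(")", " ) ").trim();
        for (String t : spaced.split("\\s+")) tokens.add(t);
    }

    private String peek() {
        if (pos >= tokens.size()) throw new IllegalStateException("unexpected end of input");
        return tokens.get(pos);
    }

    private String next() {
        String t = peek();
        pos++;
        return t;
    }

    // The one recursive method:  Tree ::= '(' String Tree+ ')' | String
    SketchTree parseTree() {
        String tok = next();
        if (tok.equals(")")) throw new IllegalStateException("unexpected ')'");
        if (!tok.equals("(")) return new SketchTree(tok);   // Tree ::= String
        String label = next();
        if (label.equals("(") || label.equals(")"))
            throw new IllegalStateException("expected a label after '('");
        SketchTree node = new SketchTree(label);
        do {                                                 // Tree+ : at least one child
            node.children.add(parseTree());
        } while (!peek().equals(")"));
        next();                                              // consume the closing ')'
        return node;
    }
}
```

Because this parser knows nothing about operators, an expression such as (- 1 2 3) parses without complaint into a node with three children. Deciding that three operands is wrong for a subtraction is left to a later stage.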
It might seem that we are just being lazy in not choosing to write a parser for Language_4. But it turns out that writing a parser for Language_4 would not really be worth the effort. There are several kinds of coding errors that, unlike the above errors, cannot be caught by a parser. But those other kinds of coding errors can be found by a static analyzer.
The kinds of languages we are parsing are called context-free languages. When a language is "context-free" that roughly means that the correctness of a piece of code, like a set-expression or an if-expression, does not depend on where in the program that piece appears. So
(set x 1)
is syntactically correct. But we have a language rule that says we cannot use a variable before it is declared. Now consider this simple program.
(prog (set x 1) (var x 0) (print y))
This program assigns a value to a variable that has not been declared, and the program references another variable that has also not been declared. A careful analysis of this (incorrect) program will show that it does parse using the grammar of Language_4. No context-free grammar is capable of enforcing a "context-sensitive" rule like "a variable must be declared before it can be used". So writing a detailed parser for Language_4 would still leave us with this erroneous program parsing correctly.
Here is another example of a coding error that cannot be caught by a parser but can be found by a static analyzer. The following expressions all have a wrong operand type for an operator.
(+ 1 false)
(! 23)
(> true 0)
It is difficult to write a parser that enforces the correct type for all the operands of an operator. But a static analyzer can look at the operands in the code's abstract syntax tree and see that some of the operands have the wrong type.
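As a rough illustration of how an analyzer can catch these operand type errors by walking the tree, consider the sketch below. The class OperandTypeSketch, its Node class, the Type enum, and the particular set of operators are all assumptions made for this example; the real AnalyzeAST.java works with the course's own tree and Value classes and with the operators defined by Language_4.

```java
import java.util.ArrayList;
import java.util.List;

class OperandTypeSketch {
    enum Type { INT, BOOL }

    // Hypothetical minimal AST node (not the course's tree class).
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = List.of(kids);
        }
    }

    final List<String> errors = new ArrayList<>();

    // Returns the type an expression would produce, recording any operand type errors.
    Type check(Node n) {
        if (n.kids.isEmpty()) {
            // Leaves are literals in this sketch; variables are left out to keep it short.
            boolean isBool = n.label.equals("true") || n.label.equals("false");
            return isBool ? Type.BOOL : Type.INT;
        }
        switch (n.label) {
            case "+": case "-": case "*":
                requireOperands(n, Type.INT);
                return Type.INT;
            case "<": case "<=": case ">": case ">=":
                requireOperands(n, Type.INT);
                return Type.BOOL;
            case "!":
                requireOperands(n, Type.BOOL);
                return Type.BOOL;
            default:
                errors.add("unknown operator: " + n.label);
                return Type.INT;
        }
    }

    // Checks every operand, so one call can record more than one error.
    private void requireOperands(Node n, Type wanted) {
        for (Node kid : n.kids) {
            if (check(kid) != wanted) {
                errors.add("operator " + n.label + " needs operands of type " + wanted);
            }
        }
    }
}
```

Each of the three expressions above produces exactly one recorded error in this sketch, while a well-typed expression such as (+ 1 (* 2 3)) produces none.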
In the implementation of Language_4 from class, all of the language's context-sensitive rules are enforced by runtime checks in the tree-walking evaluator as it traverses the code's abstract syntax tree. But this is a bad strategy for finding program errors, for two reasons. First, the tree-walking evaluator will not find a problem until it gets to the part of the tree that holds the incorrect code. It might take the evaluator a long time to execute the correct parts of a program before it gets to an incorrect part. Then, when the evaluator gets to the incorrect code, the program will halt with an error message, but only after having wasted a lot of time.
Second, there are parts of the program text that the evaluator might rarely see. For example, the evaluator may not traverse the else-clause of an if-expression because the else-clause is meant to handle a rarely occurring error condition. This means that a program might be run many times before it is even realized that the program contains a mistake (and we really don't want to find out that there is a bug in the error handling code just when the error condition finally happens for the first time). You can see a simple example of this using the Language_4 REPL. The following expression will evaluate without error in the REPL because the undeclared variable y does not get looked at by the evaluator. But change true to false and the expression will cause a runtime error.
(var x (if true 1 y))
Both versions of the above expression would be flagged by a static analyzer as containing an undeclared variable.
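The reason a static analyzer flags both versions is that it visits every child of the if-expression, whatever the condition would evaluate to. A minimal sketch of that idea follows. The class UndeclaredSketch and its Node class are hypothetical, the sketch uses a single flat scope rather than nested Environment objects, and it assumes a var-expression has the form (var name expr).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UndeclaredSketch {
    // Hypothetical minimal AST node (not the course's tree class).
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = List.of(kids);
        }
    }

    final Set<String> declared = new HashSet<>();   // a single flat scope, for simplicity
    final List<String> errors = new ArrayList<>();

    void check(Node n) {
        if (n.kids.isEmpty()) {
            if (isName(n.label) && !declared.contains(n.label)) {
                errors.add("undeclared variable: " + n.label);
            }
            return;
        }
        if (n.label.equals("var") && n.kids.size() == 2) {
            check(n.kids.get(1));                 // analyze the initializer first
            declared.add(n.kids.get(0).label);    // then record the new name
            return;
        }
        // Every other form, including "if": visit all the children,
        // no matter what the condition would evaluate to at runtime.
        for (Node kid : n.kids) check(kid);
    }

    // A leaf is treated as a variable name unless it is a boolean or integer literal.
    static boolean isName(String s) {
        return !s.equals("true") && !s.equals("false") && !s.matches("-?\\d+");
    }
}
```

With this sketch, (var x (if true 1 y)) and (var x (if false 1 y)) both record the same "undeclared variable: y" message, because the else-clause is analyzed in either case.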
A "static analyzer" is a program that traverses an abstract syntax tree (AST) and looks for the kinds of program errors described above. A static analyzer sits between a parser and an evaluator.
        tokenizer         parser         analyzer         evaluator
+------+         +-------+      +-------+        +-------+         +--------+
|source|-------->|list of|----->|tree of|------->|correct|-------->| result |
| code |         |tokens |      |tokens |        |tree of|         |        |
|      |         |       |      |       |        |tokens |         |        |
+------+         +-------+      +-------+        +-------+         +--------+
A static analyzer is a kind of simplified tree-walking evaluator. It traverses the abstract syntax tree (like an evaluator) but the analyzer doesn't need to do as much work as an evaluator. For example, an analyzer doesn't iterate any while-loops. It just checks the parts of a while-loop to see if they are free of errors. If the static analyzer finds an error in the AST, it reports an error message and then throws an exception to prevent the evaluator from traversing the AST.
One benefit of static analysis is that the tree-walking evaluator program can become much simpler and faster. The tree-walking evaluator can safely assume that the AST it is traversing has no errors in it, so the evaluator can be written without any runtime checks. Consider a while-loop like this.
(var x 0)
(while (!= x 1000000) (begin (print x) (set x (+ x 1))))
The evaluator for Language_4 will do one million type checks of (!= x 1000000) to make sure that it has a boolean value. But the AnalyzeAST program for Language_4a can determine that this loop is well formed (without iterating anything), and then the evaluator for Language_4a does not need to do any type checking, eliminating one million type checks.
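The sketch below illustrates what "without iterating anything" means: the test and the body of a while-loop are each analyzed exactly once. Everything here is an assumption made for illustration: the class WhileCheckSketch, its Node and Type types, the nodesVisited counter, and the choice to give a while-loop the type of its body. It is not the structure of the real AnalyzeAST.java.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WhileCheckSketch {
    enum Type { INT, BOOL }

    // Hypothetical minimal AST node (not the course's tree class).
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = List.of(kids);
        }
    }

    final Map<String, Type> variables = new HashMap<>();   // declared name -> type
    final List<String> errors = new ArrayList<>();
    int nodesVisited = 0;                                    // only here to show the single pass

    Type check(Node n) {
        nodesVisited++;
        if (n.kids.isEmpty()) return checkLeaf(n.label);
        switch (n.label) {
            case "while": {
                if (n.kids.size() != 2) {
                    errors.add("while needs a test and a body");
                    return Type.INT;
                }
                // Analyze the test once and the body once; never loop.
                if (check(n.kids.get(0)) != Type.BOOL) errors.add("while test must be boolean");
                return check(n.kids.get(1));   // assumption: a loop has its body's type
            }
            case "==": case "!=": case "<": case ">": {
                for (Node kid : n.kids) check(kid);
                return Type.BOOL;
            }
            default: {
                // begin, print, set, +, ...: this sketch only traverses them
                // and reports the type of the last child.
                Type last = Type.INT;
                for (Node kid : n.kids) last = check(kid);
                return last;
            }
        }
    }

    private Type checkLeaf(String label) {
        if (label.equals("true") || label.equals("false")) return Type.BOOL;
        if (label.matches("-?\\d+")) return Type.INT;
        Type t = variables.get(label);
        if (t == null) {
            errors.add("undeclared variable: " + label);
            return Type.INT;
        }
        return t;
    }
}
```

For a loop that counts x up to 1000000, this sketch touches each AST node once and then stops, so the cost of the analysis depends on the size of the program text and not on how many times the loop would run.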
You can compare Language_4's evaluator, EvaluateAST.java, with Language_4a's evaluator, EvaluateAST_no_checking.java, and you will see that Language_4a's evaluator is considerably simpler and 20% shorter than Language_4's.
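In the zip file there is a package language4a that contains the file AnalyzeAST.java that is supposed to perform static analysis of programs from Language_4a. You need to complete this program so that it can find all syntax and type errors mentioned above. In particular, the analyzer should detect the errors illustrated above: operators with the wrong number of operands, variables that are used before they are declared, and operands of the wrong type.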
Read the file Language_4a.txt for details about the differences between Language_4a and Language_4.
The AnalyzeAST.java program for Language_4a is structured the same way as the EvaluateAST.java program for Language_4. They use a "syntax driven" structure with one method for every production in the grammar (so the grammar of the language becomes a table of contents to the program's code). The methods in AnalyzeAST.java have names like checkExpr() in place of names like evaluateExpr() from the evaluator.
Most of the code needed for Language_4a's AnalyzeAST.java can be found in Language_4's EvaluateAST.java program. The static analyzer acts very much like a tree-walking evaluator, but it doesn't need to fully evaluate a program. For example, the static analyzer needs to use global and local Environment objects to keep track of which variables have been declared (or re-declared) and what their types are, but the analyzer does not need to keep track of the correct value of any variable (in my code, whenever I put a variable into an Environment object, I give a boolean variable the value "new Value(false)" and integer variables the value "new Value(0)"). As another example, the method checkWhile() needs to analyze each expression in the while-loop, but it does not need to iterate any loops. And the method checkIf() needs to analyze all three expressions in the if-expression and make sure that the last two expressions have the same type.
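The idea of storing a placeholder value whose only job is to carry a type, and of comparing the types of the two branches of an if-expression, can be sketched as follows. PlaceholderValue, IfCheckSketch, and the map named env are hypothetical stand-ins for the course's Value and Environment classes, whose real constructors and methods may differ. Re-declaration checks and nested scopes are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IfCheckSketch {
    // Hypothetical stand-in for a value class: the payload is ignored, only the type tag matters.
    static final class PlaceholderValue {
        final boolean isBool;
        PlaceholderValue(boolean ignored) { isBool = true; }
        PlaceholderValue(int ignored) { isBool = false; }
        boolean sameTypeAs(PlaceholderValue other) { return isBool == other.isBool; }
    }

    // Hypothetical minimal AST node (not the course's tree class).
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = List.of(kids);
        }
    }

    final Map<String, PlaceholderValue> env = new HashMap<>();   // one flat scope
    final List<String> errors = new ArrayList<>();

    PlaceholderValue check(Node n) {
        if (n.kids.isEmpty()) return checkLeaf(n.label);
        if (n.label.equals("var") && n.kids.size() == 2) {
            PlaceholderValue v = check(n.kids.get(1));
            env.put(n.kids.get(0).label, v);     // remember the type, not a real value
            return v;
        }
        if (n.label.equals("if") && n.kids.size() == 3) return checkIf(n);
        errors.add("form not covered by this sketch: " + n.label);
        return new PlaceholderValue(0);
    }

    PlaceholderValue checkIf(Node n) {
        // All three parts are analyzed, whichever branch would run.
        PlaceholderValue test = check(n.kids.get(0));
        PlaceholderValue thenType = check(n.kids.get(1));
        PlaceholderValue elseType = check(n.kids.get(2));
        if (!test.isBool) errors.add("if test must be boolean");
        if (!thenType.sameTypeAs(elseType)) errors.add("if branches must have the same type");
        return thenType;
    }

    private PlaceholderValue checkLeaf(String label) {
        if (label.equals("true") || label.equals("false")) return new PlaceholderValue(false);
        if (label.matches("-?\\d+")) return new PlaceholderValue(0);
        PlaceholderValue v = env.get(label);
        if (v == null) {
            errors.add("undeclared variable: " + label);
            return new PlaceholderValue(0);
        }
        return v;
    }
}
```

In this sketch, after (var b true) the expression (if b 1 2) is accepted with integer type, while (if b 1 false) records a branch type mismatch and (if 1 2 3) records a non-boolean test.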
The analysis methods in AnalyzeAST.java all have a return type of Value, which represents the type returned by the expression being analyzed. An analysis method should return "new Value(false)" if the expression being analyzed has the boolean type and it should return "new Value(0)" if the expression being analyzed has the integer type. Each analysis method should print, to stdout, an appropriate error message whenever it finds an error. The analysis methods should not throw any exceptions; only the checkProg() method throws an AnalysisError. Notice that AnalyzeAST.java has a static boolean variable thereIsAnError. Whenever an analysis method detects an error, it should set thereIsAnError to true, then print its error message, and then continue to analyze the remaining code (similar to how a compiler does not stop when it finds a compile time error). Notice that a single call to an analysis method could find several errors. For example, if both x and y are undeclared variables, then a call to checkSet() for this code,
(set x y)
should produce two "undeclared variable" error messages, one for each variable. Similarly, if x was previously declared in the current scope and y is undeclared, then a call to checkVar() for this code,
(var x y)
should produce two error messages, a "previously declared variable" error message for x and an "undeclared variable" error message for y.
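The "record the error, print a message, keep going, and let only checkProg() throw" pattern described above might look roughly like the sketch below. The names thereIsAnError and checkProg() come from the assignment, but everything else is an assumption made for illustration: the class ErrorFlagSketch, the exception SketchAnalysisError (a stand-in for AnalysisError, whose real definition may differ), the report() helper, the errorCount field, and the single flat scope.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ErrorFlagSketch {
    // Hypothetical exception standing in for the assignment's AnalysisError.
    static class SketchAnalysisError extends RuntimeException {
        SketchAnalysisError(String message) { super(message); }
    }

    // Hypothetical minimal AST node (not the course's tree class).
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = List.of(kids);
        }
    }

    static boolean thereIsAnError = false;
    static int errorCount = 0;                        // extra counter, only to make the sketch testable
    static final Set<String> declared = new HashSet<>();

    // Set the flag, print the message, and return so the analysis can continue.
    static void report(String message) {
        thereIsAnError = true;
        errorCount++;
        System.out.println("error: " + message);
    }

    // The only method that throws, and only after the whole program has been analyzed.
    static void checkProg(Node prog) {
        thereIsAnError = false;
        errorCount = 0;
        declared.clear();
        for (Node kid : prog.kids) checkExpr(kid);
        if (thereIsAnError) throw new SketchAnalysisError(errorCount + " error(s) found");
    }

    static void checkExpr(Node n) {
        if (n.kids.isEmpty()) {
            checkName(n.label);
        } else if (n.label.equals("var") && n.kids.size() == 2) {
            checkVar(n);
        } else {
            for (Node kid : n.kids) checkExpr(kid);   // set, print, operators, ...
        }
    }

    static void checkVar(Node n) {
        String name = n.kids.get(0).label;
        if (declared.contains(name)) report("previously declared variable: " + name);
        checkExpr(n.kids.get(1));                     // keep going even after an error
        declared.add(name);
    }

    static void checkName(String label) {
        boolean isLiteral = label.equals("true") || label.equals("false") || label.matches("-?\\d+");
        if (!isLiteral && !declared.contains(label)) report("undeclared variable: " + label);
    }
}
```

With this sketch, analyzing (prog (set x y)) prints two "undeclared variable" messages before the exception is thrown, and (prog (var x 1) (var x y)) prints one "previously declared variable" message and one "undeclared variable" message.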
Carefully read the code in Language_4's EvaluateAST.java. Understanding that code is the main prerequisite for completing this assignment. If you understand EvaluateAST.java, it should be fairly easy to modify its code into the code needed by Language_4a's AnalyzeAST.java.
When you have finished implementing AnalyzeAST.java, the files Language_4a_Examples.java and Language_4_Examples.java should compile and run. The output from running these programs should look exactly like the files Language_4a_Examples_output.txt and Language_4_Examples_output.txt.
You can also compile the program Language_4a.java which implements a REPL for Language_4a. And the folder Language_4a_demo contains a demo version of Language_4a.
Your code should compile without any warnings. These are programs that use Java generics. Getting generic code to compile without warnings can be tricky. But the warnings always mean that you are doing something wrong (they are really errors, not warnings). So try to figure out what is causing each warning (it is usually because you are either forgetting a type parameter or using the wrong type parameter). Ask me questions if you get stuck.
Do not change the package structure of this code. The tree implementation is in the tree package. The tokenizer is in the tokenizer package. The tree language parser is in the treelanguage package. The AnalyzeAST.java program is in the language4a package. The Language_4a.java interpreter and the test programs are in the "default" (unnamed) package. Don't change that. Do not put those files into some named package.
Turn in a zip file called CS316Hw3Surname.zip (where Surname is your last name) containing only your version of AnalyzeAST.java. Please be sure to put your name in your file.
This assignment is due Tuesday, April 18.