Programming Assignment 5
CS 51530
Programming Languages, Interpreters and Compilers
Spring, 2021

This assignment makes use of files contained in this zip file.

This assignment is due Monday, April 26.

This assignment is based on Language_7. In this assignment you will add a static analyzer to Language_7.

In the interpreter for Language_7, there are 38 places in the code where the interpreter does a "runtime check" for some condition that should or shouldn't hold (search Evaluate.java for the string "// runtime check"). Runtime checks slow the interpreter down, and an important goal in programming language design is to minimize the number of runtime checks that an interpreter needs. Many of the runtime checks in the Language_7 interpreter are due to two aspects of our code: we do not have a proper parser and we do not have a static analyzer. In this assignment you will write a static analyzer for Language_7. This will eliminate the need for many of the runtime checks in Language_7's interpreter (the runtime checks that remain in the interpreter are almost all runtime type checks).

In Language_7, the abstract syntax tree for a program is built by the file ParseTree.java. But this is not really a parser for Language_7; it is a parser for Language_0, the "tree language" from the very beginning of the semester. Language_7 is a sub-language of Language_0 (so Language_7 is a tree language), so ParseTree.java can parse Language_7. But ParseTree.java knows nothing about the details of Language_7, like the fact that an if-expression must have three branches, or a function definition must include a lambda expression. In Language_7 these details are checked at runtime by the interpreter. A proper parser for Language_7 would eliminate the need for these kinds of runtime checks in the interpreter.

A proper parser eliminates a lot of runtime checks in the interpreter, but not all of them. The kind of languages we are parsing are called context-free languages. When a language is "context-free", that roughly means that the correctness of a piece of the language, like a var-expression or a set-expression, does not depend on where in the program that piece appears. So

      (set x y)

is syntactically correct. But we have a language rule that you cannot use a variable before it is declared. A rule like that is a "context rule," and context rules cannot be expressed by context-free grammars and so they cannot be enforced by a parser. In Language_7, context rules like this are enforced by runtime checks in the interpreter. It turns out that some violations of these kinds of context rules can be determined "statically" by a "semantic analyzer". We say "statically" (as opposed to "dynamically") because we analyze the program without needing to run (or execute) it. And we say "semantic" because we are checking rules that enforce aspects of the meaning of a program (like, "a variable does not exist before it has been declared" or "a function application should have the same number of arguments as the function's definition has parameters").

In this assignment you will write a static analyzer for Language_7 that catches most of the syntax errors that a parser would detect and also catches many of the semantic errors that a parser cannot detect.

In the zip file, the file StaticAnalysis.java outlines the code needed to perform syntactic and static analysis of a program from Language_7. You need to complete StaticAnalysis.java so that it can find all errors contained in the files Language_7_Test_Syntax_Errors.java and Language_7_Test_Semantic_Errors.java. The file Language_7_Test_Syntax_Errors.java tests for the context-free errors that could be found by a parser. The file Language_7_Test_Semantic_Errors.java tests for context-sensitive errors that cannot be found by a parser. The errors in Language_7_Test_Syntax_Errors.java are easier to detect, and the code needed to check for these errors can mostly be copied from the (previous) interpreter for Language_7. The errors in Language_7_Test_Semantic_Errors.java are harder to detect, and the code needed to check for these errors can be modeled on, but not necessarily copied from, the code in the (previous) Language_7 interpreter.

A common programming error is to reference a variable that has not been declared (for example, you might misspell a variable name when you are trying to reference it). Another is to re-declare a variable in the same scope. An interpreter is not a good tool for finding these kinds of errors. First of all, there are parts of the program text that the interpreter might rarely see. For example, the interpreter may not traverse the else-clause of an if-expression because the else-clause is meant to handle a rarely occurring error condition. You can check by using the Language_7 REPL that the following expression will evaluate without error since the undeclared variable y does not get looked at (change true to false and the expression will cause a runtime error).

      (var x (if true 1 y))

A second reason an interpreter is bad for finding undeclared (and re-declared) variables is that the interpreter may not find one until after the program has been running for a long time. The resulting error, and the halting of the interpreter, will find the undeclared or re-declared variable, but at the cost of much wasted time.

In a statically scoped language, it is possible to find all undeclared variables in a program without having to run the program (this is not true for dynamically scoped languages; for those languages, you must execute the program to find undeclared variables). One job of your StaticAnalysis.java program is to find all undeclared variable references (and function names) in a program and all re-declarations of previously declared variables (and function names).
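The idea of finding undeclared variables without running the program can be sketched in a few lines. The following is only an illustration, not the code for StaticAnalysis.java: the SketchNode class is a hypothetical stand-in for the course's Tree class, the scopes are plain sets rather than Environment objects, and the choice to check a var's initializer before recording the new name is an assumption. The point is that the walk visits every child of every node, so the else-clause of an if-expression gets checked even though an interpreter might never evaluate it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the course's Tree class: a label plus children.
class SketchNode {
    final String label;
    final List<SketchNode> kids;

    SketchNode(String label, SketchNode... kids) {
        this.label = label;
        this.kids = Arrays.asList(kids);
    }
}

class UndeclaredSketch {
    // Returns the undeclared names found in the tree, in the order visited.
    static List<String> undeclared(SketchNode root) {
        Deque<Set<String>> scopes = new ArrayDeque<>();
        scopes.push(new HashSet<>());          // the global scope
        List<String> errors = new ArrayList<>();
        walk(root, scopes, errors);
        return errors;
    }

    private static void walk(SketchNode n, Deque<Set<String>> scopes, List<String> errors) {
        switch (n.label) {
            case "var":
                walk(n.kids.get(1), scopes, errors);      // check the initializer (assumed order)
                scopes.peek().add(n.kids.get(0).label);   // then record the new name
                break;
            case "begin":
                scopes.push(new HashSet<>());             // nested scope
                for (SketchNode k : n.kids) walk(k, scopes, errors);
                scopes.pop();
                break;
            default:
                if (!n.kids.isEmpty()) {
                    // if, while, operators, ...: visit every child, taken or not
                    for (SketchNode k : n.kids) walk(k, scopes, errors);
                } else if (isName(n.label) && !isDeclared(n.label, scopes)) {
                    errors.add(n.label);
                }
        }
    }

    private static boolean isName(String s) {
        return Character.isLetter(s.charAt(0)) && !s.equals("true") && !s.equals("false");
    }

    private static boolean isDeclared(String name, Deque<Set<String>> scopes) {
        for (Set<String> scope : scopes) {
            if (scope.contains(name)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // (var x (if true 1 y))
        SketchNode e = new SketchNode("var", new SketchNode("x"),
            new SketchNode("if", new SketchNode("true"), new SketchNode("1"), new SketchNode("y")));
        System.out.println(undeclared(e));   // prints [y]
    }
}
```

Running main on the tree for (var x (if true 1 y)) reports y, even though the interpreter would evaluate that expression without error.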

You need to complete StaticAnalysis.java so that it can find all references to undeclared variables, all re-declarations of existing variables (within the same scope as the original declaration), all undeclared function names, all re-declarations of (global) function names, and all repeated parameters in a lambda expression. In addition, your program should check the agreement between the number of actual parameters and the number of formal parameters when an apply-expression applies a globally defined function name defined by fun. For example, the following code has an incorrect application of function h, but your static analyzer need not detect this error (the interpreter will detect it at runtime). On the other hand, your static analyzer should detect the incorrect application of function f.

      (prog
         (fun f (lambda x (* x x)))
         (var h f)
         (apply h 1 2)  // wrong number of arguments for h
         (apply f 1 2)) // wrong number of arguments for f

Be careful about an example like the following code. The function applications use global function names that have local redefinitions that hide the global definitions. In this case, one function application has the correct number of arguments and the other one doesn't. Your code does not need to check either of these function applications (the interpreter will check them at runtime). The Environment class has methods defined() and definedGlobal() that you can use to distinguish which function applications you need to check. (Interesting question: How do the C, C++ and Java compilers catch errors like this?)

      (prog
         (fun f (lambda x   (* x x)))
         (fun g (lambda x y (* x y)))
         (begin
           (var temp f)    // swap f and g
           (var f g)
           (var g temp)
           (apply f 1 2)   // correct number of arguments
           (apply g 1 2))) // incorrect number of arguments
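The decision about which applications to check can be sketched as follows. This is only an illustration under stated assumptions: the class ArityCheckSketch, its three collections, and the Verdict enum are all hypothetical and are not the course's Environment class (whose defined() and definedGlobal() methods you should use in your own code). The sketch checks the argument count only when the applied name resolves to a function declared by fun and is not hidden by a local declaration; everything else is left to the interpreter's runtime check.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ArityCheckSketch {
    enum Verdict { OK, WRONG_ARITY, LEFT_TO_RUNTIME, UNDECLARED }

    // Hypothetical bookkeeping (not the course's Environment class).
    private final Map<String, Integer> globalFuns = new HashMap<>(); // name -> parameter count
    private final Set<String> globalVars = new HashSet<>();
    private final Set<String> localVars = new HashSet<>();

    void declareFun(String name, int paramCount) { globalFuns.put(name, paramCount); }
    void declareGlobalVar(String name) { globalVars.add(name); }
    void declareLocalVar(String name) { localVars.add(name); }

    Verdict checkApply(String name, int argCount) {
        // A local declaration hides any global one, so the analyzer does not check it.
        if (localVars.contains(name)) return Verdict.LEFT_TO_RUNTIME;
        // A visible name declared by fun: the parameter count is known statically.
        if (globalFuns.containsKey(name)) {
            return globalFuns.get(name) == argCount ? Verdict.OK : Verdict.WRONG_ARITY;
        }
        // A global variable that merely holds a function value, like h in (var h f).
        if (globalVars.contains(name)) return Verdict.LEFT_TO_RUNTIME;
        return Verdict.UNDECLARED;
    }

    public static void main(String[] args) {
        ArityCheckSketch env = new ArityCheckSketch();
        env.declareFun("f", 1);        // (fun f (lambda x (* x x)))
        env.declareGlobalVar("h");     // (var h f)
        System.out.println(env.checkApply("h", 2));  // prints LEFT_TO_RUNTIME
        System.out.println(env.checkApply("f", 2));  // prints WRONG_ARITY
    }
}
```

For the swap example, declaring f and g with declareFun and then temp, f and g with declareLocalVar makes both applications come back as LEFT_TO_RUNTIME, which matches the requirement that neither one needs to be checked statically.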

It is very useful to notice that the StaticAnalysis.java program is structured the same way as the interpreter, Evaluate.java. They use a "syntax driven" structure with one method for every production in the grammar (so the grammar of the language becomes a table of contents to each of these program's code). The methods in the interpreter have names like evaluateApply() and evaluateWhile(). The methods in StaticAnalysis.java have names like analyzeApply() and analyzeWhile().

The analysis methods in StaticAnalysis.java are all boolean methods. Each method should print, to stdout, an appropriate error message whenever it finds a syntax or semantic error (the analysis methods should not throw any exceptions). An analysis method should return true if it does not find any errors and it should return false if it finds at least one error. Whenever an analysis method detects an error, it should print an error message and then continue to analyze the remaining code (similar to how a compiler does not stop when it finds a compile-time error). Notice that a single call to an analysis method could find several errors. For example, if both x and y are undeclared variables, then a call to analyzeSet() for this code,

      (set x y)

should produce two "undeclared variable" error messages, one for each variable. Similarly, if x is already declared and y is undeclared, then a call to analyzeVar() for this code,

      (var x y)

should produce two error messages, a "previously declared variable" error message for x and an "undeclared variable" error message for y. In the zip file there is a test program, Language_7_Test_Semantic_Errors.java, with its output, that demonstrates a large number of semantic errors and their error messages.
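One way to get the "report every error, then return false" behavior is to accumulate the result in a boolean and make sure every sub-check runs. The sketch below is an illustration only; the class AccumulateSketch, its fields, and its method names are hypothetical and much simpler than the real analysis methods (the variables are plain strings instead of tree nodes). Note the use of &= rather than &&: with result && check(...), Java would skip the second check once the first one failed, and the second error message would never be printed.

```java
import java.util.HashSet;
import java.util.Set;

class AccumulateSketch {
    // Hypothetical bookkeeping standing in for an Environment object.
    final Set<String> declared = new HashSet<>();
    int errorCount = 0;

    // Prints a message and returns false if the name has not been declared.
    boolean checkDeclared(String name) {
        if (declared.contains(name)) return true;
        System.out.println("undeclared variable: " + name);
        errorCount++;
        return false;
    }

    // Shape of an analysis method for (set target source), reduced to two names.
    boolean analyzeSetSketch(String target, String source) {
        boolean result = true;
        result &= checkDeclared(target);   // &= always evaluates its right-hand side
        result &= checkDeclared(source);   // so this check runs even after a failure
        return result;
    }

    public static void main(String[] args) {
        AccumulateSketch a = new AccumulateSketch();
        boolean ok = a.analyzeSetSketch("x", "y");   // prints two error messages
        System.out.println(ok);                      // prints false
    }
}
```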

Most of the code for StaticAnalysis.java can be found in Evaluate.java from Language_7. The static analyzer acts very much like an interpreter, but it doesn't need to fully evaluate a program. For example, the method analyzeWhile() needs to analyze each expression in the body of the while-loop, but it does not need to execute the loop. As another example, the static analyzer needs to use global and local Environment objects to keep track of which variables have been declared, but the analyzer does not need to keep track of the value of any variable (except for function names defined by fun). In my code, whenever my static analyzer needs to put a variable into an Environment object, I always give the variable the value new Value(false) (except for function names defined by fun).

I have defined a few of the methods for you in StaticAnalysis.java. I suggest that you start with the arithmetic and boolean productions; these are the easiest. Then write (and test) other fairly simple productions like If, While and Print. Then work on the important Var and Set productions. These need to use environments, so they are more complicated than the previous productions. Write (and test) the code for each of those two productions. Leave the hardest productions, Apply, Fun, Lambda, for last.

The biggest difference between StaticAnalysis.java and Evaluate.java is that the body of a function definition is analyzed as part of the analysis of the function declaration, not as part of a function application. So the code for analyzeFun() looks a lot like the code from evaluateApply() (and the code for analyzeApply() is actually similar to analyzeSet()).
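The following sketch illustrates what "analyze the body at declaration time" might look like. It is not the code for analyzeFun(): the class FunAnalysisSketch is hypothetical, the function body is reduced to the list of names it references, and the decision to record the function name before analyzing the body (so that a recursive reference is accepted) is an assumption. What it does show is the parameter binding, normally done at application time by the interpreter, being done once when the declaration is analyzed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FunAnalysisSketch {
    // Hypothetical bookkeeping standing in for the global Environment.
    final Set<String> globals = new HashSet<>();
    final List<String> errors = new ArrayList<>();

    // Analyze (fun name (lambda p1 ... pn body)); bodyRefs lists the names the body uses.
    boolean analyzeFunSketch(String name, List<String> params, List<String> bodyRefs) {
        boolean result = true;
        // Record the function name first (assumed), so the body may refer to it.
        if (!globals.add(name)) {
            errors.add("previously declared function: " + name);
            result = false;
        }
        // Bind the formal parameters in a fresh local scope, as an application would.
        Set<String> locals = new HashSet<>();
        for (String p : params) {
            if (!locals.add(p)) {
                errors.add("repeated parameter: " + p);
                result = false;
            }
        }
        // Analyze the body now, at declaration time, against locals then globals.
        for (String ref : bodyRefs) {
            if (!locals.contains(ref) && !globals.contains(ref)) {
                errors.add("undeclared variable: " + ref);
                result = false;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FunAnalysisSketch a = new FunAnalysisSketch();
        a.analyzeFunSketch("f", List.of("x", "x"), List.of("x", "z"));
        for (String e : a.errors) System.out.println(e);
    }
}
```

Both errors in the sample call are reported even though f is never applied, which is the behavior the interpreter cannot give you.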

In the zip file there is an interactive REPL of Language_7, called Language_7_demo.jar, with all the static analysis enabled. You can run the REPL by double clicking on the file Language_7_demo.cmd. You can also run Language_7 script files by dragging and dropping them onto Language_7_demo.cmd. Experimenting with the demo version of the REPL will give you a good idea of what kind of syntax and semantic errors your code should be able to detect. When your version of StaticAnalysis.java is complete, you can build your own REPL by compiling the file Language_7.java.

Look at the beginning of each of the class files Evaluate.java and StaticAnalysis.java and you will see that each of them declares a global Environment object. The analyzer uses its global environment to keep track of declared variables and functions. The evaluator uses its global environment object to keep track of the state of a running program. One tricky aspect of this code is that the REPL needs to instantiate both a global analysis environment and a global interpreter environment. When you type an expression into the REPL, it first uses the global analysis environment (with the analyzer) to analyze the expression, and then if the expression passes its analysis, the REPL uses the global interpreter environment to evaluate the expression. This trickiness is taken care of for you in the REPL code from Language_7.java.

When you run the test programs (or the demo program) you will see that the error messages contain line numbers. This is so that the error messages are unambiguous. Notice in the Token.java file that each token contains its line number and character position from the original source string. Notice in the Tree.java file that each tree node in the AST holds a token. To get a reference to the token in a tree node, call the node's getToken() method. To get the string in a tree node, you can either call the node's getElement() method or you can access the token's lexeme field.

Turn in a zip file called CS51530Hw5Surname.zip (where Surname is your last name) containing your version of StaticAnalysis.java. Be sure to put your name and email address in every file you turn in.

This assignment is due Monday, April 26.