Programming Assignment 5
CS 51530
Programming Languages, Interpreters and Compilers
Spring, 2020

This assignment makes use of files contained in this zip file.

This assignment is due Monday, April 20.

This assignment is based on Language_7-with-tokenizer. In this assignment you will add a parser and a static analyzer to Language_7.

In the interpreter for Language_7, there are 27 places in the code where the interpreter does a "runtime check" for some condition that should or shouldn't hold (search Evaluate.java for the string "// runtime check"). Runtime checks slow the interpreter down, and an important aspect of programming language design is to try to minimize the number of runtime checks that an interpreter needs. Many of the runtime checks in the Language 7 interpreter are due to two aspects of our code: we do not have a proper parser and we do not have a static analyzer. In this assignment you will write both a parser and a static analyzer for Language 7. This will eliminate the need for many of the runtime checks in Language_7's interpreter.

In Language_7, the abstract syntax tree for a program is built by the file BuildTree.java. But this is not really a parser for Language 7; it is a parser for Language 0, the "tree language" from the very beginning of the semester. Language 7 is a sub-language of Language 0 (so Language 7 is a tree language), so BuildTree.java can parse Language 7. But BuildTree.java knows nothing about the details of Language 7, like the fact that an if-expression must have three branches, or that a function definition must include a lambda expression. In Language_7 these details are checked at runtime by the interpreter. A proper parser for Language 7 would eliminate the need for these kinds of runtime checks in the interpreter.

In the zip file there is a file called Parse2AST.java that outlines the structure of a recursive descent parser for Language 7. Complete this file so that it parses any Language 7 program into its abstract syntax tree and also so that it catches as many parsing errors as you can think of. In the zip file there is a test program, Test_Parser_Errors.java, that you can run that causes a large number of parsing errors. Use those test cases as a guide to what kind of parse errors you should look for.
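To make the idea of a production method concrete, here is a minimal sketch of how a recursive descent parser might check the shape of an if-expression at parse time. Everything in it is an assumption made for illustration: the class ParseSketch, its Node and ParseError classes, and its crude whitespace tokenizer are hypothetical stand-ins, not the Tokenizer.java, Tree.java, or Parse2AST.java from the zip file, and it handles only atoms and if-expressions.

```java
// Hypothetical stand-ins; the real Tokenizer.java and Tree.java in the zip file differ.
import java.util.ArrayList;
import java.util.List;

final class ParseSketch {
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    static final class ParseError extends RuntimeException {
        ParseError(String msg) { super(msg); }
    }

    private final String[] tokens;
    private int pos = 0;

    ParseSketch(String source) {
        // Crude tokenizer: pad parentheses with spaces, then split on whitespace.
        tokens = source.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
    }

    private String peek() { return pos < tokens.length ? tokens[pos] : "<EOF>"; }

    private String next() { return pos < tokens.length ? tokens[pos++] : "<EOF>"; }

    private void expect(String t) {
        String got = next();
        if (!got.equals(t)) throw new ParseError("expected " + t + " but found " + got);
    }

    // Exp ::= If | atom      (only these two forms are handled in this sketch)
    Node parseExp() {
        if (!peek().equals("(")) {
            String t = next();
            if (t.equals(")") || t.equals("<EOF>")) throw new ParseError("unexpected " + t);
            return new Node(t);
        }
        expect("(");
        String head = next();
        if (head.equals("if")) return parseIf();
        throw new ParseError("unknown expression: " + head);
    }

    // If ::= '(' 'if' Exp Exp Exp ')'      (the "( if" has already been consumed)
    Node parseIf() {
        Node result = new Node("if");
        for (int i = 0; i < 3; i++) {
            if (peek().equals(")"))
                throw new ParseError("if-expression needs three branches, found " + i);
            result.children.add(parseExp());
        }
        if (!peek().equals(")"))
            throw new ParseError("if-expression has more than three branches");
        expect(")");
        return result;
    }
}
```

With this sketch, new ParseSketch("(if a b c)").parseExp() returns a node with three children, while "(if a b)" and "(if a b c d)" are rejected before any interpreter would see them. That is the kind of check that moves from runtime to parse time.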

A proper parser eliminates a lot of runtime checks in the interpreter, but not all of them. The kind of languages we are parsing are called context-free languages. When a language is "context-free", that roughly means that the correctness of a piece of the language, like a var-expression or a set-expression, does not depend on where in the program that piece appears. So

      (set x y)

is syntactically correct. But we have a rule that you cannot use a variable before it is declared. A rule like that is a "context rule," and context rules cannot be determined by "context-free" grammars and so they cannot be enforced by a parser. In Language_7, context rules like this are enforced by runtime checks in the interpreter. It turns out that some violations of these kinds of context rules can be determined "statically" by a "semantic analyzer". We say "statically" (as opposed to "dynamically") because we analyze the program without needing to run (or execute) it. And we say "semantic" because we are checking rules that enforce aspects of the meaning of a program (like, "a variable does not exist before it has been declared" or "a function application should have the same number of arguments as the function's definition has parameters").

A common programming error is to reference a variable that has not been declared (for example, you might misspell a variable name when you are trying to reference it). Another is to re-declare a variable in the same scope. An interpreter is not a good tool for finding these kinds of errors. First of all, there are parts of the program text that the interpreter might rarely see. For example, the interpreter may not traverse the else-clause of an if-expression because the else-clause is meant to handle a rarely occurring error condition. A second reason an interpreter is bad for finding undeclared (and re-declared) variables is that the interpreter may not find one until after the program has been running for a long time. The resulting error, and the halting of the interpreter, will find the undeclared or re-declared variable, but at the cost of much wasted time.

In a statically scoped language, it is possible to find all undeclared variables in a program without having to run the program (this is not true for dynamically scoped languages; for those languages, you must execute the program to find undeclared variables). The main job of your StaticAnalysis.java program is to find all undeclared variable references (and function names) in a program and all re-declarations of previously declared variables (and function names).

In the zip file the file StaticAnalysis.java outlines the code needed to perform static analysis of a program from Language 7. You need to complete StaticAnalysis.java so that it can find all references to undeclared variables, all re-declarations of existing variables (within the same scope as the original declaration), all undeclared function names, all re-declarations of function names, and all repeated parameters in a lambda expression. In addition, your program should check the agreement between the number of actual parameters and the number of formal parameters when an apply-expression applies a globally defined function name. Notice that this last condition should only be checked for function names defined by fun. For example, the following code is incorrect, but your static analyzer need not detect this error (the interpreter will detect it at runtime).

      (prog
         (fun f (lambda x (* x x)))
         (var h f)
         (apply h 1 2)) // wrong number of arguments for h
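One possible way to arrange the argument-count check is sketched below. It is only an illustration under stated assumptions: the class ArityCheckSketch and its map from function names to parameter counts are hypothetical bookkeeping, not the Environment and Value classes from the zip file. The point is that a name is checked only if it was introduced by fun; any other name (like h above) is left for the interpreter.

```java
import java.util.HashMap;
import java.util.Map;

final class ArityCheckSketch {
    // Parameter counts of names introduced by fun (hypothetical bookkeeping).
    private final Map<String, Integer> funArity = new HashMap<>();

    void declareFun(String name, int paramCount) {
        funArity.put(name, paramCount);
    }

    // Returns true if no argument-count error is detected for (apply name ...).
    boolean checkApply(String name, int argCount) {
        Integer expected = funArity.get(name);
        if (expected == null) {
            return true; // not defined by fun, so this check is left to runtime
        }
        if (expected != argCount) {
            System.out.println("wrong number of arguments for " + name
                    + ": expected " + expected + ", found " + argCount);
            return false;
        }
        return true;
    }
}
```

After declareFun("f", 1), the call checkApply("f", 2) prints a message and returns false, while checkApply("h", 2) returns true because h was declared by var, not by fun.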

The analysis methods in StaticAnalysis.java are all boolean methods. Each method should print, to stdout, an appropriate error message whenever it finds an error (the methods should not throw any exceptions). An analysis method should return true if it does not find any errors and it should return false if it finds at least one error. Whenever an analysis method detects an error, it should print an error message and then continue to analyze the remaining code (similar to how a compiler does not stop when it finds a compile time error). Notice that a single call to an analysis method could find several errors. For example, if both x and y are undeclared variables, then a call to analyzeSet() for this code,

      (set x y)

should produce two "undeclared variable" error messages, one for each variable. Similarly, if x is already declared and y is undeclared, then a call to analyzeVar() for this code,

      (var x y)

should produce two error messages, a "previously declared variable" error message for x and an "undeclared variable" error message for y. In the zip file there is a test program, Test_StaticAnalyzer_Errors.java, that you can run that causes a large number of static analysis errors. Use those test cases as a guide to what kind of errors you should look for.
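The "report and keep going" behavior described above can be sketched as follows. This is a simplified illustration, not the expected solution: the class VarSetSketch is hypothetical, it uses a single flat set of names instead of Environment objects, it takes plain strings instead of Tree nodes, and it assumes the right-hand side is either a variable name or an integer literal. Analyzing the right-hand side of a var before declaring the name is also an assumption.

```java
import java.util.HashSet;
import java.util.Set;

final class VarSetSketch {
    private final Set<String> declared = new HashSet<>();
    int errorCount = 0; // kept only so the behavior is easy to observe

    // Print a message, count it, and return false so callers can record the failure.
    private boolean error(String msg) {
        System.out.println(msg);
        errorCount++;
        return false;
    }

    // The operand is assumed to be an integer literal or a variable name.
    boolean analyzeOperand(String s) {
        if (s.matches("-?\\d+") || declared.contains(s)) return true;
        return error("undeclared variable: " + s);
    }

    // (var name rhs)
    boolean analyzeVar(String name, String rhs) {
        boolean rhsOk = analyzeOperand(rhs);
        boolean nameOk = true;
        if (declared.contains(name)) {
            nameOk = error("previously declared variable: " + name);
        } else {
            declared.add(name);
        }
        return rhsOk & nameOk; // both checks have already run
    }

    // (set name rhs)
    boolean analyzeSet(String name, String rhs) {
        boolean nameOk = declared.contains(name) || error("undeclared variable: " + name);
        boolean rhsOk = analyzeOperand(rhs); // runs even if the name check failed
        return nameOk & rhsOk;
    }
}
```

The design point is that each sub-check is stored in its own boolean before the results are combined. Writing something like `return checkName() && checkRhs();` would short-circuit and silently skip the second error message.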

It is very important to notice that the Parse2AST.java and StaticAnalysis.java programs are structured the same way as the interpreter, Evaluate.java. They use a "syntax driven" structure with one method for every production in the grammar (so the grammar of the language becomes a table of contents to each of these programs' code). The methods in the interpreter have names like evaluateApply() and evaluateWhile(). The methods in Parse2AST.java and StaticAnalysis.java have names like parseApply() and analyzeApply(), respectively. I have defined a few of these methods for you in each of Parse2AST.java and StaticAnalysis.java, and I started the definitions of getExp() and analyzeExp(). I suggest that you start with the Var and Set productions. Write the code for each of those two productions and then write simple test cases for them. Get that working, then go on to write (and test) other, fairly simple productions like If, While, and Print. Leave the hardest productions, like Apply and Lambda, for last.

Most of the code for StaticAnalysis.java can be found in Evaluate.java from Language_7. The static analyzer acts very much like an interpreter, but it doesn't need to fully evaluate a program. For example, the method analyzeWhile() needs to analyze each expression in the body of the while-loop, but it does not need to execute the loop. As another example, the static analyzer needs to use global and local Environment objects to keep track of which variables have been declared, but the analyzer does not need to keep track of the value of any variable (except for function names defined by fun). In my code, whenever I put a variable into an Environment object, I give the variable the value new Value(false).
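To illustrate how an environment can be used purely as a record of declared names, here is a small sketch. It is not the Environment class from the zip file; ScopeSketch, its method names, and the use of Boolean.FALSE as the placeholder value are all assumptions made for this example. It separates the two questions the analyzer keeps asking: "is this name already declared in this same scope?" (for re-declarations) and "is this name visible from here?" (for references).

```java
import java.util.HashMap;
import java.util.Map;

final class ScopeSketch {
    // The stored value is a placeholder; the analyzer only cares about the names.
    private final Map<String, Boolean> vars = new HashMap<>();
    private final ScopeSketch outer; // null for the global scope

    ScopeSketch(ScopeSketch outer) {
        this.outer = outer;
    }

    void declare(String name) {
        vars.put(name, Boolean.FALSE);
    }

    // Re-declaration checks look only at this scope.
    boolean declaredHere(String name) {
        return vars.containsKey(name);
    }

    // Reference checks search this scope and then the enclosing ones.
    boolean isVisible(String name) {
        for (ScopeSketch s = this; s != null; s = s.outer) {
            if (s.vars.containsKey(name)) return true;
        }
        return false;
    }
}
```

For example, if x is declared in the global scope, then in a local scope isVisible("x") is true but declaredHere("x") is false, so a local declaration of x is not reported as a re-declaration.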

The biggest difference between StaticAnalysis.java and Evaluate.java is that the body of a function definition is analyzed as part of the analysis of the function declaration, not as part of a function application. So the code for analyzeFun() looks a lot like the code from evaluateApply() (and the code for analyzeApply() is actually similar to analyzeSet()).
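The following sketch shows the general shape of analyzing a function body at its declaration. It is heavily simplified and every name in it is hypothetical: FunAnalysisSketch is not the real StaticAnalysis.java, the body is represented as a plain list of the variable names it references rather than as a Tree, and declaring the function name before analyzing the body (so that a recursive reference is accepted) is an assumption about the intended behavior.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FunAnalysisSketch {
    private final Set<String> globals = new HashSet<>();

    // params:   the formal parameter names of the lambda
    // bodyRefs: the variable names referenced in the body (a stand-in for a body tree)
    boolean analyzeFun(String name, List<String> params, List<String> bodyRefs) {
        boolean ok = true;

        // Declare the function name first (assumption: this allows recursion).
        if (!globals.add(name)) {
            System.out.println("previously declared function: " + name);
            ok = false;
        }

        // Bind the parameters in a fresh local scope, as an application would.
        Set<String> locals = new HashSet<>();
        for (String p : params) {
            if (!locals.add(p)) {
                System.out.println("repeated parameter: " + p);
                ok = false;
            }
        }

        // Analyze the body now, at the declaration, without applying the function.
        for (String ref : bodyRefs) {
            if (!locals.contains(ref) && !globals.contains(ref)) {
                System.out.println("undeclared variable: " + ref);
                ok = false;
            }
        }
        return ok;
    }
}
```

Notice that the local scope is built from the parameter names alone; no argument values are needed, which is why this work can be done once at the declaration instead of at every application.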

When you have finished implementing Parse2AST.java, the file Language_7_Examples.java should compile and run. The output from running this program should look exactly like the file Language_7_Examples_output.txt. But this test file is a test of the interpreter, Evaluate.java. It doesn't test for parsing or semantic errors (use Test_Parser_Errors.java and Test_StaticAnalyzer_Errors.java for that). Running Language_7_Examples.java verifies that you haven't messed up the abstract syntax trees produced by your parser.

If you are having a problem with your parser and want to test your static analyzer, you can (temporarily) replace your Parse2AST.java program with the older BuildTree.java parser. They should produce the exact same abstract syntax trees. So you can use BuildTree.java to test your StaticAnalysis.java if your Parse2AST.java is not yet working.

In the zip file there is an interactive REPL of Language_7, called Language_7_demo.jar, with all the parsing and static errors enabled. You can run the REPL by double clicking on the file Language_7_demo.cmd. You can also run Language_7 script files by dragging and dropping them onto Language_7_demo.cmd. Experimenting with the demo version of the REPL will give you a good idea of what kind of parsing and static errors your code should be able to detect. When your versions of Parse2AST.java and StaticAnalysis.java are complete, you can build your own REPL by compiling the file Language_7.java.

When you run the test programs (or the demo program) you will see that the error messages contain line numbers for the variables. This is so that the error messages are unambiguous. I have modified the Tokenizer.java, Tree.java, Tree2dot.java, and Evaluate.java files so that each token contains its line number and character position from the original source string and each tree node in the parse tree holds a token. Be sure to notice that Tree nodes now hold Token objects (instead of String objects). To get the string in a tree node, you need to access the token's lexeme field. Forgetting to access the lexeme field from a tree node's token will cause hard-to-find bugs!
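Here is a small sketch of why a forgotten lexeme access is so hard to spot. The Token and Node classes below are hypothetical stand-ins written for this example (only the field name lexeme comes from the description above; the other field names and the constructors are assumptions), so check the real Token.java and Tree.java for the actual members.

```java
final class LexemeSketch {
    // Hypothetical stand-in for the real Token class.
    static final class Token {
        final String lexeme;
        final int line;
        final int position;
        Token(String lexeme, int line, int position) {
            this.lexeme = lexeme;
            this.line = line;
            this.position = position;
        }
    }

    // Hypothetical stand-in for a tree node that holds a token.
    static final class Node {
        final Token token;
        Node(Token token) { this.token = token; }
    }

    // Correct: compares the string stored inside the token.
    static boolean isIf(Node n) {
        return n.token.lexeme.equals("if");
    }

    // Bug: compares a Token object with a String. This compiles, because
    // equals() accepts any Object, but it can never be true here.
    static boolean isIfBuggy(Node n) {
        return n.token.equals("if");
    }
}
```

The buggy version produces no compiler error and no exception. The test simply never matches, so the wrong branch of your code runs quietly.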

Turn in a zip file called CS51530Hw5Surname.zip (where Surname is your last name) containing your versions of StaticAnalysis.java and Parse2AST.java (just those two files). Be sure to put your name and email address in every file you turn in.

This assignment is due Monday, April 20.