Convert Left Recursion into Left Associative Iteration

Here is an unambiguous "left recursive" grammar for a simple expression-like language.

    E3 ::= E3 '+' 'a'    // left recursion means left associative
         | 'a'

Using this grammar, we can parse the string "a+a+a+a" only one way, as if it were "grouped" as ((a+a)+a)+a. Here is the parse tree.

                E3
               /|\
              / | \
            E3  +  a
           /|\
          / | \
        E3  +  a
       /|\
      / | \
    E3  +  a
    |
    |
    a

Notice that, in essence, a left-associative operator needs a left-recursive grammar production that captures the "big stuff" on the left side of the operator and a "little thing" on the right side of the operator, e.g.,

    E ::= E '+' 'a'

A right-associative operator needs a right-recursive grammar production that captures the "big stuff" on the right side of the operator and a "little thing" on the left side of the operator, e.g.,

    E ::= 'a' '+' E

A left-recursive production causes a problem for a recursive descent parser: it leads to an infinitely recursive parser. Here is the pseudo code that describes the recursive descent parser for the above left-recursive grammar.

    void getE(tokens)
    {
       if (number_of_remaining_tokens > 1)
       {
          getE(tokens);   // recursive call before any token is consumed
          match('+');
          match('a');
       }
       else
       {
          match('a');
       }
    }

Notice how, if there is more than one token, a call to getE() immediately leads to another call to getE() without consuming any tokens. The number of remaining tokens never decreases, so the recursion never stops.

We must rewrite the grammar to remove the left recursion and replace the recursion with a kind of iteration. We want to show that the above grammar can be rewritten to use the Kleene star (from "regular expressions") in place of the left recursion. The new grammar needs only one production.

    E3 ::= 'a' ( '+' 'a' )*    // * means zero-or-more

Let us look at how the left recursion is removed from the original grammar to derive the new grammar.
Here is a left-most derivation, written as a sequence of sentential forms, from the original left-recursive grammar.

    E3
    E3 + a
    E3 + a + a
    E3 + a + a + a
    E3 + a + a + a + a
    a + a + a + a + a

Notice how the "expression part" is moving to the left, and in each step it grows the expression by concatenating the string "a +" onto what had been to its right. That means we can think of growing a string in the language by starting with the string "a" and then concatenating on the left as many "a +" strings as we like. This leads to the following iterative production for the language (which is not yet the production that we want)

    E3 ::= ( 'a' '+' )* 'a'

which uses iteration (the Kleene star) in place of recursion.

But we can just as reasonably look at the final string

    a + a + a + a + a

and say that we grow this string by starting with the string "a" and then concatenating on the right(!) as many "+ a" strings as we like. Hence, we can also derive this iterative production for the language, which is the production that we want.

    E3 ::= 'a' ( '+' 'a' )*

So the left recursion in this production

    E3 ::= E3 '+' 'a'
         | 'a'

can be factored out to derive this non-recursive production.

    E3 ::= 'a' ( '+' 'a' )*

It is easy to translate this production into a while-loop that parses the language. But there is still a problem.

Let us go back to the right recursive expression language E2,

    E2 ::= 'a' '+' E2    // right recursion means right associative
         | 'a'

and use the same trick to factor out the recursion. Here is a left-most derivation, written as a sequence of sentential forms, from the E2 grammar.

    E2
    a + E2
    a + a + E2
    a + a + a + E2
    a + a + a + a + E2
    a + a + a + a + a

Notice how the "expression part" is moving to the right, and in each step it grows the expression by concatenating the string "+ a" onto what had been to its left. So that means we can think of growing a string in the E2 language by starting with the string "a" and then concatenating on the right as many "+ a" strings as we like.
This leads to the following production for language E2.

    E2 ::= 'a' ( '+' 'a' )*

But this is exactly the same production we derived for language E3! Language E2 is supposed to be right associative and language E3 is supposed to be left associative, but if they have the same grammar, how can we say which associativity they have? The answer is that we cannot. This production

    E ::= 'a' ( '+' 'a' )*

does not define an associativity for the '+' operator. Instead, the associativity of the '+' operator will be determined by how we write the parser, not by how we write the production. When the parser code uses this production to parse an expression, we can have the parser code build either a left associative or a right associative parse tree.

Here is the non-recursive pseudo code that parses the iterative production.

    void getE(tokens)
    {
       tokens.match("a");                                  // the leading 'a'
       while ( ! tokens.isEmpty() && tokens.match("+") )   // zero or more ( '+' 'a' )
       {
          tokens.match("a");
       }
    }

This is a recognizing parser. It does nothing but parse, and it throws an error if the list of tokens doesn't parse.

Let us see how to modify this parser so that it builds a left associative expression tree. To motivate how to modify the above code, let us consider the example string "a+a+a+a+a+a". Here is the sequence of left associative expression trees that we should get as we parse "a+a+a+a+a+a" by starting with "a" and then iterating the concatenation of "+ a" on the right.

string: "a"   "a+a"   "a+a+a"   "a+a+a+a"   "a+a+a+a+a"   "a+a+a+a+a+a"

tree:    a      +         +           +             +               +
               / \       / \         / \           / \             / \
              a   a     +   a       +   a         +   a           +   a
                       / \         / \           / \             / \
                      a   a       +   a         +   a           +   a
                                 / \           / \             / \
                                a   a         +   a           +   a
                                             / \             / \
                                            a   a           +   a
                                                           / \
                                                          a   a

To better see what is going on, replace the letter 'a' with distinct letters.

string: "a"   "a+b"   "a+b+c"   "a+b+c+d"   "a+b+c+d+e"   "a+b+c+d+e+f"

tree:    a      +         +           +             +               +
               / \       / \         / \           / \             / \
              a   b     +   c       +   d         +   e           +   f
                       / \         / \           / \             / \
                      a   b       +   c         +   d           +   e
                                 / \           / \             / \
                                a   b         +   c           +   d
                                             / \             / \
                                            a   b           +   c
                                                           / \
                                                          a   b

Notice that as we move to the right from string to string, the expression trees grow in a very specific way. The next tree in the sequence of trees always has the previous tree as the left branch of its root.

    root of next tree --> +
                         / \
                 previous   a
                   tree

This is the hint that we need to write the code that builds these expression trees. Be sure to carefully compare this version of getE() to the previous version.

    Tree getE(tokens)
    {
       tokens.match('a');
       Tree currentTree = new Tree("a");
       while ( ! tokens.isEmpty() && tokens.match('+') )
       {
          tokens.match('a');
          currentTree = new Tree("+", currentTree, new Tree("a"));  // left associative
       }
       return currentTree;
    }

Follow this code as it parses the string "a+a+a+a". (IMPORTANT: Really do follow this code as it parses the string.) It parses the string into a left-associative parse tree.

But now modify the code this way.

    Tree getE(tokens)
    {
       tokens.match('a');
       Tree currentTree = new Tree("a");
       while ( ! tokens.isEmpty() && tokens.match('+') )
       {
          tokens.match('a');
          currentTree = new Tree("+", new Tree("a"), currentTree);  // right associative?
       }
       return currentTree;
    }

Again, follow this code as it parses the string "a+a+a+a". (IMPORTANT: Really do follow this code as it parses the string.) Now it parses the string into (what seems to be) a right-associative parse tree. But there's a problem.

Modify the parser once again (so that it can parse strings with variables other than "a").

    Tree getE(tokens)
    {
       Token tk = tokens.nextToken();
       Tree currentTree = new Tree(tk);
       while ( ! tokens.isEmpty() && tokens.match('+') )
       {
          tk = tokens.nextToken();
          currentTree = new Tree("+", new Tree(tk), currentTree);  // right associative
       }
       return currentTree;
    }

Now follow the above parser as it parses the string "a+b+c+d".
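
One way to check a hand trace is to run the loop. The following is a minimal Java sketch, not the code from these notes: the Tree class, the tokenize() helper, and the leftAssoc flag are assumptions introduced here for illustration, and operands are taken to be single characters. The flag selects between the two tree-building assignments shown above, and toString() prints each tree fully parenthesized so that the grouping is visible.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical helper type; the Tree class used in the notes is not shown there.
class Tree {
    final String label;
    final Tree left, right;

    Tree(String label) { this(label, null, null); }

    Tree(String label, Tree left, Tree right) {
        this.label = label;
        this.left = left;
        this.right = right;
    }

    // Fully parenthesized form, e.g. (a+b) for a '+' node with leaves a and b.
    @Override
    public String toString() {
        return (left == null) ? label : "(" + left + label + right + ")";
    }
}

class LoopParser {
    // Assumed tokenizer: splits "a+b+c" into one-character tokens, left to right.
    static Deque<String> tokenize(String input) {
        return new ArrayDeque<>(Arrays.asList(input.split("")));
    }

    // Loop for the production  E ::= operand ( '+' operand )*.
    // leftAssoc decides which side of the new root receives the tree built so far.
    static Tree getE(Deque<String> tokens, boolean leftAssoc) {
        Tree currentTree = new Tree(tokens.remove());
        while (!tokens.isEmpty() && tokens.peek().equals("+")) {
            tokens.remove();                              // consume '+'
            Tree operand = new Tree(tokens.remove());     // throws if no operand follows
            currentTree = leftAssoc
                ? new Tree("+", currentTree, operand)     // previous tree on the left
                : new Tree("+", operand, currentTree);    // previous tree on the right
        }
        return currentTree;
    }

    public static void main(String[] args) {
        System.out.println(getE(tokenize("a+b+c+d"), true));
        System.out.println(getE(tokenize("a+b+c+d"), false));
    }
}
```

Running main() prints the grouping that each variant builds for "a+b+c+d", which can be compared against a hand trace. Passing the tokens in right-to-left order, for example a deque built from ["d", "+", "c", "+", "b", "+", "a"], runs the same loop on reversed input.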
You will see that it is not really parsing the expression to be right-associative. It's not even parsing the string correctly. Instead of building this right associative expression tree for "a+b+c+d",

       +
      / \
     a   +
        / \
       b   +
          / \
         c   d

the code builds this tree.

       +
      / \
     d   +
        / \
       c   +
          / \
         b   a

But if you tokenize the string "a+b+c+d" from right-to-left, so the token list is

    ["d", "+", "c", "+", "b", "+", "a"]

and then you once again follow the parser as it parses this token list, you should get a correct, right-associative parse tree.

The last several examples show that the grammar

    E3 ::= 'a' ( '+' 'a' )*

DOES NOT determine any associativity for the operator. It doesn't really tell us how to parse. But we can use the grammar as a guide to implement parsers for either a left-associative operator or a right-associative operator (but the right-associative parser needs a right-to-left tokenizer!).

Of course, if we really want a right-associative operator, we should use the right recursive grammar

    E ::= 'a' '+' E
        | 'a'

and write the recursive descent parser for this grammar, and use a left-to-right tokenizer.

Question: What does the following grammar give you? Notice that this production mixes left recursion with iteration.

    E ::= ( E '+' )* 'a'
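
The recursive descent parser recommended above for the right recursive grammar can be sketched as follows. This is an illustrative Java sketch under assumptions: ExprTree, tokenize(), and single-character operands are hypothetical choices made here and are not part of these notes. Because both alternatives of the production begin with an operand, the sketch reads the operand first and only then decides whether a '+' and a recursive call follow.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical expression-tree node, defined only to keep the sketch self-contained.
class ExprTree {
    final String label;
    final ExprTree left, right;

    ExprTree(String label, ExprTree left, ExprTree right) {
        this.label = label;
        this.left = left;
        this.right = right;
    }

    static ExprTree leaf(String label) {
        return new ExprTree(label, null, null);
    }

    // Fully parenthesized form so the grouping is visible.
    @Override
    public String toString() {
        return (left == null) ? label : "(" + left + label + right + ")";
    }
}

class RightRecursiveParser {
    // Assumed tokenizer: one-character tokens, read left to right.
    static Deque<String> tokenize(String input) {
        return new ArrayDeque<>(Arrays.asList(input.split("")));
    }

    // E ::= operand '+' E
    //     | operand
    static ExprTree getE(Deque<String> tokens) {
        ExprTree left = ExprTree.leaf(tokens.remove());   // the "little thing" on the left
        if (!tokens.isEmpty() && tokens.peek().equals("+")) {
            tokens.remove();                              // consume '+'
            ExprTree right = getE(tokens);                // the "big stuff" on the right
            return new ExprTree("+", left, right);
        }
        return left;
    }

    public static void main(String[] args) {
        System.out.println(getE(tokenize("a+b+c+d")));    // prints (a+(b+(c+d)))
    }
}
```

With a left-to-right tokenizer, main() prints (a+(b+(c+d))), the right associative grouping. Each call consumes at least one token before it recurses, so unlike the left recursive getE() this recursion terminates.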