语法分析review学习.pptx
自上而下分析 自上而下分析一般过程 LL(1)文法 预测分析程序 -递归下降分析实现-表驱动分析实现第1页/共42页自上而下分析一般过程In the top-down parsing,we begin with the start symbol and at each step,expand one of the remaining nonterminals by replacing it with the right side of one its productions.We repeat until only terminals remain.The top-down parse create a leftmost derivation of the sentence.第2页/共42页LL(1)文法The first“L”means we scan the input from left to right;the second“L”means we create a leftmost derivation;and the 1 means one input symbol of lookahead.A grammar G is LL(1)iff whenever A u|v are two distinct productions of G,the following conditions hold:-for no terminal a do both u and v derive strings beginning with a(i.e.first sets are disjoint)-at most one of u and v can derive the empty string-if v=*then u does not derive any string beginning with a terminal in Follow(A)第3页/共42页The first set of a sequence of symbols u,written as First(u)is the set of terminals whichstart all the sequences of symbols derivable from u.A bit more formally,consider allstrings derivable from u by a leftmost derivation.If u=*v,where v begins with someterminal,that terminal is in First(u).If u=*,then is in First(u).First 集第4页/共42页Follow 集The follow set of a nonterminal A is the set of terminal symbols that can appear immediately to the right of A in a valid sentential form.A bit more formally,for every valid sentential form S=*uAv,where v begins with some terminal,that terminal is in Follow(A).第5页/共42页计算First 集To calculate First(u)where u has the form X1X2.Xn,do the following:1.If X1 is a terminal,then add X1 to First(u),otherwise add First(X1)-to First(u).2.If X1 is a nullable nonterminal,i.e.,X1=*,add First(X2)-to First(u).Furthermore,if X2 can also go to ,then add First(X3)-and so on,through all Xn until the first nonnullable one.3.If X1X2.Xn=*,add to the first set.第6页/共42页计算Follow 集For each nonterminal in the grammar,do the following:1.Place#in Follow(S)where S is the start symbol and#is the inputs right endmarker.The endmarker might be end of file,it might be newline,it might be a special symbol,whatever is the expected end of input indication for this grammar.We will typically use#as the endmarker.2.For every production A uBv where u and v are any string of grammar symbols and B is a nonterminal,everything in First(v)except is placed in Follow(B).3.For every production A uB,or a production A u Bv where First(v)contains (i.e.v is nullable),then everything in Follow(A)is added to Follow(B).第7页/共42页预测分析程序Predictive parser is a non-backtracking top-down parser.A predictive parser is characterized by its ability to choose the production to apply solely on the basis of the next input symbol and the current nonterminal being processed.To enable this,the grammar must take a particular form,that is,a grammar LL(1).第8页/共42页递归下降分析技术The first technique for implementing a predictive parser is called recursive-descent.A recursive descent parser consists of several small functions(procedures),one for each nonterminal in the grammar.As we parse a sentence,we call the functions(procedures)that correspond to the left side nonterminal of the productions we are applying.If these productions are recursive,we end up calling the functions recursively.第9页/共42页表驱动分析技术 Another method for implementing a predictive parser is called table-driven parsing that uses a table,called parse table,to store that production along with an explicit stack to keep track of where we are in the parse.We push the start symbol on the stack and read the first input token.As the parser works through the input,there are the following possibilities for the top stack symbol X and the input token nonterminal a:(1)If X=a and a=end of input(#):parser halts and parse completed successfully.(2)If X=a and a!=#:successful match,pop X and advance to next input token.This is called a match action.(3)If X!=a and X is a nonterminal,pop X and consult table at X,a to see which production applies,push right side of production on stack.This is called a predict action.(4)If none of the preceding cases applies or the table entry from step 3 is blank,there has been a parse error第10页/共42页分析表的构造1.For each production A u of the grammar,do steps 2 and 32.For each terminal a in First(u),add A u to MA,a3.If in First(u),(i.e.A is nullable)add A u to MA,b for each terminal b in Follow(A),If in First(u),and#is in Follow(A),add A u to MA,#4.All undefined entries are errors第11页/共42页自下而上分析 自上而下分析一般过程 移进归约模式 LR 分析技术 补充:算符优先分析技术第12页/共42页自下而上分析一般过程A bottom-up parse works in reverse.We begin with the sentence of terminals and each step applies a production in reverse,replacing a substring that matches the right side with the nonterminal on the left.We continue until we have substituted our way back to the start symbol.If you read from the bottom to top,the bottom-up parse prints out a rightmost derivation of the sentence.第13页/共42页移进归约模式Bottom-up parsing algorithms are in general more powerful than top-down methods,but not surprisingly,the constructions required in these algorithms are also more complex.It is difficult to write a bottom-up parser by hand for anything but the most trivial of grammars,but fortunately,there are excellent parser generator tools like yacc that build a parser from an input specification.Shift-reduce parsing is the most commonly used and most powerful of the bottom-up techniques.第14页/共42页LR parsers(“L”for left to right scan of input;“R”for rightmost derivation)are efficient,tabledriven shift-reduce parsers.An LR parser uses two tables:1.The action table Actions,a tells the parser what to do when the state on top of the stack is s and terminal a is the next input token.The possible actions are to shift a state onto the stack,to reduce the handle on top of the stack,to accept the input,or to report an error.2.The goto table Gotos,X indicates the new state to place on top of the stack after a reduce of the non-terminal X while state s is on top of the stack.The two tables are usually combined,with the action table specifying entries for terminals,and the goto table specifying entries for non-terminals.LR技术第15页/共42页There are three types of LR parsers:LR(k),simple LR(k),and lookahead LR(k)(abbreviated to LR(k),SLR(k),LALR(k).The k identifies the number of tokens of lookahead.We will usually only concern ourselves with 0 or 1 tokens of lookahead,but it does generalize to k 1.The different classes of parsers all operate the same way(as shown above,being driven by their action and goto tables),but they differ in how their action and goto tables are constructed,and the size of those tables.Types of LR parsers第16页/共42页LR(0)configuration or item.A configuration is a production of the grammar with a dot at some position on its right side.1.Construct C=I0,I1,.In,the collection of configurating sets for G.2.State i is determined from Ii.The parsing actions for the state are determined as follows:a)If A u is in Ii then set actioni,a to reduce A u for all input.(A not equal to S).b)If S S is in Ii then set actioni,#to accept.c)If A uav is in Ii and GO(Ii,a)=Ij,then set actioni,a to shift j(a is a terminal).3.The goto transitions for state i are constructed for all non-terminals A using the rule:If GO(Ii,A)=Ij,then goto i,A=j.4.All entries not defined by rules 2 and 3 are errors.5.The initial state is the one constructed from the configurating set containing S S.Constructing LR(0)parsing tables第17页/共42页A grammar is LR(0)if the following two conditions hold for each configurating set:For any configurating set containing the item A uxv there is no complete item B w in that set.In the tables,this translates to no shift-reduce conflict on any state.There is at most one complete item A u in each configurating set.This translates to no reduce-reduce conflict on any state.LR(0)grammar第18页/共42页In SLR(1),the S stands for Simple.SLR(1)parser uses the same LR(0)configurating sets and has the same table structure and parser operation,.The difference comes in assigning table actions,where we are going to use a token of lookahead to help arbitrate among the conflicts.If the string on top of the stack could be reduced to the non-terminal A,what tokens do we expect to find as the next input?What tokens would tell us that the reduction is not appropriate?Answer is to use Follow(A).SLR(1)分析第19页/共42页1.Construct F=I0,I1,.In,the collection of LR(0)configurating sets for G.2.State i is determined from Ii.The parsing actions for the state are determined as follows:a)If A u is in Ii then set actioni,a to reduce A u for all a in Follow(A)(A may not be S).b)If S S is in Ii then set actioni,#to accept.c)If A uav is in Ii and succ(Ii,a)=Ij,then set actioni,a to shift j(a must be a terminal).3.The goto transitions for state i are constructed for all non-terminals A using the rule:If GO(Ii,A)=Ij,then goto i,A=j.4.All entries not defined by rules 2 and 3 are errors.5.The initial state is the one constructed from the configurating set containing S S.SLR(1)table construction第20页/共42页A grammar is SLR(1)if the following two conditions hold for each configurating set:1.For any item A uxv in the set,with terminal x,there is no complete item B w in that set with x in Follow(B).In the tables,this translates no shift-reduce conflict on any state.This means the successor function for x from that set either shifts to a new state or reduces,but not both.2.For any two complete items A u and B v in the same configurating set,the follow sets must be disjoint,e.g.Follow(A)Follow(B)is empty.This translates to no reduce reduce conflict on any state.If more than one non-terminal could be reduced from this set,it must be possible to uniquely determine which using only one token of lookahead.SLR(1)grammars第21页/共42页The SLR(1)technique still leaves something to be desired,because we are not using all the information that we have at our disposal.When we have a completed configuration(i.e.dot at the end)such as X u,we know this corresponds to a situation in which we have u as a handle on top of the stack which we then can reduce,i.e.,replacing u by X.We allow such a reduction whenever the next symbol is in Follow(X).However,it may be that we should not reduce for every symbol in Follow(X),because the symbols below u on the stack preclude u being a handle for reduction in this case.In other words,SLR(1)states only tell us about the sequence on top of the stack,not what is below it on the stack.We may need to divide an SLR(1)state into separate states to differentiate the possible means by which that sequence has appeared on the stack.By carrying more information in the state,it will allow us to rule out these invalid reductions.The limitations of SLR(1)第22页/共42页LR or canonical LR parsing incorporates the required extra information into the state by redefining configurations to include a terminal symbol as an added component.LR(1)configurations have the general form:A X1.Xi Xi+1.Xj,aThis means we have states corresponding to X1.Xi on the stack and we are looking to put states corresponding to Xi+1.Xj on the stack and then reduce,but only if the token following Xj is the terminal a.a is called the lookahead of the configuration.The lookahead only comes into play with LR(1)configurations with a dot at the right end:A X1Xj,aLR or canonical LR parsing第23页/共42页1.Construct F=I0,I1,.In,the collection of configurating sets for the augmented grammar G(augmented by adding the special production S S).2.State i is determined from Ii.The parsing actions for the state are determined as follows:a)If A u,a is in Ii then set Actioni,a to reduce A u(note A may not be S).b)If S S,$is in Ii then set Actioni,$to accept.c)If A uav,b is in Ii and succ(Ii,a)=Ij,then set Actioni,a to shift j(a must be a terminal).3.The goto transitions for state i are constructed for all non-terminals A using the rule:If succ(Ii,A)=Ij,then Goto i,A=j.4.All entries not defined by rules 2 and 3 are errors.5.The initial state is the one constructed from the configurating set containing S S.Constructing LR(1)parsing tables第24页/共42页A grammar is LR(1)if the following two conditions are satisfied for each configurating set:1.For any item in the set A uxv,a with x a terminal,there is no item in the set of the form B u,x In the action table,this translates no shift-reduce conflict for any state.4.The lookaheads for all complete items within the set must be disjoint,e.g.set cannot have bothA u,a and B v,a This translates to no reduce-reduce conflict on any state.LR(1)grammars第25页/共42页Because a canonical LR(1)parser splits states based on differing lookahead sets,it typically has many more states that the corresponding SLR(1)or LR(0)parser.Potentially it could require the Cartesian product of all possible LR(0)configurations and the power set of the terminals.It never actually gets that bad in practice,but a canonical LR(1)parser for an ordinary programming language might have an order of magnitude more states than an SLR(1)parser.Is there something in between?With LALR(lookahead LR)parsing,we attempt to reduce the number of states in an LR(1)parser by merging similar states.This reduces the number of states to the same as SLR(1),but still retains some of the power of the LR(1)lookaheads.The motivation for LALR第26页/共42页The“brute-force”method1.Construct all canonical LR(1)states.2.Merge those states that are identical if the lookaheads are ignored,i.e.two states being merged must have the same number of items and the items have the same core(i.e.the same productions,differing only in lookahead).The lookahead on merged items is the union of the lookahead from the states being merged.3.The GO function for the new LALR(1)state is the union of the GO function of the merged states.If the two configurations have the same core,then the original successors must have the same core as well,and thus the new state has the same successors.4.The action and goto entries are constructed from the LALR(1)states as for the canonical LR(1)parser.LALR table construction第27页/共42页几种文法之间的关系第28页/共42页1.文法的关系,不是语言的关系.2.没有一个二义文法是LL(1)或 LR(1)的.3.LR家族的关系是非常清楚的.4.每个 LL(1)文法是 LR(1)的.几种文法之间的关系第29页/共42页Implementation.Simplicity:Generality:Grammar conditioning:Error repair:Table sizes:Comparing LL(1)parsers to LALR(1)第30页/共42页解决How do real compilers do parsing?第31页/共42页其他 二义性处理第32页/共42页A different way of resolving ambiguityAmbiguity in a grammar means we have two or more leftmost derivations for the same input string,or equivalently,that we can build more than one parse trees for the same input string.Common examples:E E+E|E*E|(E)|idS iSeS|iS|a1.by re-writing the grammar to introduce new intermediatenon-terminals that force the precedence that we desired.2.try to build an SLR(1)table for this grammar.3.try to build an LL(1)table for this grammarParsing miscellany第33页/共42页解决二义性的不同途径1 重写文法2 LR分析3 LL分析第34页/共42页文法途径1 根据算符优先性和结合性重写文法G(E):E E+E|E*E|(E)|id 重写 E E+T|T T T*F|F F(E)|idG(S):S if Expr then Stmt else Stmt|if Expr then Stmt|Other brevity as:S iSeS|iS|a重写S M|U M iMeM|aU iS|iMeS第35页/共42页LR方法应