《词法分析部分总结.ppt》由会员分享,可在线阅读,更多相关《词法分析部分总结.ppt(80页珍藏版)》请在淘文阁 - 分享文档赚钱的网站上搜索。
1、词法分析部分总结词法分析部分总结词法分析部分总结词法分析部分总结田田 聪聪11.描述描述:用正规式对模式进行描述;2.构造构造NFA:为每个正规式构造一个NFA;3.确定化确定化:将NFA转换成等价的DFA;4.最小化最小化:优化DFA,使其状态数最少;5.构造词法分析器构造词法分析器:由DFA构造词法分析器(表驱动,直接编码,LEX)。构造词法分析器的一般方法和步骤构造词法分析器的一般方法和步骤2涉及到的形式化概念涉及到的形式化概念涉及到的形式化概念涉及到的形式化概念F正则表达式(正规式)正则表达式(正规式)FNFAFDFA正则表达式例如:Char(char|digit)*,字符串,abbA
2、111,.NFADFA正则语言(正规集)3正则语言正则语言正则语言正则语言正则语言上下文无关语言上下文有关语言递归可枚举语言4正则语言正则语言正则语言正则语言F正则语言正则语言正则表达式有限状态自动机F上下文无关语言上下文无关语言上下文无关文法非确定的下推自动机F上下文有关语言上下文有关语言上下文有关文法线性有界自动机(特殊的图灵机)F递归可枚举语言递归可枚举语言短语结构图灵机5Algebraic Laws for REsAlgebraic Laws for REsFUnion and concatenation behave like addition and multiplication.
3、+is commutative(可交换的)and associative(可结合的)a+b=b+a,a+b+c=a+(b+c)concatenation is associative(可结合的)a.b.c=a.(b.c)Concatenation distributes over+(可分配的)a.(b+c)=a.b+a.cException:Concatenation is not commutative(可交换的)a.b b.a6Identities and AnnihilatorsIdentities and AnnihilatorsF is the identity(单位元单位元)for
4、+.R+=R.F is the identity(单位元单位元)for concatenation.R=R=R.F is the annihilator(零元零元)for concatenation.R=R=.7正则语言相关内容正则语言相关内容正则语言相关内容正则语言相关内容F如何证明正则表达式和如何证明正则表达式和NFA的等价性?的等价性?(1)We need to show that for every RE,there is an automaton that accepts the same language.(2)And for every automaton,there is a
5、RE defining its language.8RE to RE to -NFA:-NFA:BasisBasisFSymbol a:F:a9RE to RE to -NFA:-NFA:Induction 1Induction 1 Union UnionFor E1For E2For E1 E210RE to RE to -NFA:-NFA:Induction 2Induction 2 Concatenation ConcatenationFor E1For E2For E1E211RE to RE to -NFA:-NFA:Induction 3Induction 3 Closure Cl
6、osureFor EFor E*12正则语言相关内容正则语言相关内容正则语言相关内容正则语言相关内容F如何证明正则表达式和如何证明正则表达式和NFA的等价性?的等价性?(1)We need to show that for every RE,there is an automaton that accepts the same language.(2)And for every automaton,there is a RE defining its language.13From Automata to REFrom Automata to REF Ardens ruleFor any se
7、ts of strings S and T,the equation X=SX+T has X=S*T as a solution.Moreover,this solution is unique if not in S.14From Automata to REFrom Automata to REF Given an automaton AF A has states q0,qn with q0 being the start stateFLet Xi denote the set of strings accepted by A starting in state qiFThus,L(A
8、)=X0F We can write an equation for each Xi,defining it in terms of the sets corresponding to its successor states.15From Automata to REFrom Automata to REq2q0q3q1b,caccaa,b,ca,bbA016From Automata to REFrom Automata to REq2q0q3q1b,caccaa,b,ca,bbA0(0)X0=aX1+bX3+cX3(1)X1=aX3+bX2+cX0+(2)X2=aX3+bX3+cX0(3
9、)X3=aX3+bX3+cX3X3=(a+b+c)X3+,by Ardens rule:X3=(a+b+c)*=(0)X0=aX1(1)X1=bX2+cX0+(2)X2=cX0Substituing(0)and(2)in(1):X1=(bc+c)aX1+=(bc+c)a)*(by Ardens rule)X0=a(bc+c)a)*17DFA-to-REDFA-to-REFAnother approach FPage 93,theorem 3.4.(形式语言与自动机)(形式语言与自动机)FInduction on k-path.18Decision Properties of Regular D
10、ecision Properties of Regular LanguagesLanguages19Properties of Language ClassesProperties of Language ClassesFA language class is a set of languages.We have one example for language class:the regular languages.任何一个正则表达式都表达了一个语言,所有的正则表达式构成了语言类:正则语言FLanguage classes have two important kinds of proper
11、ties:1.Decision properties.2.Closure properties.20Decision PropertiesDecision PropertiesFA decision property for a class of languages is an algorithm that takes a formal description of a language(e.g.,a DFA)and tells whether or not some property holds.FExample:Is language L empty?Suppose the represe
12、ntation is a DFA.Can you tell if L(A)=for DFA A?21Why Decision Properties?Why Decision Properties?FWhen we talked about protocols represented as DFAs,we noted that important properties of a good protocol were related to the language of the DFA.FExample:“Does the protocol terminate?”=“Is the language
13、 finite?”FExample:“Can the protocol fail?”=“Is the language nonempty?”22Why Decision Properties (2)Why Decision Properties (2)FWe might want a“smallest”representation for a language,e.g.,a minimum-state DFA or a shortest RE.FIf you cant decide“Are these two languages the same?”I.e.,do two DFAs defin
14、e the same language?You cant find the“smallest.”23Closure PropertiesClosure PropertiesFA closure property(封闭性封闭性)of a language class says that given languages in the class,an operator(e.g.,union)produces another language in the same class.FExample:the regular languages are obviously closed under uni
15、on,concatenation,and(Kleene)closure.(求补?求交?求补?求交?)是正是正规规式式 若若a是是上的字符,则上的字符,则a是正规式是正规式 若若r和和s分别是分别是上的正规式,那么上的正规式,那么 (a)r|s是正规式是正规式 (b)rs是正规式是正规式 (c)r*是正规式是正规式24The Membership QuestionThe Membership QuestionFOur first decision property is the question:“is string w in regular language L?(成员问题)(成员问题)”FA
16、ssume L is represented by a DFA A.FSimulate the action of A on the sequence of input symbols forming w.25ExampleExample:Testing Membership:Testing MembershipStart10ACB100,10 1 0 1 1 NextsymbolCurrent state26ExampleExample:Testing Membership:Testing MembershipStart10ACB100,10 1 0 1 1 NextsymbolCurren
17、t state27ExampleExample:Testing Membership:Testing MembershipStart10ACB100,10 1 0 1 1 NextsymbolCurrent state28ExampleExample:Testing Membership:Testing MembershipStart10ACB100,10 1 0 1 1 NextsymbolCurrent state29ExampleExample:Testing Membership:Testing MembershipStart10ACB100,10 1 0 1 1 Nextsymbol
18、Current state30ExampleExample:Testing Membership:Testing MembershipStart10ACB100,10 1 0 1 1 NextsymbolCurrent state31The Emptiness ProblemThe Emptiness ProblemFGiven a regular language,does the language contain no string at all(判空问题)(判空问题).FAssume representation is DFA.FConstruct the transition grap
19、h.FCompute the set of states reachable from the start state.FIf any final state is reachable,then yes,else no.32The Infiniteness ProblemThe Infiniteness ProblemFIs a given regular language infinite?FStart with a DFA for the language.FKey idea:if the DFA has n states,and the language contains any str
20、ing of length n or more,then the language is infinite.FOtherwise,the language is surely finite.Limited to strings of length n or less.33Proof of Proof of Key IdeaKey IdeaFIf an n-state DFA accepts a string w of length n or more,then there must be a state that appears twice on the path labeled w from
21、 the start state to a final state.FBecause there are at least n+1 states along the path.34Proof (2)Proof (2)|w|=5 s0s2s4s1s3123435Finding CyclesFinding Cycles1.Eliminate states not reachable from the start state.2.Eliminate states that do not reach a final state.3.Test if the remaining transition gr
22、aph has any cycles.36The Pumping LemmaThe Pumping LemmaFWe have,almost accidentally,proved a statement that is quite useful for showing certain languages are not regular.FCalled the pumping lemma for regular languages.37Statement of the Pumping LemmaStatement of the Pumping LemmaFor every regular la
23、nguage L There is an integer n,such that For every string w in L of length n We can write w=xyz such that:1.|xy|0.3.For all i 0,xyiz is in L.Number ofstates ofDFA for LLabels alongfirst cycle onpath labeled w38ExampleExample:Use of Pumping Lemma:Use of Pumping LemmaF用来证明一个语言不是正则语言(必要非充分条件)用来证明一个语言不是
24、正则语言(必要非充分条件)FExample:0k1k|k 1 is not a regular language.FSuppose it were.Then there would be an associated n for the pumping lemma.FLet w=0n1n.We can write w=xyz,where x and y consist of 0s,and y .FBut then xyyz would be in L,and this string has more 0s than 1s.泵引理是正则语言的必要非充分条件!一个正则语言,必须满足泵引理。如果一个语
25、言不满足泵引理,那么它肯定不是正则语言。如果它满足泵引理,它不一定是正则语言。39Pumping Lemma满足泵引理,但不是正则语言Jeffrey Jaffe(MIT)的泵引理(正则语言的必要充分条件)40Decision PropertyDecision Property:Equivalence:EquivalenceFGiven regular languages L and M,is L=M?FAlgorithm involves constructing the product DFA from DFAs for L and M.FLet these DFAs have sets o
26、f states Q and R,respectively.FProduct DFA has set of states Q R.I.e.,pairs q,r with q in Q,r in R.41Product DFA ContinuedProduct DFA ContinuedFStart state=q0,r0(the start states of the DFAs for L,M).FTransitions:(q,r,a)=L(q,a),M(r,a)L,M are the transition functions for the DFAs of L,M.That is,we si
27、mulate the two DFAs in the two state components of the product DFA.42ExampleExample:Product DFA:Product DFAACBD010,11100A,CA,D0B,C10101B,D0143Equivalence AlgorithmEquivalence AlgorithmFMake the final states of the product DFA be those states q,r such that exactly one of q and r is a final state of i
28、ts own DFA.F若等价,一个接收,另一个也接收!若等价,一个接收,另一个也接收!FThus,the product accepts w iff w is in exactly one of L and M.44ExampleExample:Equivalence:EquivalenceACBD010,11100A,CA,D0B,C10101B,D0145Equivalence Algorithm (2)Equivalence Algorithm (2)FThe product DFAs language is empty iff L=M.FWe already have an algo
29、rithm to test whether the language of a DFA is empty.46Decision PropertyDecision Property:Containment:Containment FGiven regular languages L and M,is L M?FAlgorithm also uses the product automaton.FHow do you define the final states q,r of the product so its language is empty iff L M?Answer:q is fin
30、al;r is not.47ExampleExample:Containment:ContainmentACBD010,11100A,CA,D0B,C10101B,D01Note:the only final stateis unreachable,socontainment holds.48The Minimum-State DFA for a The Minimum-State DFA for a Regular LanguageRegular LanguageFIn principle,since we can test for equivalence of DFAs we can,gi
31、ven a DFA A find the DFA with the fewest states accepting L(A).FTest all smaller DFAs for equivalence with A.FBut thats a terrible algorithm.49Efficient State MinimizationEfficient State MinimizationF填表法填表法F不可区分的状态不可区分的状态F尽最大努力求出可区分状态尽最大努力求出可区分状态 r bA B CB B C50Efficient State MinimizationEfficient
32、State MinimizationF基础:如果基础:如果p是可接收状态,是可接收状态,q是不可接收状态,那是不可接收状态,那么么p,q是可区分的。是可区分的。F归纳归纳:对于:对于p,q,如果如果r=(p,a)与与s=(q,a)是可区是可区分的,那么分的,那么p,q是可区分的。是可区分的。51ExampleExample:State Minimization:State Minimizationr b1*1,3,5,7,9 2,4,6,8 1,3,5,7,9*1,3,7,92,4,6,8 52,4,6,8 1,3,5,7,92,4,6,82,4,6,8 1,3,7,952,42,4,6,8
33、1,3,5,71,3,5,72,4 52,4,6,8 1,3,5,7,9Remember this DFA?It was constructed for thechessboard NFA by the subset construction.r bA B CB D EC D FD D GE D GF D CG D G*Here it iswith moreconvenientstate names52Efficient State MinimizationX r bA B CB D EC D FD D GE D GF D CG D G*XXXXBCDEFGFEDCBA53Efficient
34、State MinimizationXX r bA B CB D EC D FD D GE D GF D CG D G*XXXXXXXXBCDEFGFEDCBA54Efficient State MinimizationXXX r bA B CB D EC D FD D GE D GF D CG D G*XXXXXXXXXBCDEFGFEDCBA55Efficient State MinimizationXXXXX r bA B CB D EC D FD D GE D GF D CG D G*XXXXXXXXXBCDEFGFEDCBA56Efficient State Minimization
35、XXXXX r bA B CB D EC D FD D GE D GF D CG D G*XXXXXXXXXXXBCDEFGFEDCBA57Efficient State MinimizationXXXXX r bA B CB D EC D FD D GE D GF D CG D G*XXXXXXXXXXXXBCDEFGFEDCBA58Efficient State MinimizationXXXXX r bA B CB D EC D FD D GE D GF D CG D G*XXXXXXXXXXXXXXBCDEFGFEDCBA59Efficient State MinimizationXX
36、XXXX r bA B CB D EC D FD D GE D GF D CG D G*XXXXXXXXXXXXXXBCDEFGFEDCBA60ExampleExample Concluded Concluded r bA B CB D EC D FD D GE D GF D CG D G*r bA B CB H HC H FH H GF H CG H G*Replace D and E by H.Result is the minimum-state DFA.XXXXXXXXXXXXXXXXXXXXBCDEFGFEDCBA61Eliminating Unreachable StatesEli
37、minating Unreachable StatesFUnfortunately,combining indistinguishable states could leave us with unreachable states in the“minimum-state”DFA.FThus,before or after,remove states that are not reachable from the start state.62Closure Under UnionClosure Under UnionFIf L and M are regular languages,so is
38、 L M.FProof:Let L and M be the languages of regular expressions R and S,respectively.FThen R+S is a regular expression whose language is L M.63Closure Under Concatenation and Closure Under Concatenation and KleeneKleene Closure ClosureFSame idea:RS is a regular expression whose language is LM.R*is a
39、 regular expression whose language is L*.64Closure Under IntersectionClosure Under IntersectionFIf L and M are regular languages,then so is L M.FProof:Let A and B be DFAs whose languages are L and M,respectively.FConstruct C,the product automaton of A and B.FMake the final states of C be the pairs c
40、onsisting of final states of both A and B.65ExampleExample:Product DFA for:Product DFA for IntersectionIntersectionACBD010,11100A,CA,D0B,C10101B,D0166Closure Under DifferenceClosure Under DifferenceFIf L and M are regular languages,then so is L M =strings in L but not M.FProof:Let A and B be DFAs wh
41、ose languages are L and M,respectively.FConstruct C,the product automaton of A and B.FMake the final states of C be the pairs where A-state is final but B-state is not.67ExampleExample:Product DFA for Difference:Product DFA for DifferenceACBD010,11100A,CA,D0B,C10101B,D0168Closure Under Complementati
42、onClosure Under ComplementationFThe complement of a language L(with respect to an alphabet such that *contains L)is *L.FSince *is surely regular,the complement of a regular language is always regular.69Closure Under Reversal (2)Closure Under Reversal (2)FGiven language L,LR is the set of strings who
43、se reversal is in L.FExample:L=0,01,100;LR=0,10,001.FProof:Let E be a regular expression for L.FWe show how to reverse E,to provide a regular expression ER for LR.70Reversal of a Regular ExpressionReversal of a Regular ExpressionFBasis:If E is a symbol a,or,then ER=E.FInduction:If E isF+G,then ER=FR
44、+GR.FG,then ER=GRFR F*,then ER=(FR)*.71ExampleExample:Reversal of a RE:Reversal of a REFLet E=01*+10*.FER=(01*+10*)R=(01*)R+(10*)RF=(1*)R0R+(0*)R1RF=(1R)*0+(0R)*1F=1*0+0*1.72HomomorphismsHomomorphismsFA homomorphism on an alphabet is a function that gives a string for each symbol in that alphabet.FE
45、xample:h(0)=ab;h(1)=.FExtend to strings by h(a1an)=h(a1)h(an).FExample:h(01010)=ababab.73Closure Under HomomorphismClosure Under HomomorphismFIf L is a regular language,and h is a homomorphism on its alphabet,then h(L)=h(w)|w is in L is also a regular language.FProof:Let E be a regular expression fo
46、r L.FApply h to each symbol in E.FLanguage of resulting RE is h(L).74ExampleExample:Closure under:Closure under HomomorphismHomomorphismFLet h(0)=ab;h(1)=.FLet L be the language of regular expression 01*+10*.FThen h(L)is the language of regular expression ab*+(ab)*.75ExampleExample Continued Continu
47、edFab*+(ab)*can be simplified.F*=,so ab*=ab.F is the identity under concatenation.That is,E=E=E for any RE E.FThus,ab*+(ab)*=ab +(ab)*=ab+(ab)*.FFinally,L(ab)is contained in L(ab)*),so a RE for h(L)is(ab)*.76正则语言小结正则语言小结正则语言小结正则语言小结田聪田聪77正则语言正则语言正则语言正则语言F确定有限状态自动机确定有限状态自动机F非确定有限状态自动机非确定有限状态自动机F带带 的非确定有限状态自动机的非确定有限状态自动机F正则表达式正则表达式REDFANFA-NFAL(RE)=L(-NFA)=L(NFA)=L(DFA)=正则语言78正则语言的性质正则语言的性质正则语言的性质正则语言的性质F泵引理(必要非充分条件)泵引理(必要非充分条件)可用来证明一个特定的语言不是正则语言不能用来证明一个特定的语言是正则语言F判定性判定性一个自动机接收的语言是否为空串w是否可被某自动机接收两个自动机是否等价79正则语言的性质正则语言的性质正则语言的性质正则语言的性质封闭性封闭性正则语言的并操作交补差反转闭包连接同态80
限制150内