编译原理课程设计(共31页).docx
精选优质文档-倾情为你奉上编译原理课程设计 实验名称:C-语言词法分析器的手工构造 C-语言词法分析器的lex生成 C-语言语法分析器的手工构造 学生姓名: 刘 恺 丽 学生学号: 指导教师姓名: 于中华 实验一:C-语言词法分析器的手工构造一、 实验目的及意义1. 理解C-语言的词法特点,并能构造各种token的正则表达式;2. 掌握将正则表达式转换为DFA的方法;3. 学会设计C-语言手动生成词法分析器的数据类型和数据结构。二、 实验环境1. 操作系统:Window XP/Windows 7;2. 开发环境:Microsoft Visual C+ 6.0。三、 算法分析与设计 1.C-语言的词法规则(1)关键字else if int return void while(2)特殊符号+ - * / < <= > >= = != = ; , ( ) /* */(3)其它token(区分大小写)ID = letter letter*NUM = digit digit*letter = a|z|A|.|ZDigit = 0|9 (4)空白符号 空白 n t (5)注释 由标记符号/*/标记起来的部分。 2.C-语言的词法正则表达式digit 0-9number digit+letter a-zA-Zidentifier letter+newline nwhitespace " "t+3.C-语言的DFA4.重要数据类型设计(1)token类型用枚举量分为以下几个typedef enumENDFILE,ERROR,ELSE,IF,INT,RETURN,VOID,WHILE,ID,NUM,PLUS,MINUS,TIMES,OVER,LT,LTE,RT,RTE,EQ,NE,ASSIGN,SEMI,COMMA,LPAREN,RPAREN,LZ,RZ,LD,RD,LC,RCTokenType;(2)DFA9个状态typedef enumSTART,INNE,INEQ,INLT,INRT,INID,INNUM,INOVER,INCOMMENT1,INCOMMENT2,DONEStateType;四、代码实现1. 查找保留字函数TokenTypeS FindResvd (char * s)须注意:对Keyword先将keyWord归为ID类。getToken时再匹配。 保留字数据结构如下:static struct char* str;TokenType tok;reservedWordsMAXRESERVED="else",ELSE,"if",IF,"int",INT,"return",RETURN,"void",VOID,"while",WHILE;static TokenType reservedLookup (char * s)int i;for (i=0;i<MAXRESERVED;i+)if (!strcmp(s,reservedWordsi.str)return reservedWordsi.tok;return ID;2. getToken代码设计说明:1) 以状态表为驱动,根据当前的状态以及当前字符来决定下一个状态。2) 嵌套case:通过一个状态标记来标记当前DFA所处的状态,通过case分支语句来执行每一个转移的情况,直观易懂,但是随着DFA状态数量的增多,循环嵌套的程序代码就会变得很冗长。3) 实现DFA的核心代码及分析: 扫描程序的主函数为getToken,用的是嵌套case的方法,通过自定义一个状态枚举类型来标记当前所处的DFA状态,并用字符数组tokenString来保存已经扫描过的字符,每次调用getToken函数的时候,当前状态都会被初始化为START状态,用while循环当当前状态不为DONE状态,就从输入流获取一个字符,并根据上面给出的DFA判断如何进行状态转移;当当前状态为接收状态时,tokenString里面就是一个完整的token,于是将tokenString的内容打印到listing文件并返回tokenString里面的内容。getToken函数调用getNextChar函数获取源文件中下一个字符,如果某些状态需要回溯一个字符,则调用ungetNextChar函数将已经获取的字符释放出来等待下一个状态中重新获取。GetToken函数中识别token关键代码:TokenType getToken(void)int tokenStringIndex = 0;TokenType currentToken;StateType state = START;int save;while (state !=DONE)int c = getNextChar();save = true;switch (state)case START:if(isdigit(c)state = INNUM;else if (isalpha(c)state = INID;else if (c = '!')state = INNE;else if (c = '=')state = INEQ;else if (c = '>')state = INRT;else if (c = '<')state = INLT;else if (c = '/')state = INOVER;else if (c =' ')|(c = 't')|(c ='n')save=false;/state = START;elsestate = DONE;switch(c)case EOF:save = false;currentToken = ENDFILE;break;case '+':currentToken = PLUS;break;case '-':currentToken = MINUS;break;case '*':currentToken = TIMES;break;case '':currentToken = SEMI;break;case ',':currentToken = COMMA;break;case '(':currentToken = LPAREN;break;case ')':currentToken = RPAREN;break;case '':currentToken = LZ;break;case '':currentToken = RZ;break;case '':currentToken = LD;break;case '':currentToken = RD;break;default:currentToken = ERROR;break;break;case INNE:state = DONE;if(c='=')currentToken=NE;elseungetNextChar();save=false;currentToken=ERROR;break;case INEQ:state = DONE;if(c='=')currentToken=EQ;elseungetNextChar();currentToken=ASSIGN;break;case INRT:state = DONE;if(c='=')currentToken=RTE;elseungetNextChar();currentToken=RT;break;case INLT:state = DONE;if(c='=')currentToken=LTE;elseungetNextChar();currentToken=LT;break;case INNUM:if(!isdigit(c)ungetNextChar();save = false;state = DONE;currentToken =NUM;break;case INID:if(!isalpha(c)ungetNextChar();save = false;state = DONE;currentToken =ID;break;case INOVER:if(c = '*')save=false;state = INCOMMENT1;tokenStringIndex=0;/currentToken =LC;elseungetNextChar();save = false;state = DONE;currentToken =OVER;break;case INCOMMENT1:if(c = '*')save=false;state = INCOMMENT2;else if(c=EOF)save=false;state=DONE;currentToken=ERROR; else save = false;state = INCOMMENT1;/currentToken =ENDFILE;break;case INCOMMENT2:save = false;if(c = '/')state = START;else if(c=EOF)state=DONE;currentToken=ERROR;else if(c = '*')state =INCOMMENT2;elsestate = INCOMMENT1;break;case DONE:default:fprintf(listing,"Scanner Bug: state= %dn",state); state = DONE;currentToken = ERROR;break;五、 测试结果测试小程序: 输出结果:程序运行结果分析:如上图所示:通过观察分析,发现程序已经能够成功识别关键字,如 第一行的“1”,能成功识别出NUM,如 第六行的“int”,成功识别出reserved word等。 另外,第三行为注释部分,已经被识别出并跳过。同时,第四行为空,跳过。 通过以上实例运行结果可以得知,源文件中的各种token无论正确与否,都能够被本扫描程序准确的扫描出来。当能够获取正确的token时,就输出token;当遇到不能识别的token时,就输出一个ERROR:token;当遇到注释的时候,就说明是注释,并绕过。程序运行成功,词法分析正确。六、小结通过g本次实验,我学会了根据目标语言的词法规则构造正则表达式,并通过正则表达式构造出对应的DFA,并用代码实现DFA,能够用DFA来对C-语言进行词法分析。更深入的理解了DFA的精髓,了解了用代码实现DFA的各种方法以及它们各自的特点。同时,也深入理解并实践了编译原理词法分析的相关理论。实验二:C-语言词法分析器的lex生成一、 实验目的及意义1. 实现基于Lex的C-语言的词法分析器2. 学会安装Parser Generator 2并且熟悉其环境3. 熟练配置Microsoft Visual C+ 6.0,并实现与Parser Generator 2 相应库的连接4. 学习基于Lex构造词法分析器的方法,以及实验过程5. 熟悉C-语言的各种Token6. 构造各种Token的正则表达式7. 设计词法分析器所需要的各种数据结构及Token识别成功后的 动作8. 用Parser Generator 2实现所设计的C-词法分析器二、 实验环境1. Windows XP操作系统2. 编译环境Parser Generator3. VC6.0开发环境三、 分析与设计 1.C-语言的词法规则(1)关键字else if int return void while(2)特殊符号+ - * / < <= > >= = != = ; , ( ) /* */(3)其它token(区分大小写)ID = letter letter*NUM = digit digit*letter = a|z|A|.|ZDigit = 0|9 (4)空白符号 空白 n t (5)注释 由标记符号/*/标记起来的部分。 2.C-语言的词法正则表达式设计keyWord, SpecialChar, ID, ErrorID, Num, Annotation, Whitespace ,Enter, End, Other相关正则表达式如下所示:KeyWord else|if|int|return|void|whileSpecialChar "+"|"-"|"*"|"/"|"<"|"<="|">"|">="|"="|"!="|"="|""|","|"("|")"|""|""|""|""letter A-Za-zdigit 0-9ID letterletter*ErrorID digit+letter+(digit|letter)*|letter+digit+(digit|letter)*Num digit+Annotation "/*"Whitespace t+Enter n+ End <<EOF>>Other .3.Erro的处理四、 测试结果输出测试小程序输出结果:实验结果分析: 如上图黄色部分所示,程序已经能够成功识别keyWord,ID,NUM,SpecialCh等常规token类型,同时能识别出ErrorID。 并能处理嵌套注释 和 文件末注释。程序分析正确,运行成功。五、 小结通过lex词法分析,学会了pargen的环境配置与使用。并对C-的正则表达式和lex的运行原理有一定的了解。实验四:C-语言语法分析器的手工构造一、实验目的及意义1.熟悉C-语言的语法规则;2.使用上下文无关文法描述C-语言的语法规则 ;3.设计语法分析器所需要的各种数据结构 ; 4.基于递归下降法实现C-语言的语法分析器,能够输入源程序,输出整个程序的语法树。二、实验环境1.Windows XP2.Microsoft Visual C+ 6.0三、算法分析与设计(1)C-的BNF语法描述如下: 1.program -> declaration-list2.declaration-list -> declaration-list declaration | declaration3.declaration -> var-declaration | func-declaration4.var-declaration -> type-specifier ID; |type-specifier ID NUM ;5.type-specifier -> int | void6.fun-declaration -> type-specifier ID ( params ) compound-stmt7.params -> param-list | void8.param-list -> param-list , param | param9.param -> type-specifier ID | type-specifier ID pound-stmt -> local-declarations statement-list | empty11.local-declarations - > local-declarations var-declaration | empty12.statement-list - > statement->list statement | empty13.statement - > expression-stmt | compound-stmt | selection-stmt | iteration-stmt | return-stmt14.expression-stmt -> expression ; | ; 15.selection-stmt -> if ( expression ) statement | if ( expression ) statement else statement16.iteration-stmt -> while ( expression ) statement 17.return-stmt -> return ; | return expression ; 18.expression -> var = expression | simple-expression19.var -> ID | ID expression 20.simple-expression -> additvie-expression relop addtive-expression | additive-expression21.relop -> <= | < | > | >= | = | != 22.additive-expression -> additive-expression addop term | term23.addop -> + | - 24.term ->term mulop factor | factor25.mulop -> * | / 26.factor -> ( expression ) | var | call | NUM 27.call -> ID ( args ) 28.args -> arg-list | empty29.arglist -> arglist , expression | expression(2)节点类型:共有三种:表达式,语句,声明,参数四种 typedef enumStmtK,ExpK,DeclarK, ParamK NodeKind; 语句类型:分为if,while,return三种 typedef enumIfK,WhileK,ReturnK StmtKind; 表达式类型:Call类型,操作符类型,常数类型及变量类型,其中变量又 分数组和简单变量 typedef enumCallK,OpK,ConstK,IdK,ArryK ExpKind; 声明类型:分函数声明和变量声明,其中变量又分普通变量和数组变量 typedef enumFunDeclarK,ArryDeclarK,VarDeclarK DeclarKind; 参数类型:数组参数,简单参数,函数参数 typedef enumArryParamK,IdParamK,VoidK ParamKind; typedef enumVoid,Int TypeKind;(3)节点的声明:typedef struct treeNodestruct treeNode*childMAXCHILDREN;struct treeNode*sibling;int lineno;NodeKind nodekind;union StmtKind stmt;ExpKind exp;DeclarKind declar;ParamKind param;kind;struct TokenType op;int val;char *name;int index;attr;TypeKind type;TreeNode;四、代码实现· 节点类型代码typedef enumStmtK,ExpK,DeclarK,ParamK NodeKind;typedef enumIfK,WhileK,ReturnK StmtKind;typedef enumCallK,OpK,ConstK,IdK,ArryK ExpKind;typedef enumFunDeclarK,ArryDeclarK,VarDeclarK DeclarKind;typedef enumArryParamK,IdParamK,VoidK ParamKind;typedef enumVoid,Int TypeKind;typedef struct treeNodestruct treeNode*childMAXCHILDREN;struct treeNode*sibling;int lineno;NodeKind nodekind;union StmtKind stmt;ExpKind exp;DeclarKind declar;ParamKind param;kind;struct TokenType op;int val;char *name;int index;attr;TypeKind type;· 新建节点TreeNode *newDeclarNode(DeclarKind kind)TreeNode*t=(TreeNode *)malloc(sizeof(TreeNode);int i;if(t=NULL)printf("Out of memory error at line %dn",lineno);elsefor(i=0;i<MAXCHILDREN;i+)t->childi=NULL;t->sibling=NULL;t->nodekind=DeclarK;t->kind.declar=kind;t->lineno=lineno;t->type=Void;t->attr.index=0;return t;· 最重要的parse.cpp部分的关键代码static TokenType token;static TreeNode *declaration_list();static TreeNode *declaration();static TreeNode *params();static TreeNode *param_list();static TreeNode *param();static TreeNode *compound_stmt();static TreeNode *local_declarations();static TreeNode *statement_list();static TreeNode *statement();static TreeNode *expression_stmt();static TreeNode *selection_stmt();static TreeNode *iteration_stmt();static TreeNode *return_stmt();static TreeNode *expression();static TreeNode *simple_expression(TreeNode* p);static TreeNode *additive_expression(TreeNode* p);static TreeNode *term(TreeNode* p);static TreeNode *factor();static TreeNode *args();static TreeNode *arg_list();static void syntaxError(char* message)printf(">>> Syntax error at line %d: %s",lineno,message);static void match(TokenType expected)if(token=expected) token=getToken();elsesyntaxError("unexpected token-> ");printToken(token,tokenString);TreeNode *declaration_list()TreeNode *t=declaration();TreeNode *p=t;while(token!=ENDFILE)TreeNode *q;q=declaration();if(q!=NULL)if(t=NULL)t=p=q;elsep->sibling=q;p=q;return t;TreeNode *declaration()TreeNode*t=NULL;char*s=NULL;if(token=INT)match(INT);char*s=copyString(tokenString);match(ID);if(token=LZ)match(LZ);t=newDeclarNode(ArryDeclarK);t->type=Int;t->attr.name=copyString(s);t->attr.index=atoi(tokenString);match(NUM);match(RZ);match(SEMI);else if(token=LPAREN)t=newDeclarNode(FunDeclarK);t->attr.name=copyString(s);match(LPAREN);t->child0=params();match(RPAREN); t->child1=compound_stmt();elset=newDeclarNode(VarDeclarK);t->type=Int;t->attr.name=copyString(s);match(SEMI);elset=newDeclarNode(FunDeclarK);match(VOID);t->type=Void;t->attr.name=copyString(tokenString);match(ID);match(LPAREN);t->child0=params();match(RPAREN);t->child1=compound_stmt();return t;TreeNode *params()TreeNode *t=NULL;if(token!=VOID)t=param_list();elset=newParamNode(VoidK);match(VOID);return t;TreeNode *param_list()TreeNode*t=NULL;t=param();TreeNode*p=t;while(token!=RPAREN) match(COMMA);/token=getToken();TreeNode*q=param();p->sibling=q;p=q;return t;TreeNode *param()TreeNode*t=NULL;match(INT);char*s=copyString(tokenString);match(ID);if(token=LZ)match(LZ);t=newParamNode(ArryParamK);t->attr.name=copyString(s);t->attr.index=atoi(tokenString);match(token);match(RZ);elset=newParamNode(IdParamK);t->attr.name=copyString(s);return t;TreeNode *compound_stmt()match(LD);TreeNode*t=NULL;TreeNode*p=NULL;p=t=local_declarations();if(p!=NULL)while(p->sibling!=NULL) p=p->sibling;p->sibling=statement_list();elset=statement_list();match(RD);return t;TreeNode *local_declarations()TreeNode*t=NULL;TreeNode*p=NULL;TreeNode*q=NULL;while(token=INT)match(INT);char*s=copyString(tokenString);match(ID);if(token=LZ)match(LZ);q=newDeclarNode(ArryDeclarK);q->attr.index=atoi(tokenString);q->type=Int;q->attr.name=copyString(s);match(NUM);match(RZ);match(SEMI);elseq=newDeclarNode(VarDeclarK);q->type=Int;q->attr.name=copyString(s);match(SEMI);if(q!=NULL)if(t=NULL) t=p=q;elsep->sibling=q;p=q;return t;TreeNode *statement_list()TreeNode*t=NULL;if(token!=RD)t=statement();TreeNode*p=t;while(token!=RD)TreeNode*q=statement();p->sibling=q;p=q;return t;TreeNode *statement()TreeNode*t=NULL;switch(token)case IF:t=selection_stmt();break;case WHILE:t=iteration_stmt();break;case RET