Corpus Annotation and Syntactic Structure Extraction (.ppt)
Slide 1 of 29
Corpus Annotation and Syntactic Structure Extraction

Slide 2
Part I   Corpus annotation
Part II  Syntactic structure extraction

Slide 3
Part I  Corpus annotation
1. What is annotation?
2. How to do it?

Slide 4
Annotation of corpora
Annotation: The process of making explicit linguistic categories implicit within a corpus text, for example, by adding layers of information on the grammatical classes of words, or on the classes of speech acts which have taken place in the course of the transcribed speech, or the classes of errors learners made in writing. (Edwards 1995:20).

Slide 5
A. Part-of-speech tagging
B. Syntactic annotation
C. Semantic annotation
D. Discourse annotation
E. Pragmatic annotation

Slide 6
POS-Tagging
- also known as grammatical tagging
- divides words into categories, based on how they can be combined to form sentences
- most commonly used form of corpus annotation

Slide 7
Nowadays, it is fashionable to speak of a generation gap. The parents complain that children are self-centered and do not show them proper respect and obedience, while children are complaining that parents do not understand them. How does the generation gap form?

Slide 8
How to do it?
- manually
- computer-assisted
- fully automatic

Slide 9
Computer-assisted annotation
Annotool

Slide 10
Fully automatic annotation
CLAWS
- Constituent Likelihood Automatic Word-tagging System
- developed by UCREL (University Centre for Computer Corpus Research on Language) at Lancaster
- POS-tagger for English
- has existed since the early 1980s
- has several tagsets

Slide 11
Tagset variation

Category                      Example   CLAWS5
Adverb                        often     AV0
Adverb, negative              not       XX0
Adverb, comparative           faster    AV0
Adverb, superlative           fastest   AV0
Adverb, particle              up        AVP
Adverb, deictic               here      AV0
Adverb, intensifier           very      AV0
Adv, intensifier, postposed   enough    AV0
Adverb, question              when      AVQ
Adv, question, intensifier    how       AVQ

Slide 12
Fully automatic annotation
Go tagger

Slide 13
When_WRB we_PRP are_VBP born_VBN ,_, the_DT education_NN our_PRP$ parents_NNS give_VBP us_PRP is_VBZ to_TO learn_VB how_WRB to_TO speak_VB and_CC how_WRB to_TO recognize_VB them_PRP ._. It_PRP is_VBZ a_DT basic_JJ education_NN and_CC we_PRP start_VBP to_TO face_VB the_DT colorful_JJ world_NN ._. The_DT education_NN i
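
The tagged output above uses a `word_TAG` format, with each token joined to its part-of-speech tag by an underscore. As a minimal sketch of how such output could be read back for analysis, the snippet below splits each token on its last underscore and counts tag frequencies. The function name `parse_tagged` and the shortened sample string are illustrative assumptions, not part of the original slides or of any tagger's own tooling.

```python
from collections import Counter


def parse_tagged(text):
    """Split whitespace-separated 'word_TAG' tokens into (word, tag) pairs.

    Hypothetical helper for illustration; assumes one underscore joins
    each word to its tag, as in the tagged sample in the slides.
    """
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST underscore, so a word that itself
        # contains an underscore keeps its full form.
        word, sep, tag = token.rpartition("_")
        if not sep:
            continue  # token carries no tag; skip it
        pairs.append((word, tag))
    return pairs


# Shortened excerpt of the tagged sample (assumed whitespace-separated).
sample = (
    "When_WRB we_PRP are_VBP born_VBN ,_, the_DT education_NN "
    "our_PRP$ parents_NNS give_VBP us_PRP"
)

pairs = parse_tagged(sample)
tag_counts = Counter(tag for _, tag in pairs)

print(pairs[:3])
print(tag_counts.most_common(3))
```

Splitting on the last underscore rather than the first is a design choice: punctuation tokens such as `,_,` and tags such as `PRP$` are handled without special cases.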