BASIC TECHNIQUES IN STATISTICAL NLP
Word prediction, n-grams, smoothing
September 2003

Statistical Methods in NLE
- Two characteristics of natural language make it desirable to endow programs with the ability to LEARN from examples of past use:
  - VARIETY (no programmer can really take into account all possibilities)
  - AMBIGUITY (need to have ways of choosing between alternatives)
- In a number of NLE applications, statistical methods are very common.
- The simplest application: WORD PREDICTION.

We are good at word prediction
- Stocks plunged this morning, despite a cut in interest ...
- Stocks plunged this morning, despite a cut in interest rates by the Federal Reserve, as Wall ...
- Stocks plunged this morning, despite a cut in interest rates by the Federal Reserve, as Wall Street began ...

Real Spelling Errors
- They are leaving in about fifteen minuets to go to her house
- The study was conducted mainly be John Black.
- The design an construction of the system will take more than one year.
- Hopefully, all with continue smoothly in my absence.
- Can they lave him my messages?
- I need to notified the bank of this problem.
- He is trying to fine out.

Handwriting recognition
- From Woody Allen's Take the Money and Run (1969): Allen (a bank robber) walks up to the teller and hands her a note that reads "I have a gun. Give me all your cash." The teller, however, is puzzled, because he reads "I have a gub." "No, it's gun," Allen says. "Looks like gub to me," the teller says, then asks another teller to help him read the note, then another, and finally everyone is arguing over what the note means.

Applications of word prediction
- Spelling checkers
- Mobile phone texting
- Speech recognition
- Handwriting recognition
- Disabled users

Statistics and word prediction
- The basic idea underlying the statistical approach to word prediction is to use the probabilities of SEQUENCES OF WORDS to choose the most likely next word / correction of a spelling error.
- I.e., to compute P(w | W1 ... WN-1) for all words w, and predict as the next word the one for which this (conditional) probability is highest.
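
A minimal sketch of this decision rule (the toy probability table and the function name are illustrative, not from the slides), assuming the conditional probabilities are already available:

```python
# Toy conditional-probability table standing in for real estimates:
# keys are histories (tuples of previous words), values map candidate
# next words to P(word | history).
toy_probs = {
    ("a",): {"rabbit": 0.2, "dog": 0.5, "run": 0.3},
}

def predict_next(history, probs):
    """Return the word w maximizing P(w | history), or None if the history is unseen."""
    dist = probs.get(tuple(history), {})
    return max(dist, key=dist.get) if dist else None

print(predict_next(["a"], toy_probs))  # -> 'dog'
```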

Using corpora to estimate probabilities
- But where do we get these probabilities? Idea: estimate them by RELATIVE FREQUENCY.
- The simplest method: the Maximum Likelihood Estimate (MLE). Count the number of words in a corpus, then count how many times a given sequence is encountered.
- "Maximum" because it doesn't waste any probability on events not in the corpus.

Maximum Likelihood Estimation for conditional probabilities
- In order to estimate P(w | W1 ... WN-1), we can use the ratio of counts instead: P(w | W1 ... WN-1) ≈ C(W1 ... WN-1 w) / C(W1 ... WN-1).
- Cf.: P(A|B) = P(A & B) / P(B)
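
A small sketch of this MLE estimate computed directly from a token list; the helper name and the toy counts are illustrative, not from the slides:

```python
# MLE estimate of P(w | history) = C(history followed by w) / C(history).
def mle_conditional(tokens, history, w):
    h, n = list(history), len(history)
    count_h = sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == h)
    count_hw = sum(1 for i in range(len(tokens) - n)
                   if tokens[i:i + n] == h and tokens[i + n] == w)
    return count_hw / count_h if count_h else 0.0

tokens = "the big dog saw the big cat near the big dog".split()
# "the big" occurs 3 times and is followed by "dog" twice -> 2/3
print(mle_conditional(tokens, ["the", "big"], "dog"))
```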

Aside: counting words in corpora
- Keep in mind that it's not always so obvious what a word is (cf. yesterday).
- In text: "He stepped out into the hall, was delighted to encounter a brother." (From the Brown corpus.)
- In speech: "I do uh main- mainly business data processing"
- LEMMAS: cats vs. cat
- TYPES vs. TOKENS
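
A tiny illustration of the TYPES vs. TOKENS distinction on a made-up sentence, using naive whitespace tokenization (exactly the kind of simplification this slide warns about):

```python
# TOKENS are running words in the text; TYPES are distinct word forms.
# Lemmatization would further collapse inflected forms ("cats" -> "cat").
text = "the cat saw the cats and the cat ran"
tokens = text.split()         # naive whitespace tokenization
types = set(tokens)
print(len(tokens), "tokens")  # 9 tokens
print(len(types), "types")    # 6 types: the, cat, saw, cats, and, ran
```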

The problem: sparse data
- In principle, we would like the n of our models to be fairly large, to model long-distance dependencies such as: "Sue SWALLOWED the large green ..."
- However, in practice, sequences of words of length greater than 3 hardly ever occur in our corpora! (See below.)
- (Part of the) Solution: we APPROXIMATE the probability of a word given all previous words.
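
A quick, illustrative check of the sparsity claim on a made-up miniature corpus (the text and the resulting numbers are my own, not the slides'):

```python
from collections import Counter

# For each n, count distinct n-grams and how many of them occur only once.
# Even in this tiny text, nearly all 3- and 4-grams are singletons.
tokens = ("sue swallowed the large green pill and then sue swallowed "
          "the small red pill before dinner").split()

for n in range(1, 5):
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    singletons = sum(1 for count in ngrams.values() if count == 1)
    print(f"n={n}: {len(ngrams)} distinct n-grams, {singletons} seen only once")
```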

The Markov Assumption
- The probability of being in a certain state only depends on the previous state: P(Xn = Sk | X1 ... Xn-1) = P(Xn = Sk | Xn-1)
- This is equivalent to the assumption that the next state only depends on the previous m inputs, for m finite.
- (N-gram models / Markov models can be seen as probabilistic finite state automata.)

The Markov assumption for language: n-gram models
- Making the Markov assumption for word prediction means assuming that the probability of a word only depends on the previous n-1 words (N-GRAM model).

Bigrams and trigrams
- Typical values of n are 2 or 3 (BIGRAM or TRIGRAM models):
  P(Wn | W1 ... Wn-1) ≈ P(Wn | Wn-2, Wn-1)
  P(W1 ... Wn) ≈ ∏i P(Wi | Wi-2, Wi-1)
- What a bigram model means in practice: instead of P(rabbit | Just the other day I saw a), we use P(rabbit | a).
- Unigram: P(dog); Bigram: P(dog | big); Trigram: P(dog | the, big)
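
A minimal sketch of what the n-gram approximation does to the conditioning context (the sentence is the slide's own example; the helper function is illustrative):

```python
# Under an n-gram model, only the last n-1 words of the history are kept.
def truncate_history(history, n):
    """Return the part of the history an n-gram model actually conditions on."""
    return tuple(history[-(n - 1):]) if n > 1 else ()

history = "Just the other day I saw a".split()
print(truncate_history(history, 2))  # bigram:  ('a',)
print(truncate_history(history, 3))  # trigram: ('saw', 'a')
```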

The chain rule
- So how can we compute the probability of sequences of words longer than 2 or 3? We use the CHAIN RULE:
  P(W1 ... Wn) = P(W1) P(W2 | W1) P(W3 | W1 W2) ... P(Wn | W1 ... Wn-1)
- E.g., P(the big dog) = P(the) P(big | the) P(dog | the big)
- Then we use the Markov assumption to reduce this to manageable proportions.

Example: the Berkeley Restaurant Project (BERP) corpus
- BERP is a speech-based restaurant consultant.
- The corpus contains user queries; examples include:
  - I'm looking for Cantonese food
  - I'd like to eat dinner someplace nearby
  - Tell me about Chez Panisse
  - I'm looking for a good place to eat breakfast

Computing the probability of a sentence
- Given a corpus like BERP, we can compute the probability of a sentence, as sketched below.
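
Putting the pieces together, a hedged end-to-end sketch: estimate bigram probabilities by relative frequency, then multiply them along the sentence, i.e. the chain rule with the bigram Markov assumption. The three toy queries merely stand in for BERP; the real corpus and its counts are not reproduced here, and the <s> / </s> boundary markers are an assumption.

```python
from collections import Counter

# Three made-up queries standing in for the BERP corpus; <s> and </s> are
# hypothetical sentence-boundary markers.
corpus = [
    "<s> i want to eat chinese food </s>",
    "<s> i want to eat dinner </s>",
    "<s> tell me about chinese food </s>",
]

tokens = [w for sent in corpus for w in sent.split()]
unigram_counts = Counter(tokens)
# For simplicity, bigrams are counted across sentence boundaries too.
bigram_counts = Counter(zip(tokens, tokens[1:]))

def p_bigram(w_prev, w):
    """MLE estimate P(w | w_prev) = C(w_prev w) / C(w_prev)."""
    return bigram_counts[(w_prev, w)] / unigram_counts[w_prev] if unigram_counts[w_prev] else 0.0

def sentence_probability(sentence):
    """Chain rule with the bigram Markov assumption: multiply P(w_i | w_{i-1})."""
    words = sentence.split()
    prob = 1.0
    for w_prev, w in zip(words, words[1:]):
        prob *= p_bigram(w_prev, w)
    return prob

print(sentence_probability("<s> i want to eat chinese food </s>"))  # 2/3 * 1/2 = 1/3
```

Note that any bigram unseen in the corpus makes the whole product zero; this is the sparse-data problem that the smoothing techniques announced in the title of this deck are meant to address.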