8.2 Basics of NLP
Bag of Words
A bag-of-words model counts every word in a piece of text, ignoring word order. The counts are arranged in an occurrence matrix with one row per sentence or document and one column per word.

Example sentences:
1. Jim and Pam traveled by bus.
2. The train was late.
3. The flight was full. Traveling by flight is expensive.

[Figure: Basic structure for a bag of words: words with frequencies, combination of words, bag of words]

TF-IDF
TF (Term Frequency): if a word appears many times in a document, it may be more important to that document than words that appear fewer times.
IDF (Inverse Document Frequency): if a word appears often in one document but is also present in many other documents, it is probably just a common word, so we should not assign it much importance.
TF-IDF multiplies the two, so a word scores highly only when it is frequent in the document and rare across the other documents.

Example sentences:
1. This is the first document.
2. This document is the second document.

[Figure: Resulting multiplication of TF-IDF]
[Figure: TF-IDF using a log]

Tokenization
Tokenization is the …
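The bag-of-words occurrence matrix for the three example sentences (Jim and Pam, the train, the flight) can be sketched in Python. This is a minimal sketch under stated assumptions: the slides do not say how text is tokenized, so the hypothetical helpers `tokenize` and `bag_of_words` simply lowercase the text and keep alphabetic runs, with no stemming or stop-word removal. As a result "traveled" and "traveling" stay separate columns.

```python
import re
from collections import Counter

# The three example sentences, each treated as one document.
sentences = [
    "Jim and Pam traveled by bus.",
    "The train was late.",
    "The flight was full. Traveling by flight is expensive.",
]

def tokenize(text):
    # Hypothetical tokenizer (assumption): lowercase, keep only runs of
    # letters; punctuation is dropped and no stemming is applied.
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(docs):
    """Return (vocab, matrix) where matrix[i][j] counts vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    # Vocabulary: every distinct word across all documents, sorted.
    vocab = sorted(set().union(*counts))
    # Counter returns 0 for words absent from a document.
    matrix = [[c[w] for w in vocab] for c in counts]
    return vocab, matrix

vocab, matrix = bag_of_words(sentences)
print(vocab)
for row in matrix:
    print(row)
```

Each row is one sentence and each column one vocabulary word (15 words here); the third row holds a 2 in the `flight` column because that word occurs twice in sentence 3, and word order is lost entirely.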
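A log-based TF-IDF for the two example sentences ("This is the first document." and "This document is the second document.") can be sketched as follows. The slides do not give their exact formula, so this assumes one common variant: TF is a word's count divided by the document length, and IDF is ln(N / df), where N is the number of documents and df is the number of documents containing the word. The helper names `tokenize` and `tf_idf` are hypothetical.

```python
import math
import re
from collections import Counter

docs = [
    "This is the first document.",
    "This document is the second document.",
]

def tokenize(text):
    # Hypothetical tokenizer (assumption): lowercase, keep runs of letters.
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(docs):
    """Return one {word: tf * idf} dict per document (unsmoothed, natural log)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents each word appears in at least once.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    # IDF with a log: words present in every document get log(1) = 0.
    idf = {w: math.log(n_docs / df[w]) for w in df}
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # TF = raw count / document length, then multiply by IDF.
        scores.append({w: (c / len(tokens)) * idf[w] for w, c in counts.items()})
    return scores

scores = tf_idf(docs)
for i, s in enumerate(scores, 1):
    print(i, {w: round(v, 4) for w, v in sorted(s.items())})
```

Under this variant every word shared by both sentences ("this", "is", "the", "document") gets IDF ln(2/2) = 0, so only "first" (ln 2 / 5, about 0.1386) and "second" (ln 2 / 6, about 0.1155) receive nonzero weight, which matches the IDF intuition that widespread words carry little importance. Library implementations such as scikit-learn's `TfidfVectorizer` smooth the IDF by default, so their numbers will differ.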