机器学习概论 (3).pdf


Welcome to Introduction to Machine Learning! (2010.3.5; images come from the Internet.)

Tea Time: Dangerous Ransomware (Min Zhang)

Types
• Locker ransomware (computer locker): denies access to the computer or device but leaves the underlying system and files untouched. It is less effective at extracting a ransom than crypto ransomware.
• Crypto ransomware (data locker): prevents access to files or data. It finds and encrypts valuable data using 2048- or 4096-bit RSA keys; the encrypted files are unusable unless the decryption key is obtained. [Figure in the slides: a crypto-ransomware demand screen.]

Key figures
• Crypto ransomware emerged in 2013 with CryptoLocker; an Android variant was reported in 2014.
• 17% of the infections in 2015 were on Android devices; in March 2016 an Apple Mac variant was found.
• Ransomware programs were detected on 753,684 computers in 2015; 179,209 computers were targeted by encryption ransomware.
• "The ransomware is that good. To be honest, we often advise people just to pay the ransom." (Joseph Bonavolonta, an assistant special agent with the FBI, at Boston's Cyber Security Summit 2015.)

Paralyzed by ransomware
• In February 2016 a hospital in Los Angeles was hit by ransomware, leaving doctors unable to access critical patient data for more than a week. It was reported that all patient-record history and hospital email archives were encrypted by the malware. The cybercriminals asked for $3.6 million in Bitcoin for the decryption key; it is uncertain whether the hospital paid the ransom.
• Police pay ransomware demand: the computer system of the Tewksbury Police Department, Massachusetts was infected with ransomware in early March. To keep their computer files from being destroyed, Chief Timothy Sheehan authorised the $550 ransom demand in Bitcoin. It is believed a communal network user accidentally downloaded the malware, which then encrypted all the computer data, holding it for ransom.

Prevention
• Back up your files regularly.
• Apply software patches as soon as they become available; some ransomware arrives via vulnerability exploits.
• Bookmark trusted websites and access these websites via bookmarks.
• Download email attachments only from trusted sources.
• Scan your system regularly with anti-malware.

Decision Tree Learning: Pruning (cont.)

Review: avoiding over-fitting
There are two ways of avoiding over-fitting for a decision tree:
I. Stop growing the tree when a data split is not statistically significant (pre-pruning).
II. Grow the full tree, then post-prune.
For option II, how do we select the "best" tree?
• Measure performance over the training data (statistical pruning), using a confidence level (will be introduced in the mid-term Computational Learning Theory lecture).
• Measure performance over a separate validation data set.
• MDL (Minimum Description Length, 最小描述长度): minimize size(tree) + size(misclassifications(tree)).

Review: Type II post-pruning (1): Reduced-Error pruning
• Split the data into a training set and a validation set. The validation set has known labels and is used only to test performance; no model updates are made during this test.
• Do until further pruning is harmful: evaluate the impact on the validation set of pruning each possible node (together with the subtree it roots), and greedily remove the one whose removal most improves validation-set accuracy.
• How should the label of the new leaf node be assigned? (See the sketch below.)
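The slides state the greedy loop but leave its details open. The following is a minimal sketch (not from the slides) of reduced-error pruning, assuming a toy tree representation in which each internal node caches the majority label of its training examples; that majority label is the usual answer to the question above, since a pruned node simply becomes a leaf predicting it. All names here (predict, accuracy, reduced_error_pruning, the validation-set format) are illustrative.

```python
# Minimal reduced-error pruning sketch. Assumed tree format (not from the slides):
#   internal node: {"attr": name, "children": {value: subtree}, "majority": label}
#   leaf: a class label
# `validation` is a list of (example_dict, label) pairs with known labels.

def predict(tree, x):
    """Follow attribute tests until a leaf label is reached."""
    while isinstance(tree, dict):
        value = x.get(tree["attr"])
        if value not in tree["children"]:
            return tree["majority"]              # unseen value: fall back to the majority label
        tree = tree["children"][value]
    return tree

def accuracy(tree, data):
    return sum(predict(tree, x) == y for x, y in data) / len(data)

def internal_nodes(tree, path=()):
    """Yield the path (sequence of branch values) to every internal node."""
    if isinstance(tree, dict):
        yield path
        for value, subtree in tree["children"].items():
            yield from internal_nodes(subtree, path + (value,))

def pruned_copy(tree, path):
    """Copy `tree` with the node at `path` replaced by a leaf carrying its majority label."""
    if not path:
        return tree["majority"]
    copy = {"attr": tree["attr"], "majority": tree["majority"],
            "children": dict(tree["children"])}
    copy["children"][path[0]] = pruned_copy(copy["children"][path[0]], path[1:])
    return copy

def reduced_error_pruning(tree, validation):
    """Greedily prune nodes while validation accuracy does not decrease."""
    while isinstance(tree, dict):
        base = accuracy(tree, validation)
        candidates = [(accuracy(pruned_copy(tree, p), validation), p)
                      for p in internal_nodes(tree)]
        best_acc, best_path = max(candidates, key=lambda c: c[0])
        if best_acc < base:                      # further pruning would be harmful
            return tree
        tree = pruned_copy(tree, best_path)
    return tree
```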
Review: Type II post-pruning (2): Rule post-pruning
1. Convert the tree to an equivalent set of rules, e.g. if (outlook = sunny) ∧ (humidity = high) then playTennis = no.
2. Prune each rule by removing any preconditions whose removal improves its estimated accuracy, i.e. consider dropping (outlook = sunny) and (humidity = high) separately.
3. Sort the rules into the desired sequence (by their estimated accuracy).
4. Use the final rules in that sequence when classifying instances. (After the rules are pruned, it may no longer be possible to write them back as a tree.)
This is one of the most frequently used methods, e.g. in C4.5.

Why convert the decision tree to rules before pruning?
• Pruning decisions become independent of context. If the tree itself were pruned, there would be only two choices for each node: remove it completely or retain it.
• There is no difference between the root node and leaf nodes.
• It improves readability.

Brief overview of Decision Tree Learning (Part 1)
• Introduction: basic concepts.
• The ID3 algorithm as an example: algorithm description, feature selection, stop conditions; the inductive bias of ID3.
• Over-fitting and pruning: pre-pruning; post-pruning (Reduced-Error pruning, Rule post-pruning). In practice, pre-pruning is faster, while post-pruning generally leads to more accurate trees.
• The basic idea comes from the human decision procedure. Decision trees are simple and easy to understand (If...Then... rules), robust to noisy data, and widely used in research and applications: medical diagnosis (clinical symptoms → disease), credit analysis (personal information → valuable customer?), scheduling. A decision tree is generally tested as the benchmark before more complicated algorithms are employed.

Part II (Extension): Decision Trees in Real Scenarios, Problems and Improvements

1. Continuous attribute values
Create a set of discrete attribute values, e.g. for the attribute Temperature:

Temperature   40   48   60   72   80   90
Decision      No   No   Yes  Yes  Yes  No

Options:
I. Take the midpoint of adjacent values with different decisions as a candidate threshold, x_s = (x_l + x_u) / 2. (Fayyad proved in 1991 that the thresholds that maximize information gain satisfy this condition.)
II. Take the class probability into account and place the threshold at a weighted point between the adjacent values, x_s = P·x_l + (1 - P)·x_u.

2. Attributes with many values
Problem: information gain is biased; if an attribute has many values (e.g. Date as an attribute), IG will tend to select it. One possible solution is to use the gain ratio instead (see the sketch below):

GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)
SplitInformation(S, A) = -Σ_{i=1..c} (|S_i| / |S|) · log2(|S_i| / |S|)

SplitInformation(S, A) is a punishing factor: the entropy of S with respect to the values of A.
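As an illustration (not part of the slides), the following sketch computes information gain, split information, and gain ratio from the formulas above, assuming a `split` dictionary that maps each value of attribute A to the labels of the examples taking that value; the toy data at the bottom is invented.

```python
# Information gain vs. gain ratio, following the formulas above.
# `split` maps each value of attribute A to the labels of the examples with that value.
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, split):
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in split.values())
    return entropy(labels) - remainder

def split_information(labels, split):
    """Entropy of S with respect to the values of A (the punishing factor)."""
    n = len(labels)
    return -sum(len(part) / n * log2(len(part) / n) for part in split.values())

def gain_ratio(labels, split):
    return information_gain(labels, split) / split_information(labels, split)

# Toy comparison: an attribute with one value per example (like Date) gets the
# maximal information gain, but its large SplitInformation shrinks the gain ratio.
labels = ["yes", "yes", "no", "no", "yes", "no"]
date_like = {i: [labels[i]] for i in range(len(labels))}
binary = {"low": labels[:3], "high": labels[3:]}
print(information_gain(labels, date_like), gain_ratio(labels, date_like))
print(information_gain(labels, binary), gain_ratio(labels, binary))
```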
3. Unknown attribute values
Example training data (BTR = Blood Test Result):

BTR   Temp     label
neg   normal   -
neg   normal   -
neg   normal   -
neg   normal   -
neg   high     +
pos   normal   +
pos   high     +
pos   high     +
?     normal   +

At the Blood Test Result node there are 5+ / 4- examples; among the examples with a known BTR, the neg branch receives 1+ / 4- and the pos branch 3+ / 0-. How do we handle the example with the missing BTR value?
• Assign the most common value in the training data (neg): the branches become neg 2+ / 4-, pos 3+ / 0-.
• Assign the most common value among examples with the same label (+, hence pos): the branches become neg 1+ / 4-, pos 4+ / 0-.
• Assign a probability to each value, P(neg) = 5/8 and P(pos) = 3/8, and use fractional counts: the branches become neg (1 + 5/8)+ / 4-, pos (3 + 3/8)+ / 0-.

4. Attributes with costs
• Tan & Schlimmer (1990): Gain²(S, A) / Cost(A)
• Nunez (1988): (2^Gain(S, A) - 1) / (Cost(A) + 1)^w, where w ∈ [0, 1] controls the importance of the cost.

What's more
Decision tree learning is perhaps the simplest and the most frequently used algorithm: easy to understand, easy to implement, easy to use, with small computation costs. Decision forest: many decision trees built by C4.5. For more information about C4.5 (C5.0), see Quinlan's homepage.

Inductive learning hypothesis
Much of learning involves acquiring a general concept from specific training examples. Inductive learning algorithms can at best guarantee that the output hypothesis fits the target concept over the training data; notice the over-fitting problem. The Inductive Learning Hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over unobserved examples.

Topic 3. Bayesian Learning

Background of Bayesian learning
We want to discover the relationship between two events (causal analysis: the precondition and the conclusion), A → B. For example, does pneumonia lead to lung cancer? This is hard to tell directly. Reversed thinking: how many lung cancer patients have suffered from pneumonia? In our daily life, disease diagnosis by a doctor can be taken as a Bayesian learning process.

Bayes theorem (Thomas Bayes, 1702–1761)

P(h|D) = P(D|h) · P(h) / P(D)

• P(h|D): the posterior probability of h
• P(h): the prior probability of h
• P(D): the prior probability of D
• P(D|h): the probability of D given h

Running example: a lab test result is positive (+); does the patient have a particular form of cancer?
• P(h) is the probability that one has cancer.
• P(D) is the probability that the test result is +.
• P(h|D) is the probability that one has cancer given that the test result is +.
• P(D|h) is the probability that the test result is + given that one has cancer.

Notes on the terms:
• P(h): the hypotheses are mutually exclusive and the hypothesis space H is totally exhaustive, so Σ_i P(h_i) = 1.
• P(D): D is taken as a sample of all possible data; it is independent of h and can be ignored when comparing different hypotheses.
• P(D|h): the likelihood (似然度); the log likelihood is log P(D|h).

An example. The lab test result is +; does the patient have cancer? What we know:
• Correct positive rate 98%: if one has cancer, the test result is + with probability 0.98 (and - with probability 0.02).
• Correct negative rate 97%: if one does not have cancer, the test result is - with probability 0.97 (and + with probability 0.03).
• Over the entire population, only 0.008 have cancer (0.992 do not).
Then P(cancer|+) = P(+|cancer) · P(cancer) / P(+) ≈ 0.21.

Choosing hypotheses: MAP
Generally we want the most probable hypothesis given the training data, the Maximum A Posteriori hypothesis (最大后验假设):
h_MAP = argmax_{h∈H} P(h|D) = argmax_{h∈H} P(D|h) · P(h) / P(D) = argmax_{h∈H} P(D|h) · P(h)

The cancer example under MAP (a numeric check follows below):
P(+|cancer) · P(cancer) = 0.98 × 0.008 ≈ 0.0078
P(+|¬cancer) · P(¬cancer) = 0.03 × 0.992 ≈ 0.0298
so h_MAP = ¬cancer.
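The following small check (not part of the slides) reproduces the numbers of the cancer example with Bayes' theorem; every quantity comes from the figures stated above.

```python
# Numeric check of the cancer example: sensitivity 0.98, specificity 0.97, prior 0.008.
p_cancer = 0.008
p_pos_given_cancer = 0.98            # correct positive rate
p_pos_given_no_cancer = 0.03         # 1 - correct negative rate

# Unnormalized posteriors P(+|h) * P(h); P(+) is identical for both hypotheses,
# so it can be dropped when choosing h_MAP.
score_cancer = p_pos_given_cancer * p_cancer                # ~0.0078
score_no_cancer = p_pos_given_no_cancer * (1 - p_cancer)    # ~0.0298
h_map = "cancer" if score_cancer > score_no_cancer else "not cancer"

# Full posterior via Bayes theorem: P(cancer|+) = P(+|cancer) P(cancer) / P(+)
p_pos = score_cancer + score_no_cancer
p_cancer_given_pos = score_cancer / p_pos                   # ~0.21
print(h_map, round(p_cancer_given_pos, 2))                  # not cancer 0.21
```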
Brief overview so far
• Bayes theorem: use the prior probability to infer the posterior probability.
• Maximum A Posteriori hypothesis: MAP, h_MAP (极大后验假设).

Choosing hypotheses: ML
If we know nothing about the hypotheses, or if we know that all hypotheses have the same prior probability, then the MAP hypothesis reduces to the Maximum Likelihood hypothesis (极大似然假设):
h_ML = argmax_{h∈H} P(D|h)

Maximum Likelihood and Least Square Error
Training data: d_i = f(x_i) + e_i, where
• d_i are independent samples,
• f(x_i) is the noise-free value of the target function,
• e_i is noise: independent random variables with normal distribution N(0, σ²),
• hence d_i has normal distribution N(f(x_i), σ²).
Because the samples are independent, the noise is normally distributed, and the logarithm is monotonic, maximizing the likelihood is equivalent to minimizing the squared error (the Gaussian normalization term is a constant and drops out):
h_ML = argmax_{h∈H} P(D|h) = argmin_{h∈H} Σ_i (d_i - h(x_i))² = h_LSE
Please read section 6.4 of the book Machine Learning (p. 164 in the English version) for the full derivation.

Brief overview so far
• Bayes theorem; MAP, h_MAP.
• Maximum Likelihood: ML, h_ML (极大似然假设); ML vs. LSE (Least Square Error).

Naive Bayes Classifier (朴素贝叶斯分类器)
Assume the target function f: X → V, where each instance x = (a1, a2, ..., an). Then the most probable value of f(x) is
v_MAP = argmax_{vj∈V} P(vj | a1, a2, ..., an) = argmax_{vj∈V} P(a1, a2, ..., an | vj) · P(vj)
The Naive Bayes assumption (independent attributes):
P(a1, a2, ..., an | vj) = Π_i P(ai | vj)
The Naive Bayes classifier:
v_NB = argmax_{vj∈V} P(vj) · Π_i P(ai | vj)
If the independent-attribute condition is satisfied, then v_MAP = v_NB.

Example: Word Sense Disambiguation (词义消歧)
What does fly mean? What does bank mean? For a word w, use the context c to disambiguate, e.g. "A fly flies into the kitchen while he fries the chicken."
• The context c is a group of words w_k around w (the features/attributes).
• s_i is the i-th sense of the word w (the output label).
Naive Bayes assumption: P(c | s_i) = Π_{w_k ∈ c} P(w_k | s_i)
Bayes decision: choose s = argmax_{s_i} P(s_i) · Π_{w_k ∈ c} P(w_k | s_i), where the probabilities are estimated by counting in sense-tagged training data. (A toy sketch follows.)
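Below is a toy Naive Bayes word-sense disambiguator, written for illustration only: the tiny sense-tagged corpus, the add-one smoothing, and all identifiers are assumptions of this sketch, not content from the slides.

```python
# Toy Naive Bayes word-sense disambiguation: senses play the role of v_j and the
# context words are the attributes. The "sense-tagged" data below is made up.
from collections import Counter, defaultdict
from math import log

training = [                       # (sense of "bank", context words)
    ("river", ["water", "fish", "shore"]),
    ("river", ["boat", "water", "mud"]),
    ("finance", ["money", "loan", "account"]),
    ("finance", ["money", "credit", "interest"]),
]

sense_counts = Counter(s for s, _ in training)
word_counts = defaultdict(Counter)             # counts for estimating P(w_k | s_i)
vocab = set()
for sense, words in training:
    word_counts[sense].update(words)
    vocab.update(words)

def disambiguate(context):
    """Return argmax_s P(s) * prod_k P(w_k | s), using logs and add-one smoothing."""
    best_sense, best_score = None, float("-inf")
    total = sum(sense_counts.values())
    for sense in sense_counts:
        score = log(sense_counts[sense] / total)               # log P(s_i)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context:
            score += log((word_counts[sense][w] + 1) / denom)  # log P(w_k | s_i)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

print(disambiguate(["money", "loan"]))   # expected: "finance"
```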
Brief overview so far
• Bayes theorem; MAP, h_MAP; Maximum Likelihood, h_ML; ML vs. LSE.
• Naive Bayes, NB (朴素贝叶斯): the independent attribute/feature assumption; NB vs. MAP.

MDL (Minimum Description Length)
Occam's razor: prefer the shortest hypothesis. MDL: prefer the hypothesis h that minimizes
h_MDL = argmin_{h∈H} [ L_C1(h) + L_C2(D|h) ]
where L_C(x) is the description length of x under encoding C.

Explanation of MDL (information theory)
Consider designing a code for randomly sent messages, where the probability of message i is p_i. What is the optimal code, i.e. the one with the shortest expected coding length? Assign shorter codes to messages that are more probable; the optimal code length for message i is -log2 p_i bits (Shannon & Weaver, 1949). A small sketch of this computation is given at the end of this section.
Example: encode the sequence BABABABADABACAABAACABDAAAAABAAAAAAAADBCA.
• Fixed-length binary coding with A = 00, B = 01, C = 10, D = 11 gives the 80-bit code
  01000100010001001100010010000001000010000111000000000001000000000000000011011000
• A shorter code assigns A = 0, B = 10, C = 110, D = 111, and the encoding begins
  1001001001001110100110001000110010111000

MDL and MAP
h_MAP = argmax_h P(D|h) · P(h) = argmax_h [ log2 P(D|h) + log2 P(h) ] = argmin_h [ -log2 P(D|h) - log2 P(h) ] = argmin_h [ L_C2(D|h) + L_C1(h) ]
• -log2 P(h): the length of h under the optimal code.
• -log2 P(D|h): the length of D given h under the optimal code.
So the MAP hypothesis minimizes the length of h plus the cost of encoding the data given h.

Another explanation of MDL
Suppose the sequence of instances is already known to both the transmitter and the receiver. If h makes no misclassification, there is no need to transmit any additional information beyond h itself. If some instances are misclassified by h, then for each error we must also transmit:
1. which example is wrong (at most log2 m bits, where m is the number of instances);
2. its correct classification (at most log2 k bits, where k is the number of classes).
The tradeoff is therefore between the complexity of the hypothesis and the number of errors committed by the hypothesis: prefer a shorter hypothesis that makes a few errors over a longer hypothesis that perfectly classifies the training data. This is one way of dealing with the over-fitting problem.

Overview
• Bayes theorem: use the prior probability to infer the posterior probability, P(h|D) = P(D|h) · P(h) / P(D).
• Maximum A Posteriori hypothesis: MAP, h_MAP (极大后验假设).
• Maximum Likelihood: ML, h_ML (极大似然假设); ML vs. LSE (Least Square Error).
• Naive Bayes, NB (朴素贝叶斯): the independence assumption; NB vs. MAP.
• Minimum Description Length, MDL (最小描述长度): the tradeoff between hypothesis complexity and the errors made by h; MDL vs. MAP.
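To make the -log2 p_i rule concrete, here is a small sketch (not from the slides) that measures the example sequence above: it prints each symbol's empirical probability and ideal code length, then compares the 80-bit fixed-length encoding, the variable-length encoding (A = 0, B = 10, C = 110, D = 111), and the entropy lower bound.

```python
# Optimal code lengths for the MDL example sequence.
from collections import Counter
from math import log2

sequence = "BABABABADABACAABAACABDAAAAABAAAAAAAADBCA"
counts = Counter(sequence)
n = len(sequence)

# Empirical message probabilities p_i and their ideal code lengths -log2 p_i.
for symbol, c in sorted(counts.items()):
    p = c / n
    print(symbol, f"p = {p:.3f}", f"-log2 p = {-log2(p):.2f} bits")

# Total length under the fixed 2-bit code (A=00, B=01, C=10, D=11).
fixed_bits = 2 * n

# Total length under the variable-length code A=0, B=10, C=110, D=111.
code_lengths = {"A": 1, "B": 2, "C": 3, "D": 3}
variable_bits = sum(code_lengths[s] * c for s, c in counts.items())

# Information-theoretic lower bound: n times the entropy of the symbol distribution.
entropy_bits = n * sum(-(c / n) * log2(c / n) for c in counts.values())

print(fixed_bits, variable_bits, round(entropy_bits, 1))    # 80 61 58.7
```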
