Introduction to Machine Learning (5)
Welcome to Introduction to Machine Learning!
2010.3.5 (Images come from the Internet)

Coffee Time: ACM A.M. Turing Award 2018

Announced on Mar. 27, 2019: Yann LeCun, Geoffrey E. Hinton, Yoshua Bengio (https://amturing.acm.org)

"For conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing."

- Bengio: Professor at the University of Montreal and Scientific Director at Mila, Quebec's Artificial Intelligence Institute
- Hinton: VP and Engineering Fellow of Google, Chief Scientific Adviser of The Vector Institute, and University Professor Emeritus at the University of Toronto
- LeCun: Professor at New York University and VP and Chief AI Scientist at Facebook

Comments on their contribution

Working independently and together, Hinton, LeCun and Bengio:
- developed conceptual foundations for the field,
- identified surprising phenomena through experiments, and
- contributed engineering advances that demonstrated the practical advantages of deep neural networks.

In recent years, deep learning methods have been responsible for astonishing breakthroughs in computer vision, speech recognition, natural language processing, and robotics, among other applications.

Selected Technical Accomplishments: Geoffrey Hinton

- Backpropagation: 1986, "Learning Internal Representations by Error Propagation," David Rumelhart, Ronald Williams, and Hinton. The BP algorithm allowed neural networks to discover their own internal representations of data, making it possible to use NNs to solve problems that had previously been thought to be beyond their reach. The BP algorithm is standard in most neural networks today.
- Boltzmann Machines: 1983, Terrence Sejnowski and Hinton. One of the first NNs capable of learning internal representations in neurons that were not part of the input or output.
- Improvements to convolutional neural networks: 2012, Hinton with his students Alex Krizhevsky and Ilya Sutskever. Improved CNNs using rectified linear neurons and dropout regularization. In the ImageNet competition, Hinton and his students (AlexNet) almost halved the error rate for object recognition and reshaped the computer vision field.

Selected Technical Accomplishments: Yoshua Bengio

- Probabilistic models of sequences: In the 1990s, Bengio combined NNs with probabilistic models of sequences, such as HMMs. A system used by AT&T/NCR for reading handwritten checks was a pinnacle of NN research in the 1990s; modern deep learning speech recognition systems extend these concepts.
- High-dimensional word embeddings and attention: In 2000, the landmark paper "A Neural Probabilistic Language Model" introduced high-dimensional word embeddings as a representation of word meaning. It had a huge and lasting impact on NLP tasks including machine translation, question answering, and visual question answering. His group also introduced a form of attention mechanism which led to breakthroughs in MT and forms a key component of sequential processing with deep learning.
- Generative adversarial networks: Since 2010, GANs, with Ian Goodfellow. A revolution in computer vision and computer graphics: computers can actually create original images, reminiscent of the creativity that is considered a hallmark of human intelligence.

Selected Technical Accomplishments: Yann LeCun

- Convolutional neural networks: In the 1980s, LeCun developed CNNs (University of Toronto and Bell Labs), a foundational principle in the field which, among other advantages, has been essential in making deep learning more efficient. Today, CNNs are an industry standard in computer vision, as well as in speech recognition, speech synthesis, image synthesis, and NLP, used in a wide variety of applications including autonomous driving, medical image analysis, voice-activated assistants, and information filtering.
- Improving BP algorithms: proposed an early version of the BP algorithm (backprop) and gave a clean derivation based on variational principles.
- Speeding up BP algorithms: two simple methods to accelerate learning time.
- Broadening the vision of NNs: developed a broader vision for NNs as a computational model for a wide range of tasks, introducing in early work a number of concepts now fundamental in AI: learning hierarchical feature representations; (together with Léon Bottou) learning systems built as complex networks of modules where BP is performed through automatic differentiation; deep learning architectures that can manipulate structured data, such as graphs.

From left to right: LeCun, Hinton, Bengio, Andrew Ng

Topic 6. ML Theory I: Evaluating Hypotheses
Min Zhang, Introduction to Machine Learning

Review: Inductive learning hypothesis

Much of learning involves acquiring a general concept from specific training examples. Inductive learning algorithms can at best guarantee that the output hypothesis fits the target concept over the training data. Notice: the over-fitting problem.

The Inductive Learning Hypothesis: Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over unobserved examples.

Motivation Question 1: Performance estimation
Given the observed accuracy of a hypothesis over a limited sample of data, how well does it estimate the accuracy over additional data?

Motivation Question 2
h1 outperforms h2 over some sample of data. How probable is it that h1 is better in general?

Motivation Question 3
When data is limited, what is the best way to use this data to both learn a hypothesis and estimate its accuracy?

Statistics: the mathematical study of the likelihood and probability of events occurring based on known information, inferred by taking a limited number of samples.

Background Knowledge on Statistics: Basics of Sampling Theory

Bernoulli experiments: only two outcomes, success with probability p and failure with probability q = 1 - p. Use the random variable X to record the number of successes.

Binomial Distribution: toss a coin with probability p of heads n times, and observe heads r times. If X ~ B(n, p), then

Pr(X = r) = P(r) = C(n, r) · p^r · (1 - p)^(n - r)

Where the Binomial Distribution Applies
- Two possible outcomes, success and failure (Y = 0 or Y = 1).
- The probability of success is the same on each trial: Pr(Y = 1) = p, where p is a constant.
- There are n independent trials: the random variables Y1, ..., Yn are i.i.d. (independent and identically distributed).
- R: the random variable counting the trials with Yi = 1 out of n trials; Pr(R = r) follows the Binomial distribution.

Mean (expected value): E[R]; for the Binomial distribution, μ = np.
Variance: Var[R] = E[(R - E[R])^2] = σ², where σ is the standard deviation; for the Binomial distribution, σ² = np(1 - p).

Discussions on Question 1

Review Question 1: Performance estimation
Given the observed accuracy of a hypothesis over a limited sample of data, how well does it estimate the accuracy over additional data?

Estimating Hypothesis Accuracy: Define Pr