Introduction to Machine Learning (7): Support Vector Machine
Welcome to Introduction to Machine Learning!
(*Images come from the Internet.)

Coffee Time: What makes a good experimental report? By TA: Chenyang Wang

Topic 8: Support Vector Machine
Min Zhang

Outline
- Background
- Linear support vector machine
  - Max margin linear classifier
  - Dual problem formulation
  - Linearly non-separable case
- Kernel support vector machine
- Appendix

Classification
- Sentiment: Pos: "This skirt is so beautiful!" / Neg: "It looks ugly on my body." / Neutral: "It is pure cotton."
- Music genre: Rock: Michael Jackson, "Beat It" / Hip Hop: Eminem, "Lose Yourself" / Blues: Muddy Waters, "I Can't Be Satisfied"

Classification methods
- Decision tree: attributes of instances are nominal data; the objective function is discrete.
- K-nearest neighbor: instances are points in a (e.g. Euclidean) space; the objective function can be discrete or continuous.
- Support vector machine: instances are points in a (e.g. Euclidean) space; the objective function can be discrete or continuous. (A short sketch contrasting the three follows this list.)
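The slides contain no code; the following is a minimal scikit-learn sketch contrasting the three classifier families on the same task. The synthetic dataset and hyperparameters are illustrative assumptions, not from the lecture.

```python
# Hypothetical comparison of the three classifier families on toy 2-D data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for clf in (DecisionTreeClassifier(random_state=0),
            KNeighborsClassifier(n_neighbors=5),
            SVC(kernel="linear")):
    clf.fit(X_tr, y_tr)
    print(f"{type(clf).__name__}: test accuracy = {clf.score(X_te, y_te):.2f}")
```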
Background message
- The present form of the support vector machine (SVM) was largely developed at AT&T Bell Laboratories by Vapnik and co-workers.
- Known as a maximum margin classifier.
- Originally proposed for classification, and soon applied to regression and time-series prediction.
- One of the most efficient supervised learning methods.
- Has been used as a strong baseline for text processing approaches.
Problem
Given a set of training samples $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$ with $y_i \in \{-1, +1\}$, find a function $f(x, w)$, where $w$ denotes the parameters, that classifies the samples correctly, i.e. $y_i f(x_i, w) > 0$ for all $i$.
- For a testing sample $x$, we predict its label by $\operatorname{sign} f(x, w)$.
- $f(x, w) = 0$ is called the separation hyperplane.

Linear classifiers
Consider the linearly separable case: there are infinitely many hyperplanes that can do the job.
- Any of these lines would be fine... but which is the best one?
- How would you classify this data?
Margin of a linear classifier
- Take two hyperplanes parallel to the separation hyperplane, one on each side of it, and move them away from it. When each first hits a data point, the distance between the two is called the margin of the linear classifier.
- Margin: the width that the boundary could be increased by before hitting a data point.

Maximum margin linear classifier
- Definition: the linear classifier with the maximum margin.
- Support vectors: those data points that the margin pushes up against.

Problem formulation
To formulate the margin, we further require that for all samples
$$\langle w, x_i \rangle + b \ge 1 \ \text{if}\ y_i = +1, \qquad \langle w, x_i \rangle + b \le -1 \ \text{if}\ y_i = -1,$$
or, equivalently,
$$y_i(\langle w, x_i \rangle + b) \ge 1.$$
We have introduced two additional hyperplanes $\langle w, x \rangle + b = \pm 1$, parallel to the separation hyperplane $\langle w, x \rangle + b = 0$.
Problem formulation
- What is the margin? The distance between the two new hyperplanes.
- What is its expression? Denote by $d_+$ the minimum distance between the hyperplane $\langle w, x \rangle + b = 1$ and the origin, and by $d_-$ the minimum distance between the hyperplane $\langle w, x \rangle + b = -1$ and the origin. The margin is $|d_+ - d_-|$.
- How to calculate $d_+$ and $d_-$? Write a point along the normal direction as $x = d \cdot w/\|w\|$, where $w/\|w\|$ is the unit normal vector. Requiring this point to lie on each hyperplane gives $d_+ = (1-b)/\|w\|$ and $d_- = (-1-b)/\|w\|$, so the margin is $|d_+ - d_-| = 2/\|w\|$ (the algebra is collected below).
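Collected as one worked derivation, consistent with the definitions above:

```latex
% Worked derivation of the margin (notation as above).
\begin{align*}
  \Big\langle d\,\tfrac{w}{\|w\|},\, w \Big\rangle + b = 1
    &\;\Longrightarrow\; d\,\|w\| + b = 1
     \;\Longrightarrow\; d_{+} = \frac{1-b}{\|w\|},\\
  \Big\langle d\,\tfrac{w}{\|w\|},\, w \Big\rangle + b = -1
    &\;\Longrightarrow\; d_{-} = \frac{-1-b}{\|w\|},\\
  \text{margin} = |d_{+} - d_{-}|
    &= \left|\frac{(1-b) - (-1-b)}{\|w\|}\right| = \frac{2}{\|w\|}.
\end{align*}
```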
Problem formulation
The optimization problem:
$$\max_{w,b}\; \frac{2}{\|w\|} \quad \text{s.t.} \quad y_i(\langle w, x_i \rangle + b) \ge 1,\; i = 1, \ldots, N,$$
or, equivalently,
$$\min_{w,b}\; \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(\langle w, x_i \rangle + b) \ge 1,\; i = 1, \ldots, N.$$
Although it seems that the margin is decided only by $w$, $b$ also affects the margin implicitly via its impact on $w$ in the constraints. (A minimal solver sketch follows.)
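This quadratic program can be handed directly to a generic convex solver. A minimal sketch, assuming cvxpy is available and using hypothetical well-separated toy data (not from the lecture):

```python
# Solve the hard-margin primal QP: min 1/2 ||w||^2  s.t.  y_i(<w, x_i> + b) >= 1.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs, so the hard-margin problem is feasible.
X = np.vstack([rng.normal(+3.0, 1.0, size=(20, 2)),
               rng.normal(-3.0, 1.0, size=(20, 2))])
y = np.array([+1.0] * 20 + [-1.0] * 20)

w = cp.Variable(2)
b = cp.Variable()
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints).solve()

print("w =", w.value, " b =", b.value)
print("margin = 2/||w|| =", 2 / np.linalg.norm(w.value))
```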
Dual problem formulation
- Primal problem:
$$\min_{w,b}\; \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(\langle w, x_i \rangle + b) \ge 1,\; i = 1, \ldots, N.$$
- Lagrange function:
$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{N} \alpha_i \big( y_i(\langle w, x_i \rangle + b) - 1 \big), \qquad \alpha_i \ge 0.$$
- KKT conditions (Karush-Kuhn-Tucker); see the block below.
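For reference, the standard KKT conditions for this primal, in the lecture's notation:

```latex
% KKT conditions of the hard-margin SVM (standard form).
\begin{align*}
  \frac{\partial L}{\partial w} = 0
    &\;\Longrightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i x_i,\\
  \frac{\partial L}{\partial b} = 0
    &\;\Longrightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0,\\
  \alpha_i \ge 0,\qquad
  y_i(\langle w, x_i \rangle + b) - 1 \ge 0,
    &\qquad
  \alpha_i\big(y_i(\langle w, x_i \rangle + b) - 1\big) = 0
  \quad \text{(complementary slackness).}
\end{align*}
```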
Dual problem formulation
Substituting these results into $L(w, b, \alpha)$ gives (try it yourself) the dual problem:
$$\max_{\alpha}\; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle
\quad \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i y_i = 0,\; \alpha_i \ge 0.$$

Support vectors
- According to the KKT conditions, $\alpha_i$ is nonzero only if $y_i(\langle w, x_i \rangle + b) = 1$, i.e., $x_i$ lies on the boundaries of the margin. These $x_i$'s are the support vectors (SVs).
- Most $\alpha_i$'s are zero; then $(x_i, y_i)$ has no impact on $f(x)$: a sparse solution.

Solution to the primal problem (by the dual problem)
- Normal vector: $w = \sum_{i=1}^{N} \alpha_i y_i x_i$.
- Bias: $b = y_s - \langle w, x_s \rangle$ for any support vector $(x_s, y_s)$.
- Hyperplane: $\langle w, x \rangle + b = \sum_{i=1}^{N} \alpha_i y_i \langle x_i, x \rangle + b = 0$.
- Note that $\alpha$ is sparse: the hyperplane is determined only by the SVs! (See the sketch below.)
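A minimal sketch of this sparsity with scikit-learn (assumed available; the toy data are hypothetical). With a very large C, SVC approximates the hard-margin classifier, and its dual coefficients give exactly the quantities above:

```python
# Recover w, b and the support vectors from the dual solution found by SVC.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+3.0, 1.0, size=(20, 2)),
               rng.normal(-3.0, 1.0, size=(20, 2))])
y = np.array([+1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1e10).fit(X, y)  # huge C ~ hard margin

# dual_coef_ holds alpha_i * y_i for the support vectors only (alpha is sparse).
w = clf.dual_coef_ @ clf.support_vectors_     # w = sum_i alpha_i y_i x_i
b = clf.intercept_[0]

print("support vector indices:", clf.support_)  # typically few of the 40 points
print("w =", w.ravel(), " b =", b)
```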
Summary so far
SVM in the linearly separable case:
- Maximize the margin.
- SVs: the samples whose corresponding $\alpha_i > 0$.
- Primal problem: $\min_{w,b} \frac{1}{2}\|w\|^2$ s.t. $y_i(\langle w, x_i \rangle + b) \ge 1$.
- Dual problem: $\max_{\alpha} \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$ s.t. $\sum_i \alpha_i y_i = 0$, $\alpha_i \ge 0$.
Linearly non-separable case
- Recall that in the linearly separable case, the constraints $y_i(\langle w, x_i \rangle + b) \ge 1$ ensure zero training classification error.
- In the non-separable case there must be errors, so we minimize $\|w\|$ as well as the training classification error:
$$\min_{w,b}\; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \ell(x_i, y_i),$$
where $C > 0$ is a constant that balances the two terms.

Loss functions: 0/1 loss and hinge loss
[Figure: the 0/1 loss and the hinge loss plotted against $y f(x)$.]
Recall that a prediction is correct when $y_i f(x_i) > 0$. Define, for each sample:
- 0/1 loss: $\ell_{0/1} = 1$ if $y_i f(x_i) \le 0$, else $0$.
- Hinge loss: $\ell_{\text{hinge}} = \max(0,\, 1 - y_i f(x_i))$; it linearly penalizes samples with $y_i f(x_i) < 1$.
(PS: the separation hyperplane is $\langle w, x \rangle + b = 0$, i.e. $f = 0$.)

More loss functions
Three common loss functions to replace the 0/1 loss (see the sketch after this subsection):
- Hinge loss: $\max(0, 1 - y f(x))$
- Exponential loss: $\exp(-y f(x))$
- Logistic loss: $\log(1 + \exp(-y f(x)))$

Formulation with hinge loss
Introduce slack variables $\xi_i \ge 0$. The problem becomes
$$\min_{w,b,\xi}\; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{s.t.} \quad y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i,\; \xi_i \ge 0,$$
since minimizing the hinge loss $\max(0, 1 - y_i f(x_i))$ is equivalent to minimizing $\xi_i$ subject to $\xi_i \ge 1 - y_i f(x_i)$ and $\xi_i \ge 0$.
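A minimal numpy sketch of these losses as functions of the functional margin $z = y\,f(x)$ (the helper names are mine, not the lecture's):

```python
# The four losses discussed above, as functions of z = y * f(x).
import numpy as np

def zero_one_loss(z):
    return (z <= 0).astype(float)      # 1 on a wrong (or boundary) prediction

def hinge_loss(z):
    return np.maximum(0.0, 1.0 - z)    # linear penalty whenever z < 1

def exponential_loss(z):
    return np.exp(-z)

def logistic_loss(z):
    return np.log1p(np.exp(-z))

z = np.linspace(-2.0, 2.0, 5)
for fn in (zero_one_loss, hinge_loss, exponential_loss, logistic_loss):
    print(f"{fn.__name__:>16}: {fn(z)}")
```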
Compare with the separable case: the only new variables are the slacks $\xi_i$.

Soft margin
Still want to find the maximum margin hyperplane, but this time:
- we allow some training examples to be misclassified;
- we allow some training examples to fall within the margin region.

Soft margin
- For $\xi_i = 0$, the data point falls on the boundary of the region of separation, or outside the region of separation and on the right side of the decision surface.
- For $0 < \xi_i \le 1$, the data point falls inside the region of separation, but still on the right side of the decision surface.
- For $\xi_i > 1$, the data point falls on the wrong side of the separating hyperplane and introduces a wrong decision.

Soft margin
The positive constant $C$ controls the balance between a large margin and a small misclassification error (structural risk vs. empirical risk); see the sketch below:
- large $C$: prefer small error;
- small $C$: prefer large margin.

Dual problem
The dual problem in the non-separable case keeps the same objective, now with a box constraint on $\alpha$ (the source text is cut off here; this is the standard result):
$$\max_{\alpha}\; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle
\quad \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i y_i = 0,\; 0 \le \alpha_i \le C.$$
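A minimal sketch of the C trade-off on hypothetical overlapping toy data (scikit-learn assumed; not from the lecture):

```python
# Small C -> large margin, more violations; large C -> small training error.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Overlapping blobs: the data are not linearly separable.
X = np.vstack([rng.normal(+1.0, 1.0, size=(50, 2)),
               rng.normal(-1.0, 1.0, size=(50, 2))])
y = np.array([+1] * 50 + [-1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = (clf.dual_coef_ @ clf.support_vectors_).ravel()
    print(f"C={C:>6}: margin={2 / np.linalg.norm(w):.3f}, "
          f"train error={1 - clf.score(X, y):.3f}, #SV={len(clf.support_)}")
```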