The final topic: Introduction to Machine Learning (Overview)

Overview
- Basic concepts
- Machine learning approaches

I. Basic Concepts

What's machine learning?
- Learning = improving with experience at some task
  - T (Task)
  - P (Performance)
  - E (Experience)

A generic system
- Input variables: x = (x1, ..., xN)
- Hidden variables: h = (h1, ..., hK)
- Output variables: y = (y1, ..., yM)
- Control parameters
(Figure: a system mapping inputs x1, ..., xN to outputs y1, ..., yM through hidden variables.)

II. Machine Learning Approaches

The branches of machine learning
- Supervised learning
- Unsupervised learning
- Semi-supervised learning
- Transfer learning

Supervised and unsupervised learning
|                        | Supervised                                    | Unsupervised                               |
|------------------------|-----------------------------------------------|--------------------------------------------|
| Instances for learning | (X, Y) pairs, usually with human involvement  | X only, usually without human involvement  |
| Goal of learning       | Learning the relation between X and Y         | Learning the structure of X                |
| Measure of success     | Loss function                                 | None                                       |
| Application            | Prediction: X = input, Y = output             | Analysis: X = input                        |

II. Machine Learning Approaches (Part I): Supervised Learning

Supervised learning
- The learning module sees labeled pairs (x1, y1), ..., (xn, yn) and learns knowledge P(Y|X); the prediction module uses that knowledge to predict y_{n+1} for a new input x_{n+1}.
- Y is simple, X is complex.

A1. Decision tree
- Data are effectively modeled via decision splits on attributes.
(Figure: a tree splitting on attributes X and Y, partitioning the plane into +1 / -1 regions.)
- A decision tree uses concepts/rules to represent the hypothesis.
- Intuitive: it is easy to explain the data by the learned hypothesis.
- But what if we cannot find obvious rules for the observed data? Use statistical approaches.

A2. Bayesian learning
- Condition -> Result, e.g. pneumonia -> lung cancer? Hard to tell directly.
- Reversed thinking (Result -> Cause), e.g. how many lung cancer patients have suffered from pneumonia?

Bayesian learning
- Bayes theorem: P(h|D) = P(D|h) P(h) / P(D). Use the prior probability to infer the posterior probability.
- Maximum A Posteriori hypothesis (MAP, h_MAP): generally we want the most probable hypothesis given the training data.
- Maximum Likelihood hypothesis (ML, h_ML): the smart man always learns the most from experience if he knows p(h). ML vs. LSE (least square error).
- Naive Bayes (NB): independence assumption. NB vs. MAP.
- Minimum Description Length (MDL): trade-off between hypothesis complexity and the errors made by h. MDL vs. MAP.
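To make the MAP idea concrete, here is a minimal sketch (not from the slides; the hypothesis space, priors, and likelihoods are invented for illustration) that picks the maximum a posteriori hypothesis over a small finite hypothesis set using P(h|D) proportional to P(D|h) P(h).

```python
# Minimal MAP illustration: choose the hypothesis h maximizing P(h|D), which is
# proportional to P(D|h) * P(h). All numbers below are made up for illustration.

priors = {"h1": 0.9, "h2": 0.1}        # P(h): prior belief in each hypothesis
likelihoods = {"h1": 0.3, "h2": 0.8}   # P(D|h): how well each hypothesis explains the data D

# Unnormalized posteriors P(D|h) * P(h); P(D) is constant and can be ignored for the argmax.
posteriors = {h: likelihoods[h] * priors[h] for h in priors}

h_map = max(posteriors, key=posteriors.get)    # MAP hypothesis: uses the prior
h_ml = max(likelihoods, key=likelihoods.get)   # ML hypothesis: ignores the prior P(h)

print("h_MAP =", h_map, " h_ML =", h_ml)       # with these numbers the prior favors h1, the data favor h2
```

With a uniform prior the two choices coincide, which is the usual way the ML hypothesis is related to MAP.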
A3. HMM
- What if the observed data are only indirect evidence? A hidden state exists.
- Property 1: Markov assumption: p(X_i | X_{i-1}, ..., X_1) = p(X_i | X_{i-1})
- Property 2: time-invariance assumption: p(X_{i+1} | X_i) = p(X_{j+1} | X_j) for any i, j
- Property 3: independent-observation assumption: p(O_1, ..., O_T | X_1, ..., X_T) = prod_t p(O_t | X_t)
(Figure: a hidden-state chain x_1, ..., x_T emitting observations o_1, ..., o_T.)

HMM basic problem 1: Estimation
- Estimation problem: compute the probability of a given observation sequence, P(O | lambda).
- Define the forward variable alpha and the backward variable beta.
- Dynamic programming: the forward algorithm, O(N^2 T):
  alpha_{t+1}(j) = [ sum_{i=1..N} alpha_t(i) a_{ij} ] b_j(o_{t+1})
  P(O | lambda) = sum_{i=1..N} alpha_T(i)

HMM basic problem 2: Decoding
- Given an observation sequence, compute the most likely hidden state sequence.
- Dynamic programming: the Viterbi algorithm, O(N^2 T):
  delta_t(j) = max_{x_1, ..., x_{t-1}} P(x_1, ..., x_{t-1}, o_1, ..., o_{t-1}, x_t = j, o_t)
  delta_{t+1}(j) = max_i [ delta_t(i) a_{ij} ] b_j(o_{t+1})
  psi_{t+1}(j) = argmax_i [ delta_t(i) a_{ij} ] b_j(o_{t+1})

What we've learned from HMM
- We use the special structure of this model to do a lot of neat math and solve problems that are otherwise not solvable.

A4. Kernel methods and non-linear SVM

Maximize the margin
- Define the margin of a linear classifier as the width by which the boundary could be increased before hitting a data point.
- Maximize the margin.

Non-separable case
- Minimize the training error.

Non-linear SVM
- Map the input space to a feature space: what is non-linear in a low dimension becomes a linear hyperplane in a higher dimension.
- Common kernels.
- Software: LIBSVM http://www.csie.ntu.edu.tw/~cjlin/libsvm, SVMlight http://svmlight.joachims.org

Previous learning approaches
- Estimate the characteristics (e.g. the distribution), make an assumption about the model, find the optimal parameters.
- But sometimes we know nothing before learning. Is there a learning approach that is NOT "model assumption + parameter estimation"?

A5. k-Nearest Neighbor (KNN)
- Thinking is reminding, making analogies.
- One takes on the behavior of one's company ("he who stays near vermilion turns red; he who stays near ink turns black").
(Figure: 1-nearest-neighbor and 3-nearest-neighbor decisions on points P1, ..., Pn.)

KNN
- Main assumption: an effective distance metric exists.
- Nonparametric: conceptually simple, yet can model any function.
- Costs: memory cost, CPU cost.
- Feature selection problem: irrelevant features have a negative impact on the distance metric; sensitive to representation.

More on efficiency: KD-Tree (construction)
- Keep one additional piece of information at each node: the (tight) bounds of the points at or below this node.

More on efficiency: KD-Tree (query)
- Each time a new closest node is found, we can update the distance bounds.
- Using the distance bounds and the bounds of the data below each node, we can prune parts of the tree that could NOT include the nearest neighbor.

A memory-based learner: 4 factors
1. A distance metric: Euclidean / scaled Euclidean / ...
2. How many nearby neighbors to look at? 1, k, or all.
3. A weighting function (optional): w_i = exp(-D(x_i, query)^2 / K_w^2)
4. How to fit with the local points? The nearest neighbor, or voting among the k neighbors, or the weighted average of the outputs: predict = sum_i w_i y_i / sum_i w_i (see the sketch below).
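As an illustration of these four choices, here is a minimal sketch (a hypothetical example, not from the slides) of a memory-based predictor using the Euclidean distance, the k nearest neighbors, the weighting function w_i = exp(-D(x_i, query)^2 / K_w^2), and the weighted-average prediction sum(w_i y_i) / sum(w_i). The kernel width kw and the toy data are assumptions.

```python
import math

def weighted_knn_predict(train_x, train_y, query, k=3, kw=1.0):
    """Memory-based prediction: Euclidean distance, k neighbors,
    weights w_i = exp(-D(x_i, query)^2 / kw^2), output = sum(w_i*y_i) / sum(w_i)."""
    # 1. Distance metric: plain Euclidean distance from each stored point to the query.
    dists = [math.dist(x, query) for x in train_x]
    # 2. How many neighbors: keep the indices of the k closest stored points.
    nearest = sorted(range(len(train_x)), key=lambda i: dists[i])[:k]
    # 3. Weighting function: closer neighbors get exponentially larger weights.
    weights = [math.exp(-dists[i] ** 2 / kw ** 2) for i in nearest]
    # 4. Fit to the local points: weighted average of the neighbors' outputs.
    return sum(w * train_y[i] for w, i in zip(weights, nearest)) / sum(weights)

# Toy usage (made-up data): predict the value at (1.5, 1.5) from four stored points.
X = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
y = [0.0, 1.0, 2.0, 3.0]
print(weighted_knn_predict(X, y, (1.5, 1.5), k=3, kw=1.0))
```

Setting k = 1 and dropping the weights recovers plain nearest-neighbor prediction; replacing the linear scan with a KD-tree search would address the CPU cost mentioned above.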
II. Machine Learning Approaches (Part II): Unsupervised Learning

Unsupervised learning
- The learning module sees only x_1, ..., x_n and learns knowledge P(X); the prediction module handles a new input x_{n+1}. X is complex.
- Build a model or find useful representations of the input that can be used for decision making, predicting future inputs, efficiently communicating the inputs to another machine, etc.
- Find patterns (discover the structure) in the data above and beyond what would be considered pure unstructured noise.

What are good clusters?
- The intra-cluster distance is small.
- The inter-cluster distance is large.

A6. K-Means
- Arbitrarily choose K objects as the initial cluster centers.
- Loop until no change:
  - Assign each object to the most similar center.
  - Update the cluster means and reassign.
(Figure: a K = 2 example showing assignments and cluster means being updated over several iterations.)

A7. K-Medoids
- Arbitrarily choose k objects as the initial medoids.
- Loop until no change:
  - Assign each object to the nearest medoid.
  - Randomly select a non-medoid object O_random.
  - Compute the total cost of swapping a medoid O with O_random; swap only if the quality is improved.
(Figure: a K = 2 example where the candidate swap raises the total cost from 20 to 26, so the swap is rejected.)

A8. Hierarchical Clustering (Agglomerative)
(Figure: an agglomerative clustering example on a small set of points.)
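The slides stop at the title, so as a rough illustration here is a minimal sketch of the standard agglomerative procedure: start with every point in its own cluster and repeatedly merge the two closest clusters. The Euclidean distance, the single-linkage cluster distance, and the toy data are my assumptions, not specified on the slide.

```python
import math

def agglomerative(points, num_clusters):
    """Single-linkage agglomerative clustering (illustrative sketch):
    start with one cluster per point, repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]  # every point starts as its own cluster

    def linkage(a, b):
        # Single linkage: distance between the closest pair of points across the two clusters.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest linkage distance and merge them.
        i, j = min(((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

# Toy usage (made-up points): merge down to 2 clusters.
pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.2, 0.1)]
print(agglomerative(pts, 2))
```

Unlike K-Means and K-Medoids, this bottom-up procedure produces a whole hierarchy of merges; cutting it at a chosen level yields a flat clustering.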