Lesson 4: Data Classification and Prediction
"Still waters run deep." "Where there is life, there is hope."

Outline
- What is classification? What is prediction?
- Issues regarding classification and prediction
- Classification by decision tree induction
- Bayesian classification
- Prediction
- Summary
- References

I. Classification vs. Prediction
- Classification predicts categorical class labels (discrete or nominal): it constructs a model from the training set and the values (class labels) of a classifying attribute, then uses the model to classify new data.
- Prediction models continuous-valued functions, i.e., it predicts unknown or missing values.
- Typical applications: credit approval, target marketing, medical diagnosis, fraud detection.

Classification: A Two-Step Process
- Model construction: describing a set of predetermined classes.
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
  - The set of tuples used for model construction is the training set.
  - The model is represented as classification rules, decision trees, or mathematical formulae.
- Model usage: classifying future or unknown objects.
  - First estimate the accuracy of the model: the known label of each test sample is compared with the model's classification, and the accuracy rate is the percentage of test-set samples correctly classified by the model. The test set must be independent of the training set; otherwise over-fitting will occur.
  - If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known.

Classification Process (1): Model Construction
[Figure: training data fed to a classification algorithm, which produces a classifier (model), e.g. the rule IF rank = "professor" OR years > 6 THEN tenured = "yes".]

Classification Process (2): Use the Model in Prediction
[Figure: the classifier is first checked against testing data, then applied to unseen data, e.g. (Jeff, Professor, 4) → Tenured?]
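To make the two-step process concrete, here is a minimal sketch assuming scikit-learn is available; the tiny encoded data set is a made-up stand-in for the (rank, years) → tenured example in the figures, not data from the slides.

```python
# Minimal sketch of the two-step classification process, assuming
# scikit-learn. The data set is a made-up (rank, years) -> tenured example.
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# rank encoded as 0=assistant, 1=associate, 2=professor; then years
X = [[2, 7], [0, 3], [1, 7], [2, 2], [0, 6], [1, 3], [2, 5], [0, 2]]
y = ["yes", "no", "yes", "yes", "no", "no", "yes", "no"]  # tenured?

# Step 1: model construction, on the training set only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Step 2a: estimate accuracy on the independent test set.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Step 2b: if acceptable, classify unseen tuples, e.g. (Jeff, Professor, 4).
print("Tenured?", model.predict([[2, 4]])[0])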
Supervised vs. Unsupervised Learning
- Supervised learning (classification)
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation.
  - New data is classified based on the training set.
- Unsupervised learning (clustering)
  - The class labels of the training data are unknown.
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.

II. Issues Regarding Classification and Prediction (1): Data Preparation
- Data cleaning: preprocess data to reduce noise and handle missing values.
- Relevance analysis (feature selection): remove irrelevant or redundant attributes.
- Data transformation: generalize and/or normalize data.

Issues Regarding Classification and Prediction (2): Evaluating Classification Methods
- Accuracy: classifier accuracy and predictor accuracy.
- Speed and scalability: time to construct the model (training time) and time to use the model (classification/prediction time).
- Robustness: handling noise and missing values.
- Scalability: efficiency on disk-resident databases.
- Interpretability: understanding and insight provided by the model.
- Other measures: e.g., goodness of rules, such as decision tree size or compactness of classification rules.
III. Decision Tree Induction: Training Dataset
This follows an example from Quinlan's ID3 (Playing Tennis).
[Table: the 14-tuple "buys_computer" training set with attributes age, income, student, and credit_rating.]

Output: A Decision Tree for "buys_computer"
[Figure: root node tests age?; the "<=30" branch leads to a student? test (no → no, yes → yes), the "31..40" branch leads directly to yes, and the ">40" branch leads to a credit_rating? test (excellent → no, fair → yes).]

Algorithm for Decision Tree Induction
- Basic algorithm (a greedy algorithm; see the sketch after this list):
  - The tree is constructed in a top-down, recursive, divide-and-conquer manner.
  - At the start, all the training examples are at the root.
  - Attributes are categorical (continuous-valued attributes are discretized in advance).
  - Examples are partitioned recursively based on selected attributes.
  - Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).
- Conditions for stopping partitioning:
  - All samples for a given node belong to the same class.
  - There are no remaining attributes for further partitioning; majority voting is employed to classify the leaf.
  - There are no samples left.
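A compact recursive skeleton of this greedy algorithm, assuming categorical attributes and rows represented as Python dicts (a sketch, not the full ID3; the attribute-scoring function `gain` is passed in, and one matching the slides' information gain is sketched in the next section):

```python
# Top-down, recursive, divide-and-conquer tree construction with the
# three stopping conditions from the slide. Rows are dicts; `gain` is
# any scoring function gain(rows, attr, label) -> float.
from collections import Counter

def majority_class(rows, label):
    return Counter(r[label] for r in rows).most_common(1)[0][0]

def build_tree(rows, attrs, label, gain):
    classes = {r[label] for r in rows}
    if len(classes) == 1:            # stop: all samples in one class
        return classes.pop()
    if not attrs:                    # stop: no attributes left -> majority vote
        return majority_class(rows, label)
    best = max(attrs, key=lambda a: gain(rows, a, label))
    node = {"attr": best, "branches": {},
            "default": majority_class(rows, label)}  # for unseen values
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        # Branching only on observed values keeps every subset non-empty,
        # which covers the "no samples left" stopping condition.
        node["branches"][value] = build_tree(
            subset, [a for a in attrs if a != best], label, gain)
    return node
```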
Attribute Selection Measure: Information Gain (ID3/C4.5)
- Select the attribute with the highest information gain.
- Let S contain s_i tuples of class C_i, for i = 1, ..., m.
- The expected information needed to classify an arbitrary tuple:
  I(s_1, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2 p_i, where p_i = s_i / s.
- The entropy of attribute A with values {a_1, a_2, \ldots, a_v}:
  E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s} \, I(s_{1j}, \ldots, s_{mj}).
- The information gained by branching on attribute A:
  Gain(A) = I(s_1, \ldots, s_m) - E(A).

Attribute Selection by Information Gain Computation
- Class P: buys_computer = "yes" (9 tuples); class N: buys_computer = "no" (5 tuples).
- I(p, n) = I(9, 5) = 0.940.
- Compute the entropy for age: 5/14 I(2,3) means the branch age = "<=30" holds 5 of the 14 samples, with 2 yes's and 3 no's, so
  E(age) = 5/14 I(2,3) + 4/14 I(4,0) + 5/14 I(3,2) = 0.694
  and Gain(age) = I(9,5) - E(age) = 0.246.
- For a continuous-valued attribute, each candidate age split-point is scored the same way and the best one is chosen.
- These values are reproduced in the sketch below.
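A pure-Python sketch of the formulas above; the `gain` function also matches the signature expected by the `build_tree` skeleton earlier.

```python
# I(s1,...,sm), E(A) and Gain(A) for rows given as dicts.
from collections import Counter
from math import log2

def info(counts):
    """I(s1,...,sm) = -sum_i p_i log2 p_i, with p_i = s_i / s."""
    s = sum(counts)
    return -sum(c / s * log2(c / s) for c in counts if c)

def gain(rows, attr, label):
    """Gain(A) = I(s1,...,sm) - E(A)."""
    total = info(list(Counter(r[label] for r in rows).values()))
    e = 0.0
    for value in {r[attr] for r in rows}:
        part = [r[label] for r in rows if r[attr] == value]
        e += len(part) / len(rows) * info(list(Counter(part).values()))
    return total - e

print(round(info([9, 5]), 3))           # 0.94, the I(9,5) above
# age splits the 14 tuples into <=30 (2 yes, 3 no),
# 31..40 (4 yes, 0 no) and >40 (3 yes, 2 no):
e_age = 5/14 * info([2, 3]) + 4/14 * info([4, 0]) + 5/14 * info([3, 2])
print(round(info([9, 5]) - e_age, 3))   # ~0.247; the slide's 0.246
                                        # rounds 0.940 - 0.694 first
```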
Extracting Classification Rules from Trees
- Represent the knowledge in the form of IF-THEN rules.
- One rule is created for each path from the root to a leaf.
- Each attribute-value pair along a path forms a conjunction.
- The leaf node holds the class prediction.
- Rules are easier for humans to understand.
- Example (from the buys_computer tree; see the sketch below):
  IF age = "<=30" AND student = "no" THEN buys_computer = "no"
  IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
  IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"
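Emitting one rule per root-to-leaf path is a plain tree traversal. A sketch over the dict-shaped tree returned by the `build_tree` skeleton above (that shape is an assumption of these sketches, not something fixed by the slides):

```python
# One IF-THEN rule per root-to-leaf path; each (attribute, value)
# pair along the path becomes one conjunct of the antecedent.
def extract_rules(node, label, path=()):
    if not isinstance(node, dict):       # leaf: node is the class value
        conds = " AND ".join(f'{a} = "{v}"' for a, v in path) or "TRUE"
        return [f'IF {conds} THEN {label} = "{node}"']
    rules = []
    for value, child in node["branches"].items():
        rules += extract_rules(child, label, path + ((node["attr"], value),))
    return rules
```

Applied to the buys_computer tree, this yields exactly the kind of rules listed above, e.g. IF age = "<=30" AND student = "no" THEN buys_computer = "no".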
Avoid Overfitting in Classification
- Overfitting: an induced tree may overfit the training data.
  - Too many branches, some reflecting anomalies due to noise or outliers.
  - Poor accuracy for unseen samples.
- Two approaches to avoid overfitting:
  - Prepruning: halt tree construction early; do not split a node if the split would make the goodness measure fall below a threshold. It is difficult to choose an appropriate threshold.
  - Postpruning: remove branches from a "fully grown" tree to obtain a sequence of progressively pruned trees, then use a data set different from the training data to decide which is the "best pruned tree".

Approaches to Determine the Final Tree Size
- Separate training (2/3) and testing (1/3) sets.
- Use cross-validation (see the sketch below).
- Use all the data for training, but apply a statistical test (e.g., chi-square) to estimate whether expanding or pruning a node would improve the distribution over the entire data.
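One concrete way to apply the cross-validation option: score a sequence of increasingly deep trees and keep the size that validates best. A sketch assuming scikit-learn; the data set and the depth grid are arbitrary illustrative choices.

```python
# Choose the final tree size by 5-fold cross-validation over a
# sequence of maximum depths (a stand-in for the pruned-tree sequence).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
scores = {
    depth: cross_val_score(
        DecisionTreeClassifier(max_depth=depth, random_state=0),
        X, y, cv=5).mean()
    for depth in range(1, 11)
}
best = max(scores, key=scores.get)
print(f"best depth: {best}, cross-validated accuracy: {scores[best]:.3f}")
```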
Enhancements to Basic Decision Tree Induction
- Allow for continuous-valued attributes: dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals (see the split-point sketch below).
- Handle missing attribute values: assign the most common value of the attribute, or assign a probability to each of the possible values.
- Attribute construction: create new attributes based on existing ones that are sparsely represented; this reduces fragmentation, repetition, and replication.
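For the continuous-valued case, "dynamically defining a new discrete-valued attribute" typically means picking a binary split-point by the same information-gain criterion. A self-contained sketch (the midpoint-candidate strategy is one common choice, not prescribed by the slides):

```python
# Best binary split-point for a continuous attribute: candidates are
# midpoints between consecutive distinct sorted values, each scored
# by information gain.
from collections import Counter
from math import log2

def info(counts):
    s = sum(counts)
    return -sum(c / s * log2(c / s) for c in counts if c)

def best_split_point(values, labels):
    pairs = sorted(zip(values, labels))
    total = info(list(Counter(labels).values()))
    best_gain, best_t = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                     # equal values: no threshold here
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        e = (len(left) * info(list(Counter(left).values())) +
             len(right) * info(list(Counter(right).values()))) / len(pairs)
        if total - e > best_gain:
            best_gain, best_t = total - e, t
    return best_t, best_gain

# e.g. best_split_point([25, 32, 38, 45, 51], ["no", "yes", "yes", "yes", "no"])
```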
Classification in Large Databases
- Classification is a classical problem extensively studied by statisticians and machine learning researchers.
- Scalability: classifying data sets with millions of examples and hundreds of attributes at reasonable speed.
- Why decision tree induction in data mining?
  - Relatively faster learning speed than other classification methods.
  - Convertible to simple, easy-to-understand classification rules.
  - Can use SQL queries for accessing databases.
  - Comparable classification accuracy with other methods.

Scalable Decision Tree Induction Methods
- SLIQ (EDBT'96, Mehta et al.): builds an index for each attribute; only the class list and the current attribute list reside in memory.
- SPRINT (VLDB'96, J. Shafer et al.): constructs an attribute-list data structure.
- PUBLIC (VLDB'98, Rastogi & Shim): integrates tree splitting and tree pruning, stopping tree growth earlier.
- RainForest (VLDB'98, Gehrke, Ramakrishnan & Ganti): separates the scalability aspects from the criteria that determine the quality of the tree; builds an AVC-list (attribute, value, class label).

Presentation of Classification Results
[Figure: visualization of a decision tree in SGI/MineSet 3.0.]
[Figure: interactive visual mining by Perception-Based Classification (PBC).]

IV. Bayesian Classification: Why?
- Probabilistic learning: calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems.