Lesson 4: Data Classification and Prediction (第4课 数据分类和预测)

Still waters run deep. Where there is life, there is hope.

Outline
• What is classification? What is prediction?
• Issues regarding classification and prediction
• Classification by decision tree induction
• Bayesian classification
• Prediction
• Summary
• Reference

I. Classification vs. Prediction
• Classification
  - predicts categorical class labels (discrete or nominal)
  - classifies data (constructs a model) based on the training set and the values (class labels) of a classifying attribute, and uses the model to classify new data
• Prediction
  - models continuous-valued functions, i.e., predicts unknown or missing values
• Typical applications: credit approval, target marketing, medical diagnosis, fraud detection

Classification: A Two-Step Process
• Model construction: describing a set of predetermined classes
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  - The set of tuples used for model construction is the training set
  - The model is represented as classification rules, decision trees, or mathematical formulae
• Model usage: classifying future or unknown objects
  - Estimate the accuracy of the model: the known label of each test sample is compared with the model's classification result; the accuracy rate is the percentage of test-set samples that are correctly classified by the model; the test set must be independent of the training set, otherwise over-fitting will occur
  - If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known

Classification Process (1): Model Construction
• Training data is fed to a classification algorithm, which produces the classifier (model)
• Example of a learned rule: IF rank = "professor" OR years > 6 THEN tenured = "yes"

Classification Process (2): Use the Model in Prediction
• The classifier is applied to testing data, and then to unseen data
• Example: for the unseen tuple (Jeff, Professor, 4), the model answers "tenured?"

Supervised vs. Unsupervised Learning
• Supervised learning (classification)
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  - New data is classified based on the training set
• Unsupervised learning (clustering)
  - The class labels of the training data are unknown
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data

II. Issues Regarding Classification and Prediction (1): Data Preparation
• Data cleaning: preprocess the data in order to reduce noise and handle missing values
• Relevance analysis (feature selection): remove irrelevant or redundant attributes
• Data transformation: generalize and/or normalize the data

Issues Regarding Classification and Prediction (2): Evaluating Classification Methods
• Accuracy: classifier accuracy and predictor accuracy
• Speed and scalability: time to construct the model (training time) and time to use the model (classification/prediction time)
• Robustness: handling noise and missing values
• Scalability: efficiency in disk-resident databases
• Interpretability: understanding and insight provided by the model
• Other measures, e.g., goodness of rules, such as decision tree size or compactness of classification rules

III. Decision Tree Induction: Training Dataset
• This follows an example of Quinlan's ID3 (playing tennis)
• [Table not preserved in extraction: the "buys_computer" training set with attributes age, income, student, credit_rating and class label buys_computer (9 "yes" tuples, 5 "no" tuples)]

Output: A Decision Tree for "buys_computer"
• [Figure not preserved: the induced tree splits on age at the root, then on student for one age branch and on credit_rating for another, ending in "yes"/"no" leaves]

Algorithm for Decision Tree Induction
• Basic algorithm (a greedy algorithm)
  - The tree is constructed in a top-down, recursive, divide-and-conquer manner
  - At the start, all the training examples are at the root
  - Attributes are categorical (if continuous-valued, they are discretized in advance)
  - Examples are partitioned recursively based on selected attributes
  - Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
• Conditions for stopping partitioning
  - All samples for a given node belong to the same class
  - There are no remaining attributes for further partitioning (majority voting is employed to classify the leaf)
  - There are no samples left
• (A code sketch of constructing and evaluating such a classifier follows below.)
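To make the two-step process and the decision-tree classifier above concrete, here is a minimal sketch in Python, assuming scikit-learn is available; the toy table, its integer encodings, and the 1/3 test split are illustrative assumptions, not part of the original slides.

    # Minimal sketch of the two-step classification process with a decision tree.
    # Assumes scikit-learn is installed; the toy data and encodings are illustrative only.
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # A small, hypothetical training table (age band, income, student, credit_rating)
    # encoded as integers; y is the class label buys_computer (1 = yes, 0 = no).
    X = [
        [0, 2, 0, 0], [0, 2, 0, 1], [1, 2, 0, 0], [2, 1, 0, 0],
        [2, 0, 1, 0], [2, 0, 1, 1], [1, 0, 1, 1], [0, 1, 0, 0],
        [0, 0, 1, 0], [2, 1, 1, 0], [0, 1, 1, 1], [1, 1, 0, 1],
        [1, 2, 1, 0], [2, 1, 0, 1],
    ]
    y = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]

    # Step 1: model construction on the training set only.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
    model = DecisionTreeClassifier(criterion="entropy", random_state=0)  # information-gain-style splits
    model.fit(X_train, y_train)

    # Step 2: model usage -- estimate accuracy on the independent test set, then
    # classify previously unseen tuples if the accuracy is acceptable.
    print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
    print("unseen tuple -> predicted class:", model.predict([[1, 1, 1, 0]]))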
Attribute Selection Measure: Information Gain (ID3/C4.5)
• Select the attribute with the highest information gain
• Let S contain si tuples of class Ci, for i = 1, ..., m
• The information (entropy) required to classify an arbitrary tuple is
      I(s1, ..., sm) = - sum_{i=1..m} (si / s) * log2(si / s)
• The entropy of attribute A with values {a1, a2, ..., av} is
      E(A) = sum_{j=1..v} ((s1j + ... + smj) / s) * I(s1j, ..., smj)
• The information gained by branching on attribute A is
      Gain(A) = I(s1, ..., sm) - E(A)

Attribute Selection by Information Gain Computation
• Class P: buys_computer = "yes"; class N: buys_computer = "no"
• I(p, n) = I(9, 5) = 0.940
• Compute the entropy for age by weighting the information of each age branch by the fraction of tuples that fall into it (the detailed arithmetic was a figure that did not survive extraction; it is reproduced in the code sketch below)
• For a continuous-valued attribute such as age, a binary test of the form "A <= split-point" is used, and the best split-point must be determined

Extracting Classification Rules from Trees
• Represent the knowledge in the form of IF-THEN rules
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction
• The leaf node holds the class prediction
• Rules are easier for humans to understand
• Example
  - IF age = "<=30" AND student = "no" THEN buys_computer = "no"
  - IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "yes"
  - IF age = "<=30" AND credit_rating = "fair" THEN buys_computer = "no"

Avoid Overfitting in Classification
• Overfitting: an induced tree may overfit the training data
  - Too many branches, some of which may reflect anomalies due to noise or outliers
  - Poor accuracy for unseen samples
• Two approaches to avoid overfitting
  - Prepruning: halt tree construction early; do not split a node if this would cause the goodness measure to fall below a threshold (it is difficult to choose an appropriate threshold)
  - Postpruning: remove branches from a "fully grown" tree to obtain a sequence of progressively pruned trees; use a data set different from the training data to decide which is the "best pruned tree"

Approaches to Determine the Final Tree Size
• Separate training (2/3) and testing (1/3) sets
• Use cross-validation
• Use all the data for training, but apply a statistical test (e.g., chi-square) to estimate whether expanding or pruning a node may improve the entire distribution

Enhancements to Basic Decision Tree Induction
• Allow for continuous-valued attributes: dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals
• Handle missing attribute values: assign the most common value of the attribute, or assign a probability to each of the possible values
• Attribute construction: create new attributes based on existing ones that are sparsely represented; this reduces fragmentation, repetition, and replication

Classification in Large Databases
• Classification: a classical problem extensively studied by statisticians and machine-learning researchers
• Scalability: classifying data sets with millions of examples and hundreds of attributes at reasonable speed
• Why decision tree induction in data mining?
  - relatively fast learning speed (compared with other classification methods)
  - convertible to simple and easy-to-understand classification rules
  - can use SQL queries for accessing databases
  - comparable classification accuracy with other methods

Scalable Decision Tree Induction Methods
• SLIQ (EDBT'96, Mehta et al.): builds an index for each attribute; only the class list and the current attribute list reside in memory
• SPRINT (VLDB'96, J. Shafer et al.): constructs an attribute-list data structure
• PUBLIC (VLDB'98, Rastogi & Shim): integrates tree splitting and tree pruning; stops growing the tree earlier
• RainForest (VLDB'98, Gehrke, Ramakrishnan & Ganti): separates the scalability aspects from the criteria that determine the quality of the tree; builds an AVC-list (attribute, value, class label)
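The I(9, 5) = 0.940 figure quoted above can be reproduced in a few lines of Python. The per-branch class counts used for Gain(age) (age <= 30: 2 yes / 3 no, 31..40: 4 yes / 0 no, > 40: 3 yes / 2 no) are assumptions here: only the age <= 30 counts can be read off the Naive Bayes example later in the deck; the rest are the usual textbook values.

    # Entropy and information gain for the buys_computer example.
    from math import log2

    def info(*counts):
        """I(s1, ..., sm) = -sum (si/s) * log2(si/s), skipping empty classes."""
        s = sum(counts)
        return -sum((c / s) * log2(c / s) for c in counts if c > 0)

    def gain(class_counts, branches):
        """Gain(A) = I(S) - E(A); E(A) weights each branch's info by its relative size."""
        s = sum(class_counts)
        e_a = sum(sum(b) / s * info(*b) for b in branches)
        return info(*class_counts) - e_a

    print(round(info(9, 5), 3))                               # 0.94
    print(round(gain((9, 5), [(2, 3), (4, 0), (3, 2)]), 3))   # gain for age, about 0.246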
Presentation of Classification Results
• Visualization of a decision tree in SGI/MineSet 3.0
• Interactive visual mining by Perception-Based Classification (PBC)

IV. Bayesian Classification: Why?
• Probabilistic learning: calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
• Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
• Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities
• Standard: even when Bayesian methods are computationally intractable, they provide a standard of optimal decision making against which other methods can be measured

Bayesian Theorem: Basics
• Let X be a data sample whose class label is unknown
• Let H be the hypothesis that X belongs to class C
• For classification problems, determine P(H|X): the probability that the hypothesis holds given the observed data sample X
• P(H): prior probability of hypothesis H (the initial probability before we observe any data; it reflects background knowledge)
• P(X): probability that the sample data is observed
• P(X|H): probability of observing the sample X, given that the hypothesis holds

Bayesian Theorem
• Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:
      P(H|X) = P(X|H) * P(H) / P(X)
• Informally, this can be written as: posterior = likelihood x prior / evidence
• The MAP (maximum a posteriori) hypothesis is the H that maximizes P(H|X), equivalently the H that maximizes P(X|H) * P(H)
• Practical difficulty: requires initial knowledge of many probabilities; significant computational cost

Naive Bayes Classifier
• A simplifying assumption: attributes are conditionally independent given the class
• The probability of observing, say, two attribute values x1 and x2 together, given that the class is C, is the product of the individual probabilities given that class:
      P(x1, x2 | C) = P(x1 | C) * P(x2 | C)
• No dependence relation between attributes
• This greatly reduces the computation cost: only the class distribution needs to be counted
• Once P(X|Ci) is known, assign X to the class with the maximum P(X|Ci) * P(Ci)

Training Dataset
• Class C1: buys_computer = "yes"; class C2: buys_computer = "no"
• Data sample X = (age = "<=30", income = "medium", student = "yes", credit_rating = "fair")

Naive Bayesian Classifier: An Example
• Compute P(X|Ci) for each class:
  - P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
  - P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
  - P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
  - P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
  - P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
  - P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
  - P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
  - P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4
• X = (age = "<=30", income = "medium", student = "yes", credit_rating = "fair")
• P(X|Ci):
  - P(X | buys_computer = "yes") = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
  - P(X | buys_computer = "no") = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
• P(X|Ci) * P(Ci):
  - P(X | buys_computer = "yes") * P(buys_computer = "yes") = 0.028
  - P(X | buys_computer = "no") * P(buys_computer = "no") = 0.007
• Therefore, X belongs to the class buys_computer = "yes" (this calculation is reproduced in the code sketch below)

Naive Bayesian Classifier: Comments
• Advantages
  - Easy to implement
  - Good results obtained in most cases
• Disadvantages
  - The class-conditional independence assumption causes a loss of accuracy
  - In practice, dependencies exist among variables; e.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.); dependencies among these cannot be modeled by a naive Bayesian classifier
• How to deal with these dependencies? Bayesian belief networks
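The worked example above can be reproduced directly. This sketch uses only the conditional probabilities quoted on the slide, plus the priors P(yes) = 9/14 and P(no) = 5/14 implied by the 9/5 class split mentioned earlier.

    # Naive Bayesian classification of X = (age<=30, income=medium, student=yes, credit=fair).
    cond = {  # P(attribute value | class), as quoted on the slide
        "yes": {"age<=30": 2/9, "income=medium": 4/9, "student=yes": 6/9, "credit=fair": 6/9},
        "no":  {"age<=30": 3/5, "income=medium": 2/5, "student=yes": 1/5, "credit=fair": 2/5},
    }
    prior = {"yes": 9/14, "no": 5/14}
    x = ["age<=30", "income=medium", "student=yes", "credit=fair"]

    score = {}
    for c in ("yes", "no"):
        p_x_given_c = 1.0
        for value in x:                       # conditional independence assumption
            p_x_given_c *= cond[c][value]
        score[c] = p_x_given_c * prior[c]     # P(X|Ci) * P(Ci)
        print(c, round(p_x_given_c, 3), round(score[c], 3))   # 0.044 / 0.028 and 0.019 / 0.007

    print("predicted class:", max(score, key=score.get))      # buys_computer = "yes"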
Bayesian Belief Networks
• A Bayesian belief network allows a subset of the variables to be conditionally independent
• A graphical model of causal relationships
  - Represents dependencies among the variables
  - Gives a specification of the joint probability distribution
• Nodes: random variables
• Links: dependency
• In the example graph, X and Y are the parents of Z, and Y is the parent of P; there is no dependency between Z and P
• The graph has no loops or cycles

Bayesian Belief Network: An Example
• [Figure not preserved: a network over FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, and Dyspnea, in which FamilyHistory (FH) and Smoker (S) are the parents of LungCancer (LC)]
• The conditional probability table (CPT) for the variable LungCancer shows the conditional probability for each possible combination of values of its parents:

                (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
      LC          0.8        0.5        0.7        0.1
      ~LC         0.2        0.5        0.3        0.9

• (A small code sketch of using this CPT follows below.)

Learning Bayesian Networks
• Several cases:
  - Given both the network structure and all variables observable: learn only the CPTs
  - Network structure known, some variables hidden: use a gradient-descent method, analogous to neural-network learning
  - Network structure unknown, all variables observable: search through the model space to reconstruct the graph topology
  - Unknown structure, all variables hidden: no good algorithms are known for this purpose
• Reference: D. Heckerman, Bayesian Networks for Data Mining
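As a small illustration of how the LungCancer CPT is used, the sketch below evaluates the joint probability P(FH, S, LC) = P(FH) * P(S) * P(LC | FH, S), assuming (as in the usual version of this example) that FamilyHistory and Smoker are root nodes. The priors for FamilyHistory and Smoker are made-up values; only the CPT comes from the slide.

    # Using the LungCancer CPT from the slide; the two priors are illustrative assumptions.
    p_fh = {True: 0.15, False: 0.85}        # assumed P(FamilyHistory)
    p_s  = {True: 0.30, False: 0.70}        # assumed P(Smoker)
    p_lc_given = {                          # P(LungCancer = true | FH, S), from the CPT
        (True, True): 0.8, (True, False): 0.5,
        (False, True): 0.7, (False, False): 0.1,
    }

    def joint(fh, s, lc):
        """P(FH, S, LC) = P(FH) * P(S) * P(LC | FH, S) -- the network's factorization."""
        p_lc = p_lc_given[(fh, s)]
        return p_fh[fh] * p_s[s] * (p_lc if lc else 1.0 - p_lc)

    # Joint probability of a smoker with no family history who has lung cancer:
    print(joint(fh=False, s=True, lc=True))   # 0.85 * 0.30 * 0.7 = 0.1785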
V. What Is Prediction?
• (Numerical) prediction is similar to classification
  - construct a model
  - use the model to predict a continuous or ordered value for a given input
• Prediction is different from classification
  - classification predicts a categorical class label
  - prediction models continuous-valued functions
• Major method for prediction: regression
  - model the relationship between one or more independent (predictor) variables and a dependent (response) variable
• Regression analysis
  - linear and multiple regression
  - non-linear regression
  - other regression methods: generalized linear model, Poisson regression, log-linear models, regression trees

Linear Regression
• Linear regression involves a response variable y and a single predictor variable x:
      y = w0 + w1 * x
  where w0 (the y-intercept) and w1 (the slope) are the regression coefficients
• Method of least squares: estimates the best-fitting straight line (see the least-squares sketch below)
• Multiple linear regression involves more than one predictor variable
  - Training data is of the form (X1, y1), (X2, y2), ..., (X|D|, y|D|)
  - E.g., for 2-D data we may have: y = w0 + w1 * x1 + w2 * x2
  - Solvable by an extension of the least-squares method, or with packages such as SAS or S-Plus
  - Many nonlinear functions can be transformed into the above form

Nonlinear Regression
• Some nonlinear models can be modeled by a polynomial function
• A polynomial regression model can be transformed into a linear regression model; for example,
      y = w0 + w1 * x + w2 * x^2 + w3 * x^3
  is convertible to linear form with the new variables x2 = x^2 and x3 = x^3:
      y = w0 + w1 * x + w2 * x2 + w3 * x3
• Other functions, such as the power function, can also be transformed into a linear model
• Some models are intractably nonlinear (e.g., a sum of exponential terms); it may still be possible to obtain least-squares estimates through extensive calculation on more complex formulae

Other Regression-Based Models
• Generalized linear model: the foundation on which linear regression can be applied to modeling categorical response variables
  - the variance of y is a function of the mean value of y, not a constant
  - logistic regression: models the probability of some event occurring as a linear function of a set of predictor variables
  - Poisson regression: models data that exhibit a Poisson distribution
• Log-linear models (for categorical data)
  - approximate discrete multidimensional probability distributions
  - also useful for data compression and smoothing
• Regression trees and model trees
  - trees that predict continuous values rather than class labels

Regression Trees and Model Trees
• Regression tree: proposed in the CART system (Breiman et al., 1984)
  - CART: Classification And Regression Trees
  - each leaf stores a continuous-valued prediction: the average value of the predicted attribute for the training tuples that reach the leaf
• Model tree: proposed by Quinlan (1992)
  - each leaf holds a regression model, a multivariate linear equation for the predicted attribute
  - a more general case than the regression tree
• Regression and model trees tend to be more accurate than linear regression when the data are not well represented by a simple linear model (see the comparison sketch below)

Predictive Modeling
• Predictive modeling: predict data values or construct generalized linear models based on the database data
• One can only predict value ranges or category distributions
• Method outline: minimal generalization, attribute relevance analysis, generalized linear model construction, prediction
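For the method of least squares mentioned under Linear Regression, here is a minimal pure-Python sketch of the closed-form estimates for y = w0 + w1 * x; the small (x, y) sample is made up for illustration.

    # Least-squares estimates for simple linear regression y = w0 + w1 * x:
    #   w1 = sum((xi - x_mean) * (yi - y_mean)) / sum((xi - x_mean)^2)
    #   w0 = y_mean - w1 * x_mean
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]          # illustrative data only
    ys = [1.9, 4.1, 6.2, 7.8, 10.1]

    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    w1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
         sum((x - x_mean) ** 2 for x in xs)
    w0 = y_mean - w1 * x_mean

    print(f"y = {w0:.3f} + {w1:.3f} * x")    # roughly y = 0 + 2x for this sample
    print("prediction at x = 6:", w0 + w1 * 6)
    # Polynomial regression y = w0 + w1*x + w2*x^2 can reuse the same idea by
    # treating x2 = x**2 as an additional predictor (multiple linear regression).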

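Finally, to illustrate the claim that regression trees tend to beat a single linear model when the data are not well described by a straight line, here is a small comparison sketch. It assumes NumPy and scikit-learn are available, and the noisy quadratic data set is made up for illustration.

    # Compare a regression tree with ordinary linear regression on nonlinear data.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    x = np.linspace(-3, 3, 200).reshape(-1, 1)
    y = x.ravel() ** 2 + rng.normal(scale=0.3, size=200)   # noisy quadratic target

    linear = LinearRegression().fit(x, y)
    tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(x, y)  # leaves store mean values

    print("linear regression MSE:", mean_squared_error(y, linear.predict(x)))
    print("regression tree MSE:  ", mean_squared_error(y, tree.predict(x)))
    # The tree's piecewise-constant leaves can follow the curve; a straight line cannot.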