SPSS Cluster Analysis Lab Report
Student ID:
Major and class:
Course: Statistical Analysis with SPSS
Instructor:
Laboratory:
Experiment title: Cluster Analysis with SPSS

I. Purpose:
Understand the basic ideas and concrete procedures of hierarchical cluster analysis and K-Means cluster analysis, and be able to interpret the results.

II. Problem:
1. Cluster the students of one class by their Chinese-language proficiency into three clusters, using the scores of two Chinese exams as the clustering variables. The data are shown in the table below. Apply both hierarchical clustering and the K-means method.

Name    First Chinese exam score    Second Chinese exam score
张三    99     98
王五    88     89
赵四    79     80
小杨    89     78
蓝天    75     78
小白    60     65
李之    79     87
马武    75     76
郭炎    60     56
刘小    100    100

III. Procedure (screenshots preferred):
1. Open SPSS 11.5 for Windows.exe from the frequently used software and, in Variable View, enter the variable definitions required by the problem, as shown below.

[Screenshot: Untitled - SPSS Data Editor, Variable View]

    Name        Type      Width  Decimals  Label  Values  Missing  Columns  Align  Measure
1   人名        String    8      0                None    None     8        Left   Nominal
2   第一次语    Numeric   8      2                None    None     8        Right  Scale
3   第二次语    Numeric   8      2                None    None     8        Right  Scale

2. In Data View, enter the data. The result is shown below.

[Screenshot: Untitled - SPSS Data Editor, Data View]

      人名    第一次语    第二次语
1     张三    99.00       98.00
2     王五    88.00       89.00
3     赵四    79.00       80.00
4     小杨    89.00       78.00
5     蓝天    75.00       78.00
6     小白    60.00       65.00
7     李之    79.00       87.00
8     马武    75.00       76.00
9     郭炎    60.00       56.00
10    刘小    100.00      100.00

3. First, cluster the data with the hierarchical method.
4. Choose the menu [Analyze] → Classify → Hierarchical Cluster, move the two Chinese exam score variables into the [Variable(s)] box, and move the string variable "人名" into the [Label Cases by] box as the label variable.
5. Click "Plots" and make the following selections (Hierarchical Cluster Analysis: Plots):
   Dendrogram: selected
   Icicle: All clusters
   Orientation: Vertical
6. Click "Statistics" and make the following selections (Hierarchical Cluster Analysis: Statistics):
   Agglomeration schedule: selected
   Proximity matrix: not selected
   Cluster Membership: None
7. Click "Method" and make the following selections:
   Cluster Method: Between-groups linkage
   Measure: Interval, Squared Euclidean distance
   Transform Values: Standardize: None
   Transform Measures: none selected
8. Save the data sheet as "语文水平.sav", and save the output as well.

IV. Results and analysis (screenshots preferred):
Problem 1:
1. Hierarchical clustering of the data

Cluster

Case Processing Summary (a, b)
            Cases
Valid             Missing           Total
N     Percent     N     Percent     N     Percent
10    100.0       0     .0          10    100.0
a. Squared Euclidean Distance used
b. Average Linkage (Between Groups)

Average Linkage (Between Groups)

Agglomeration Schedule
         Cluster Combined                        Stage Cluster First Appears
Stage    Cluster 1    Cluster 2    Coefficients    Cluster 1    Cluster 2    Next Stage
1        5            8            4.000           0            0            3
2        1            10           5.000           0            0            8
3        3            5            26.000          0            1            7
4        6            9            81.000          0            0            9
5        2            7            85.000          0            0            6
6        2            4            151.500         5            0            7
7        2            3            174.778         6            3            8
8        1            2            717.833         2            7            9
9        1            6            1474.250        8            4            0

[Figure: Vertical Icicle plot. Columns are cases, rows are the number of clusters from 1 to 9.]

[Figure: Dendrogram using Average Linkage (Between Groups). Axis: Rescaled Distance Cluster Combine. Cases from top to bottom: 蓝天 (5), 马武 (8), 赵四 (3), 王五 (2), 李之 (7), 小杨 (4), 张三 (1), 刘小 (10), 小白 (6), 郭炎 (9).]

2. Output of the K-means cluster analysis

Quick Cluster

Initial Cluster Centers
              Cluster
              1         2        3
第一次语      100.00    60.00    79.00
第二次语      100.00    56.00    87.00

Iteration History (a)
             Change in Cluster Centers
Iteration    1        2        3
1            1.118    4.500    5.956
2            .000     .000     .000
a. Convergence achieved due to no or small change in cluster centers. The maximum absolute coordinate change for any center is .000. The current iteration is 2. The minimum distance between initial centers is 24.698.

Final Cluster Centers
              Cluster
              1        2        3
第一次语      99.50    60.00    80.83
第二次语      99.00    60.50    81.33

ANOVA
              Cluster                Error
              Mean Square    df      Mean Square    df    F         Sig.
第一次语      781.533        2       27.619         7     28.297    .000
第二次语      744.133        2       26.548         7     28.030    .000
The F tests should be used only for descriptive purposes because the clusters have been chosen to maximize the differences among cases in different clusters. The observed significance levels are not corrected for this and thus cannot be interpreted as tests of the hypothesis that the cluster means are equal.

Number of Cases in each Cluster
Cluster    1    2.000
           2    2.000
           3    6.000
Valid           10.000
Missing         .000
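
The SPSS output above can be cross-checked by hand. The following is a minimal sketch in plain Python, not SPSS syntax. It assumes that between-groups linkage is the unweighted mean of all pairwise squared Euclidean distances between two clusters, and that Quick Cluster repeats plain assign-then-update steps starting from the initial centers it reports. The function names `average_linkage` and `kmeans` are illustrative and do not correspond to any SPSS API.

```python
from itertools import combinations

# Exam scores from the problem statement, in case order 1..10.
scores = [(99, 98), (88, 89), (79, 80), (89, 78), (75, 78),
          (60, 65), (79, 87), (75, 76), (60, 56), (100, 100)]


def sq_dist(p, q):
    """Squared Euclidean distance between two score pairs."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def average_linkage(points):
    """Merge clusters by average (between-groups) linkage.

    Returns one (cluster 1, cluster 2, coefficient) tuple per stage, with
    each cluster named by its lowest 1-based case number.
    """
    clusters = {i + 1: [i] for i in range(len(points))}
    schedule = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            d = sum(sq_dist(points[i], points[j]) for i, j in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)  # keep the lower case number
        schedule.append((a, b, round(d, 3)))
    return schedule


def kmeans(points, centers, max_iter=10):
    """Plain k-means from given centers (assumes no cluster becomes empty)."""
    labels = []
    for _ in range(max_iter):
        # Assign every case to its nearest current center (0-based index).
        labels = [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
                  for p in points]
        # Recompute each center as the mean of its assigned cases.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


schedule = average_linkage(scores)
labels, centers = kmeans(scores, [(100, 100), (60, 56), (79, 87)])

for stage, (a, b, coef) in enumerate(schedule, start=1):
    print(stage, a, b, coef)
print(labels)
print(centers)
```

The printed stages can be compared line by line with the Agglomeration Schedule, and the centers with the Final Cluster Centers table. The labels are 0-based, so label 0 corresponds to SPSS cluster 1.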