BasicStatistics.ppt
Basic Statistics
Standard on-the-job training material for Mechanical staff

Basic Statistics
Basic descriptive statistics
Types of data
Data summary
  Numerical
    Central Tendency (Location)
    Variation (Dispersion)
    Shape
  Graphical Presentation
    Dot plot
    Boxplot
    Histogram (and distribution plot)
Normal Distribution
Some Other Graphical Plots
  Time Series chart
  Scatter Plots
  Pareto

Statistics: An Overview
Statistics
  Descriptive Statistics
    Graphical Presentations
      Charts
      Tables
    Numerical Measures
      Location
      Dispersion
      Shape
  Inferential Statistics
    Parameter Estimation
      Point Estimate
      Interval Estimate
    Hypothesis Testing
      Parametric Methods
      Nonparametric Methods

Types of Outputs (Data)
Attribute/Discrete Data (Qualitative or Categorical)
  Categories
  Yes, No
  Go, No go
  Machine 1, Machine 2, Machine 3
  Pass/Fail
  Maintenance equipment failures, fiber breakouts, number of clogs
Variable Data (Quantitative or Numerical)
  Decimal subdivisions are meaningful
  Dimension, chemical yield, cycle time

Discrete vs. Continuous Data
CONTINUOUS: the scale can be meaningfully divided into finer increments of precision
DISCRETE: the scale cannot be meaningfully divided into finer increments of precision
[Figure labels: Electrical Circuit, PASS, FAIL, Thermometer, TEMPERATURE, Caliper, ERROR]

Categories of Scales
Nominal: unrelated categories which represent membership or non-membership.
  Description: discrete data (usually); grouping / sorting; yes / no, pass / fail; arithmetic not possible
  Examples: categories, labels
Ordinal: ordered categories with no information about distance between categories.
  Description: discrete data; ranking; seldom used; very little arithmetic possible
  Examples: 1st, 2nd, 3rd; relative height; alphabetic order; 1<2<3<4
Interval: ordered categories with equal distance between categories, but no absolute zero point.
  Description: continuous data; most common scale; use arithmetic with caution
  Examples: temperature scales, dial indicator
Ratio: ordered categories with equal distance between categories and an absolute zero point.
  Description: continuous data; proportional relationship; most forms of arithmetic apply
  Examples: velocity = distance/time, ruler
Note: Nominal data with only 2 levels are called binary data.

The Advantage of Continuous Data
Often an output is measured with discrete data.
This output is usually a function of variable inputs earlier in the process that need to be identified and controlled.
To obtain the same level of understanding of a process, discrete data requires more data.

Warm-up Exercise
Describe the data.
What are the numbers that can adequately represent the data?

Data Description
Central Tendency (Location): Mean, Median, Mode
Variation (Dispersion): Range, Inter-Quartile Range, Variance, Standard Deviation
Shape: Skewness, Kurtosis

Measures of Central Tendency
Mean: arithmetic average of a set of values
  Reflects the influence of all values
  Strongly influenced by extreme values
Median: reflects the 50% rank, the center number after a set of numbers has been sorted
  Does not necessarily include all values in the calculation
  Is "robust" to extreme scores
Mode: most frequently occurring value in a data set
Why would we mainly use the mean, instead of the median, in process improvement efforts?

Central Tendency Exercise
Calculate the Mean, Median and Mode of the following data: 1, 3, 3, 5, 9
Mean =
Median =
Mode =
Minitab: Stat > Basic Statistics > Display Descriptive Statistics

Measures of Variation
Range: numerical distance between the highest and the lowest values in a data set.
Inter-Quartile Range: the distance between the first and third quartiles (Q3 - Q1).
  Q1: the first or lower quartile is a value that has approximately 25% of the observations below it.
  Q3: the third or upper quartile is a value that has approximately 75% of the observations below it.
Variance (s^2): the average squared deviation of each individual data point from the mean.
Standard Deviation (s): the square root of the variance; the most commonly used measurement to quantify variability.

Variation Exercise
Calculate the Range, Inter-Quartile Range, Variance and Standard Deviation of the following data
Range =
Inter-Quartile Range =
Variance =
Standard Deviation =

Calculating Standard Deviation
Sum of squares

Variation Exercise
Minitab: Stat > Basic Statistics > Display Descriptive Statistics

A Critical Statistical Rule!
Variances add, standard deviations do not!
Instead, add the variances of the independent inputs to calculate the total variance in the output.

Measures of Shape: Skewness

Measures of Shape: Kurtosis
[Figure panels: Kurtosis = -ve, Kurtosis = 0, Kurtosis = +ve]

Measures of Shape
Minitab: Stat > Basic Statistics > Graphical Summary

Exercise 1
Data is collected on the level of orders for a particular product. The data is collected for 12 weeks, and the supplier is open Monday to Saturday each week. The number of customers ordering from the store is also recorded.
  Summarize the data (overall)
  Summarize the data by day
  Draw conclusions
Worksheet: sales.mtw

Exercise 1
Comparison between days?
Conclusion?

Graphical Presentation for Data Summary
Dot Plot
Box Plot
Histogram

Dot Plots
Select Graph > Dot Plot
Worksheet: sales.mtw
What information is available in the plots?
  Central Tendency
  Variation (Dispersion)
  Shape

Box Plot Analysis
Plot symbols shown: +, *
Outlier
Distribution maximum = min(highest data point, Q3 + 1.5 (Q3 - Q1))
75th Percentile (Third Quartile, or Q3)
Median (50th Percentile)
Mean
25th Percentile (First Quartile, or Q1)
Distribution minimum = max(lowest data point, Q1 - 1.5 (Q3 - Q1))

Box Plots
Graph > Boxplots
What information is available in the plots?
  Central Tendency
  Variation (Dispersion)
  Shape

Histogram
Graph > Histogram
What information is available in the plots?
  Central Tendency
  Variation (Dispersion)
  Shape

Dot Plot, Boxplot, Histogram
Minitab: Stat > Basic Statistics > Display Descriptive Statistics
Is there a difference from day to day?
Is your answer different when you do the comparison numerically?
Of these 3 graphs, which is better?

The Normal Distribution
The "Normal" distribution is a distribution of data which has certain consistent properties.
These properties are very useful in our understanding of the characteristics of the underlying process from which the data were obtained.
Many natural phenomena and man-made processes are distributed normally, or can be represented as normally distributed.
Property 1: A normal distribution can be described completely by knowing only the mean and the standard deviation.

The Normal Distribution
[Figure panels: Distribution One, Distribution Two, Distribution Three]
What is the difference among these three normal distributions?
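
Property 1 can be illustrated with a minimal Python sketch built on the standard library's statistics.NormalDist. The three (mean, standard deviation) pairs are hypothetical values chosen only for illustration; they are not the parameters of Distribution One, Two and Three on the slide, and the helper name describe is likewise an assumption.

```python
from statistics import NormalDist

# Hypothetical (mean, standard deviation) pairs, NOT taken from the slide.
PARAMS = {
    "one": (100.0, 5.0),
    "two": (100.0, 10.0),
    "three": (120.0, 5.0),
}


def describe(mean, sd):
    """Return (peak height, probability within one sd of the mean)."""
    dist = NormalDist(mu=mean, sigma=sd)
    peak = dist.pdf(mean)  # curve height at its center
    within_one_sd = dist.cdf(mean + sd) - dist.cdf(mean - sd)
    return peak, within_one_sd


for name, (mean, sd) in PARAMS.items():
    peak, within = describe(mean, sd)
    print(f"{name}: mean={mean} sd={sd} peak={peak:.4f} within 1 sd={within:.4f}")
```

Changing the mean only shifts the curve, and changing the standard deviation only changes its spread and peak height; the share of values within one standard deviation of the mean stays at about 68% in every case.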
Note: A normal distribution has both Skewness and Kurtosis = 0.

The Normal Curve and its Probabilities
Property 2: The area under sections of the curve can be used to estimate the cumulative probability of a certain "event" occurring.
[Figure: cumulative probability of obtaining a value between two values]

Normal Probability Plots
We can test whether a given data set can be described as "normal" with a test called a Normal Probability Plot.
If a distribution is close to normal, the normal probability plot will be a straight line.
Minitab makes the normal probability plot easy.
  Open Distributions.Mtw
  Choose Graph > Probability Plot, or Stat > Basic Statistics > Normality Test
  Produce a normal plot of each of the first 3 columns. Which appear to be normal?
  Now, graph a histogram of each. What does this reveal?

Mystery Distribution
Generate a Normal Probability Plot for the "dist4" variable in C4.
What is your conclusion? Is this a normal distribution?
Stat > Basic Statistics > Graphical Summary, Variables = Normal

Normal Distribution Summary
General guidelines: we can assume that the data are normally distributed if ALL of the following criteria are fulfilled:
  P-value > 0.05
  |Skewness| < 1
  |Kurtosis| < 1

Some Other Graphical Plots
Time Series Plot (Run Chart)
  Determine if the process is stable (over time)
  If the process is not stable, identify and remove causes (Xs) of instability (obvious non-random variation)
Scatter Plot
  Determine the relationship between variables
Pareto chart
  Identify the critical few vs. the trivial many

Time Series Plots
Graph > Time Series Plot
This produces a simple run chart of the data.
You can also create the index information in various formats.
Double-click on the x-axis, then Edit Scale > Time > Calendar and select Day Month.

Exercise
[Figure: X-bar Chart for Profile B. Axes: Sample Number, Sample Mean. X = 101.0, with lines at 108.5 and 93.42]
[Figure: X-bar Chart for Profile C. Axes: Sample Number, Sample Mean. X = 115.0, with lines at 119.7 and 110.4]
Assume machines A, B, and C make identical products (with range charts in control).
Assume that the target value for each product output variable is 100 mm.
Answer the following questions:
  Which machine(s) exhibit(s) variation?
  Where is each machine centered?
  Which machines are predictable over time?
  Which machines have special cause variation?
  Which machine would you want making your product?
  Which machine would probably be easiest to fix? Why?

The scatter plot looks at the relationship between two continuous variables.
Graph > Scatterplot
Enter sales for Y and customers for X.
Stat > Regression > Fitted Line Plot
This puts the best-fit line through the data.
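
For a straight-line fit, the best-fit line can be computed by ordinary least squares. The following is a minimal Python sketch of that calculation, assuming ordinary least squares is the fitting method; the customer and sales numbers are made-up illustration values (not the contents of sales.mtw), and the function name fit_line is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-products of deviations.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Illustration data only (assumed values, not from the worksheet).
customers = [10, 12, 15, 18, 20]  # X
sales = [24, 30, 34, 42, 45]      # Y

intercept, slope = fit_line(customers, sales)
print(f"sales = {intercept:.3f} + {slope:.3f} * customers")
```

The slope is the estimated change in sales per additional customer, and the line always passes through the point (mean of X, mean of Y).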
More about this in the correlation and regression module in week 2.

Scatter Plots

Pareto Charts
Data is collected on the frequency of different types of defects each week.
This data is included in the worksheet pareto64.mtw.

Defects       Freqs  Week
Air Bubble    93     1
Air Bubble    81     2
Air Bubble    62     3
Air Bubble    57     4
Weight Dev.   120    1
Weight Dev.   132    2
Weight Dev.   91     3
Weight Dev.   88     4
Deformation   18     1

Stat > Quality Tools > Pareto Chart
Choose "Chart defects table".

Summary
Understand basic descriptive statistics
Understand the different types of data
Data summary
  Numerical
    Central Tendency (Location)
    Variation (Dispersion)
    Shape
  Graphical Presentation
    Dot plot
    Boxplot
    Histogram (and distribution plot)
Normal Distribution
Some Other Graphical Plots
  Time Series chart
  Scatter Plots
  Pareto
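
As a recap of the numerical summaries, here is a minimal Python sketch applied to the data from the central tendency exercise (1, 3, 3, 5, 9). It assumes the sample (n - 1) form of the variance and the "exclusive" quartile method of Python's statistics.quantiles, which places Q1 and Q3 at positions (n + 1)/4 and 3(n + 1)/4; other software may use different conventions and report slightly different quartiles. The box plot limits follow the formulas on the Box Plot Analysis slide, and the function name summarize is hypothetical.

```python
import statistics


def summarize(values):
    """Numerical summary: central tendency, variation, and box plot limits."""
    ordered = sorted(values)
    # Quartiles by the (n + 1) position method (an assumed convention).
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    iqr = q3 - q1
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "mode": statistics.mode(ordered),
        "range": ordered[-1] - ordered[0],
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        # Sample variance and standard deviation (divide by n - 1).
        "variance": statistics.variance(ordered),
        "stdev": statistics.stdev(ordered),
        # Box plot limits as defined on the Box Plot Analysis slide.
        "box_min": max(ordered[0], q1 - 1.5 * iqr),
        "box_max": min(ordered[-1], q3 + 1.5 * iqr),
    }


data = [1, 3, 3, 5, 9]
for name, value in summarize(data).items():
    print(f"{name}: {value}")
```

For this data set the mean (4.2) sits above the median (3) because the single large value 9 pulls the mean upward, which is the sensitivity to extreme values described on the Measures of Central Tendency slide.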