Statistics Fundamentals and Quality Statistics (统计基础及质量统计.pptx)
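[Slide 1] 6σ Unit (4)-A: Statistics Fundamentals and Quality Statistics
Contents: data; basic statistics; the manufacturing environment; quality statistical charts; process capability analysis; SPC (statistical process control).

[Slide 2] Data and Numbers

[Slide 3] What do you want to know?
Information sources. Data types:
- Grouped (discrete): nominal, ordinal.
- Measured (continuous): interval, ratio.
Forms of data: textual (A to Z), graphical, verbal, numeric (0 to 9).
"Data by themselves do not provide information. Information is obtained only after the data have been processed, and the tool for processing data is statistics."

[Slide 4] Discrete data and continuous data
(Figure: examples of discrete and continuous data, including PASS/FAIL, GO/NO-GO, a timer, an electrical circuit, temperature read from a thermometer, a caliper, errors, and a packing list.)
Packing list:
Quantity   Unit price   Description   Total price
1          $10.00                     $10.00
3          $1.50                      $4.50
10         $10.00                     $10.00
2          $5.00                      $10.00

[Slide 5] The advantage of continuous data
Continuous data carry more information; discrete data carry less.

[Slide 6] Data scales
- Nominal (discrete data, usually): groups or categories such as yes/no or pass/fail; labels. Cannot be used in calculations.
- Ordinal (discrete data): rankings such as first, second, third, or alphabetical order. Rarely used and hard to calculate with.
- Interval (continuous data): the most common scale, e.g., relative height. Take care when calculating.
- Ratio (continuous data): proportional relationships; most formulas and algorithms can be applied.

Descriptive statistics: graphical summary
Menu path: Basic Statistics > Display Descriptive Statistics > Graphs > Graphical Summary.
Graphical summary for the variable "Mystery" (神秘):
Anderson-Darling normality test: P-value 0.00
Mean 100.00
Standard deviation 32.38
Variance 1048.78
Skewness 0.01
Kurtosis -1.63
N 500
Minimum 41.77
1st quartile 68.69
Median 104.20
3rd quartile 130.81
Maximum 162.82
95% confidence interval for μ: 97.15 to 102.85
95% confidence interval for σ: 30.49 to 34.53
95% confidence interval for the median: 82.78 to 117.66
(Figure legend: 95% confidence intervals for μ and for the median.)

[Slide 34] Key points in data collection
How the data are collected affects the statistical appropriateness and analysis of a data set. Conclusions from properly collected data can be applied more generally to the process and its output; inappropriately collected data CANNOT be used to draw valid conclusions about a process. Aspects of proper data collection that must be accounted for include:
- The manufacturing environment from which the data are collected. When products are manufactured in batches or lots, the data must be collected from several batches or lots.
- Randomization. When data collection is not randomized, statistical analysis may lead to faulty conclusions.

[Slide 35] Manufacturing environment
Continuous manufacturing occurs when an operation is performed on one unit of product at a time. An assembly line is typical of a continuous manufacturing environment: each unit of product is worked on individually, and a continuous stream of finished products rolls off the line. The automotive industry is one example of continuous manufacturing.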
Other examples of continuously manufactured products are television sets, fast-food hamburgers, and computers.
Lot/batch manufacturing occurs when operations are performed on products in batches, groups, or lots. The final product comes off the line in lots instead of as a stream of individual parts. Products within the same lot are processed together and receive the same treatment while in process. Lot/batch manufacturing is typical of the semiconductor industry and many of its suppliers. Other examples of lot/batch-manufactured products include chemicals, semiconductor packages, and cookies.

[Slide 36] In continuous manufacturing, the most important variation is between parts. In lot/batch manufacturing, variation can occur both between the parts within a lot and between lots: products within the same lot are manufactured together, while products from different lots are manufactured separately. Because of this, each lot has a different distribution. This matters because continuous manufacturing is a basic assumption of many standard statistical methods found in most textbooks and QC handbooks. Those methods are not appropriate for lot/batch manufacturing; different statistical methods are needed to account for its several sources of variation.
Note: the statistical methods used for continuous and for batch production differ somewhat.

[Slide 37] In lot/batch manufacturing, each lot has a different mean. Because of random processing fluctuations, lots vary even when the process is stable. The result is several "levels" of distributions, each with its own mean and variance:
- the distribution of units of product within the same lot;
- the distribution of the means of different lots;
- the total distribution of all units of product across all lots.
(Figure: distributions of individual lots 1 to 5 along X, the distribution of lot means, and the overall distribution of the combined lots, labeled as variation within each lot, variation between lots, and total variation.)

[Slide 38] The different variances of a lot/batch manufacturing process form a hierarchy called nesting, and data collected from such processes usually have a nested data structure.
(Figure: nested structure with production lines 1 and 2, shifts 1 and 2 within each line, and lots 1 to 5 within each shift.)
Each level in the nested structure corresponds to a single variance. With a nested data set, each source of variation must be taken into account when collecting data so that the total process variation is represented in the data set:
$\sigma^2_{\text{Total}} = \sigma^2_{\text{Line}} + \sigma^2_{\text{Shift}} + \sigma^2_{\text{Lot}}$

[Slide 39] The 6σ principle: variances add, standard deviations do not
If input variable X1 contributes variance $\sigma^2_{X_1}$ and input variable X2 contributes variance $\sigma^2_{X_2}$, and the two act independently, then the variance of the process output is
$\sigma^2_{\text{total}} = \sigma^2_{X_1} + \sigma^2_{X_2}$, so $\sigma_{\text{total}} = \sqrt{\sigma^2_{X_1} + \sigma^2_{X_2}}$.
In other words, add the variances contributed by the input variables to compute the total variance in the output. For example, $\sigma_{X_1} = 3$ and $\sigma_{X_2} = 4$ give $\sigma_{\text{total}} = 5$, not 7.

[Slide 40] (Figure: lots 1 to 6, with σ_Within small and σ_Lot large.)
If a process has small within-lot variation and large lot-to-lot variation (which is very common), data values from the same lot will be highly correlated, while data from different lots will be independent.

[Slide 41] Quality statistical charts
Histograms, boxplots, Pareto diagrams, scatterplots, and trend charts.

[Slide 42] Histograms
Histograms provide a visual description of the distribution of a set of data. A histogram should be used together with summary statistics such as x̄ and s. A histogram can be used to:
- display the distribution of the data;
- give a graphical indication of the center, spread, and shape of the distribution;
- clarify numerical summary statistics, which sometimes obscure information;
- look for outliers, i.e., data points that do not fit the distribution of the rest of the data.
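A minimal sketch of the binning step behind a histogram, assuming equal-width, half-open bins of 5 units starting at 135 (this bin layout is an illustrative choice, not one stated in the slides). It uses the 100 measurements listed on slide 45, which appear to be the plating-thickness data reused on slide 50, and prints x̄ and s alongside the bars, since a histogram should be read together with those summary statistics. The names `PLATING_THICKNESS` and `histogram_counts` are hypothetical.

```python
import math
import statistics

# The 100 measurements from the slide 45 table, in the order they appear.
# (Assumed to be the plating-thickness data reused on slide 50.)
PLATING_THICKNESS = [
    150.7, 149.7, 154.5, 149.6, 155.3, 149.0, 160.5, 149.0, 155.3, 149.3,
    149.2, 153.5, 145.5, 161.0, 151.5, 154.3, 150.9, 152.4, 150.5, 152.3,
    144.5, 151.6, 151.1, 151.0, 147.5, 150.6, 147.4, 150.8, 148.3, 146.8,
    148.7, 147.6, 153.0, 139.0, 153.4, 146.5, 151.4, 143.5, 149.4, 150.4,
    153.1, 150.7, 149.1, 150.6, 149.6, 152.5, 145.2, 150.5, 146.4, 151.3,
    151.7, 145.6, 147.1, 152.6, 147.0, 148.5, 155.0, 148.4, 151.3, 148.8,
    146.7, 152.7, 155.3, 146.6, 144.8, 150.9, 149.5, 151.4, 147.3, 154.9,
    151.2, 148.6, 142.5, 151.6, 151.0, 152.9, 146.9, 145.3, 150.8, 150.3,
    153.6, 154.6, 150.6, 148.6, 155.1, 145.4, 148.5, 157.0, 148.9, 145.0,
    147.7, 151.1, 149.7, 154.4, 149.1, 151.5, 153.3, 149.5, 152.8, 150.8,
]


def histogram_counts(values, start, width, n_bins):
    """Count values in equal-width, half-open bins [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for x in values:
        k = math.floor((x - start) / width)  # index of the bin that contains x
        if not 0 <= k < n_bins:
            raise ValueError(f"{x} lies outside the binned range")
        counts[k] += 1
    return counts


if __name__ == "__main__":
    # Assumed bin layout: 6 bins of width 5 covering 135 to 165.
    start, width, n_bins = 135.0, 5.0, 6
    counts = histogram_counts(PLATING_THICKNESS, start, width, n_bins)
    for k, c in enumerate(counts):
        lo = start + k * width
        # The bar length (number of '#') equals the bin count.
        print(f"[{lo:.0f}, {lo + width:.0f})  {c:3d}  {'#' * c}")
    # Summary statistics to read alongside the histogram.
    print(f"x-bar = {statistics.mean(PLATING_THICKNESS):.2f}, "
          f"s = {statistics.stdev(PLATING_THICKNESS):.2f}")
```

Changing `width` changes how much detail the histogram shows: very wide bins hide the shape of the distribution, while very narrow bins make it ragged.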
[Slide 43] Dot-plot distribution
(Figure: dot plot of pump flow; x-axis: gallons/minute, 49.00 to 51.00.)
Imagine a metering pump with a nominal flow rate of 50 gallons per minute. The pump's actual flow is measured independently 100 times at regular intervals. Each measurement is plotted as a dot representing one output "event" at a given value. As the dots accumulate, the pump's actual performance can be seen as the "distribution" of the pump flow.

[Slide 44] Histogram distribution
(Figure: histogram of the same pump-flow data; x-axis: gallons/minute, 48.8 to 51.3; y-axis: frequency, 0 to 40.)
Take the same data and group it into "intervals" (bins). The number of flow readings that fall into each interval determines the height of that interval's bar.

[Slide 45] Histograms
Data (100 measurements):
150.7 149.7 154.5 149.6 155.3 149.0 160.5 149.0 155.3 149.3
149.2 153.5 145.5 161.0 151.5 154.3 150.9 152.4 150.5 152.3
144.5 151.6 151.1 151.0 147.5 150.6 147.4 150.8 148.3 146.8
148.7 147.6 153.0 139.0 153.4 146.5 151.4 143.5 149.4 150.4
153.1 150.7 149.1 150.6 149.6 152.5 145.2 150.5 146.4 151.3
151.7 145.6 147.1 152.6 147.0 148.5 155.0 148.4 151.3 148.8
146.7 152.7 155.3 146.6 144.8 150.9 149.5 151.4 147.3 154.9
151.2 148.6 142.5 151.6 151.0 152.9 146.9 145.3 150.8 150.3
153.6 154.6 150.6 148.6 155.1 145.4 148.5 157.0 148.9 145.0
147.7 151.1 149.7 154.4 149.1 151.5 153.3 149.5 152.8 150.8
(Figure: histogram of these data; x-axis 140 to 160.)

[Slide 46] Histograms
(Figure: example histograms illustrating the shapes below; axis ticks omitted.)
- Multi-modal shape (e.g., bimodal).
- Skewed shape: data can be right-skewed or left-skewed. The data shown are right-skewed: the right tail is longer than the left tail.
- Outliers.

[Slide 47] Exercise

[Slide 48] Boxplots
Boxplots are a valuable graphical tool for comparing the distributions of two or more groups (e.g., different lots, shifts, or operators). Each distribution on the chart consists of:
- a "box" representing the middle 50% of the data values; the length of the box is called the interquartile range (IQR);
- a line inside the box representing the median (50th percentile) of the data;
- two "tails" that extend to the minimum and maximum data values, assuming there are no outliers in the data.
If the distance between a data point and the nearer quartile is greater than 1.5 × IQR, the point is labeled an outlier, and the tail on that side of the boxplot is shortened to the outermost data value within 1.5 × IQR of the quartile.

[Slide 49] Boxplots
(Figure: two annotated boxplots. "No outliers": the tails reach the minimum and maximum data values, and the box spans the 25th to 75th percentiles (the IQR) with the median inside. "Outliers": the tails reach the outermost data values within 1.5 × IQR of the 25th and 75th percentiles, and points beyond that are marked as outliers.)

[Slide 50] Boxplots
Example: creating a boxplot. The figure is a boxplot of the 100 plating-thickness measurements, with the histogram of the same data set displayed for comparison.
(Figure: x-axis 140 to 160; annotations mark the 95% confidence interval for the mean and the "shortest half", the densest region of the data, which contains 50% of the data.)

[Slide 51] Boxplots
Plating thickness measurements collected from 7 lots of product:
Lot 1    Lot 2    Lot 3    Lot 4    Lot 5    Lot 6    Lot 7
149.18   144.78   146.77   167.85   144.51   134.96   152.41
151.31   147.18   150.66   164.17   144.41   134.7    146.76
150.8    145.66   145.11   168.23   146.68   135.02   148.19
149.06   147.09   145.09   162.88   145.4    134.63   143.75
151.73   145.86   145.98   163.1    143.3    134.87   153.71
148.15   144.64   146.77   166.91   146.87   135.34   145.13
152.55   143.67   149.9    165.78   148.61   134.6    148.54
(Figure: side-by-side boxplots of plating thickness by lot number; y-axis 130 to 170.)

[Slide 52] Boxplots
(Figure: how a multi-modal shape, a skewed shape, and outliers appear in boxplots; axis ticks omitted.)

[Slide 53] Exercise

[Slide 54] Pareto diagrams
While histograms display the distribution of continuous (measured) data, Pareto diagrams display the distribution of discrete (counted) data, such as different types of defects. Pareto diagrams can also be used with continuous data, particularly to display the results of a variance components analysis, as we will see later in this course. Pareto diagrams are a useful tool for determining which problems or types of problems are most severe or occur most frequently, and should therefore be given high priority in process-improvement efforts.
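A minimal sketch of the arithmetic behind a Pareto table: sort the categories from largest to smallest, keep the catch-all "Other" group last, and compute each category's percentage and the running cumulative percentage. The counts are the defect counts shown in the slide 55 chart; the English category names and the function name `pareto_table` are illustrative assumptions, not part of any charting tool.

```python
def pareto_table(values, other_label="Other"):
    """Return (label, value, percent, cumulative percent) rows for a Pareto chart.

    Categories are sorted from largest to smallest value; the catch-all
    `other_label` group, if present, is always placed last.
    """
    total = sum(values.values())
    ranked = sorted(
        ((label, v) for label, v in values.items() if label != other_label),
        key=lambda item: item[1],
        reverse=True,
    )
    if other_label in values:
        ranked.append((other_label, values[other_label]))

    rows = []
    running = 0.0
    for label, v in ranked:
        running += v
        # Cumulative percent comes from the running total, not from adding rounded percents.
        rows.append((label, v, 100.0 * v / total, 100.0 * running / total))
    return rows


# Defect counts from the slide 55 chart (category names are translations).
DEFECT_COUNTS = {
    "Missing screws": 274,
    "Missing clips": 59,
    "Leaking gasket": 43,
    "Defective housing": 19,
    "Incomplete parts": 10,
    "Other": 18,
}

if __name__ == "__main__":
    print(f"{'Defect':<18}{'Count':>7}{'Percent':>9}{'Cum %':>8}")
    for label, count, pct, cum in pareto_table(DEFECT_COUNTS):
        print(f"{label:<18}{count:>7}{pct:>9.1f}{cum:>8.1f}")
```

For a weighted chart like the one on slide 56, pass each category's cost (count × price per defect) instead of its count; the sorting and percentage arithmetic stay the same.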
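Pareto diagrams separate the vital few problems from the trivial many, helping to determine which problems to address first (and which to address later). Find the priorities within the priorities!

[Slide 55] Pareto diagrams: Pareto chart analysis
The Pareto chart ranks the defects by the size of their effect, based on the contents of the frequency column, and arranges them from largest to smallest. The last group is always labeled "Other"; by default it combines the defect categories that are individually very rare, which together account for less than 5% of all defects. The right Y axis shows the percentage of total defects, and the left Y axis shows the number of defects. The red line (visible on screen) shows the cumulative percentage, and the bars show the frequency of each defect category. All values are listed below the chart:
Defect               Count   Percent   Cumulative %
Missing screws         274      64.8           64.8
Missing clips           59      13.9           78.7
Leaking gasket          43      10.2           88.9
Defective housing       19       4.5           93.4
Incomplete parts        10       2.4           95.7
Other                   18       4.3          100.0
(Figure: Pareto chart of defects; left axis count 0 to 400, right axis percent 0 to 100.)

[Slide 56] Pareto diagrams: creating a weighted Pareto chart
Counts can be weighted by assigning a monetary amount per defect or by some other weighting method. The price of each defect type whose occurrences are listed in C1 is listed in C3 (value); price multiplied by count gives the cost of that defect type (C4). Charting cost instead of count better shows each defect's impact on the business.
Cost      Percent   Cumulative %
2320.71      35.7           35.7
1653.00      25.4           61.0
1230.00      18.9           79.9
 800.00      12.3           92.2
 349.87       5.4           97.6
 155.52       2.4          100.0
(Figure: weighted Pareto chart; left axis 0 to 6000, right axis percent 0 to 100. The extracted category labels are missing screws, leaking gasket, defective housing, incomplete parts, and other; one of the six labels did not survive extraction.)

[Slide 57] Stratified Pareto chart: interpreting grouped data
The upper chart uses a By variable, with all charts on one page; the lower chart uses the same command without a By variable. When one chart per page is selected, all charts use the same count (left-axis) scale, and the percentage on the right reflects only that chart's share of the total. These charts show that 70% of the recorded defects are scratches and peeling (lower chart), and that about half of the defects were recorded by the night shift (upper-right chart). The proportion of defects that are scratches and peeling also seems about the same for the day and night shifts. However, the evening and weekend shifts show a different pattern of defects.
(Figure, upper: Pareto charts of flaws for the day, evening, night, and weekend shifts; categories scratches, peeling, other, smudges; count axis 0 to 15.)
Lower chart, Pareto chart of flaws (all shifts combined):
Defect      Count   Percent   Cumulative %
Scratches      15      37.5           37.5
Peeling        13      32.5           70.0
Other           6      15.0           85.0
Smudges         6      15.0          100.0

[Slide 58] Exercise

[Slide 59] Scatterplots
Until now, all the graphical tools we've discussed have been for examining the distribution of a single process characteristic. The scatterplot is a graphical tool for examining the relationship between two process characteristics: it is an X-Y plot of one variable against another. Each unit of product usually has many characteristics, process input variables, and so on. One objective might be to see whether two variables or characteristics are related to each other (i.e., to see what happens to one variable when the other changes).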
This relationship between two variables is called correlation. Scatterplots can help answer this type of question.

[Slide 60] Scatterplots
Acid age   Etch rate   |   Acid age   Etch rate   |   Acid age   Etch rate
4.0        13          |   4.5        13          |   4.0        15
4.5        18          |   1.5        30          |   2.5        23
3.0        18          |   3.5        19          |   1.0        31
3.5        19          |   5.5         7          |   5.0         4
4.0        12          |   2.0        25          |   3.5        21
2.0        24          |   1.0        29          |   2.0        26
1.0        28          |   3.0        20          |   5.5         9
3.0        19          |   5.0         6          |   4.5        14
5.0         9          |   5.5         9          |   2.5        27
2.5        25          |   1.5        30          |   1.5        31
(Figure: scatterplot of etch rate (0 to 35) against acid age (0 to 6).)

[Slide 61] Scatterplots
In addition to telling us whether two variables are related, scatterplots can show how they are related and how strong the relationship is.
(Figure panels: strong positive correlation; no correlation; weak negative correlation; weak positive correlation; strong negative correlation.)

[Slide 62] Scatterplots
Scatterplots are also an excellent tool for determining the type of relationship between two variables and for spotting outliers.
(Figure panels: linear relationship; outliers; non-linear relationship.)

[Slide 63] Scatterplots: correlation and causation
We must always take care not to confuse correlation with causation. The fact that two characteristics are correlated does not prove that one causes the other; both may be related to some other factor that is the true root cause.
(Figure: number of televisions and number of traffic accidents, 1970 to 1990, both rising.)
But is there a cause-and-effect relationship between the two? Did the increase in TVs cause the number of accidents to go up? (Not likely.) Did the increase in traffic accidents cause people to buy more TVs? (Not likely, either.)

[Slide 64] Exercise

[Slide 65] Trend charts
Stability: a process is stable if its mean and standard deviation are constant and predictable over time. A disadvantage of histograms and normal probability plots is that they cannot show whether a process is stable over time; a plot of the data in time order can. These time-ordered plots, called trend charts and control charts, are essential for examining the stability of a distribution over time.
A trend chart or a control chart can detect instability if it exists. Control charts, a special kind of trend chart, are discussed in detail in a later course module. Trend charts reveal both stability and predictability.

[Slide 66] Trend charts
The table below contains average plating-thickness measurements taken from 21 lots of product, followed by a trend chart of the data.
Lot #   Plating thickness   |   Lot #   Plating thickness   |   Lot #   Plating thickness
1       151.9               |   8       143.8               |   15      149.2
2       147.4               |   9       152.7               |   16      147.5
3       155.8               |   10      147.4               |   17      151.9
4       151.7               |   11      152.7               |   18      141.9
5       149.2               |   12      143.8               |   19      152.7
6       153.8               |   13      137.1               |   20      147.4
7       159.9               |   14      142.5               |   21      157.3
(Figure: trend chart of plating thickness (135 to 160) against lot number (0 to 25).)

[Slide 67] Exercise

[Slide 68] Noisy data
The results of a statistical analysis can be seriously affected when the data fail to meet certain required assumptions. One of the most common assumptions is that the data values are independent and come from a normal distribution. This assumption can be violated in several ways, for example by outliers in the data (points that do not fit the rest of the distribution) or by non-normal distribution shapes (multi-modal or skewed distributions). Data with these characteristics can be thought of as noisy data. The procedures in this section provide techniques for detecting and analyzing noisy data effectively.

[Slide 69] Noisy data
(Figure: the same noisy data set shown as a boxplot, a trend chart, a histogram, a scatterplot, and a normal probability plot.)

[Slide 70] Noisy data
Recommended strategy for handling outliers:
1. Identify the outliers using the methods described in the following pages. If possible, find the causes of the outliers, and remove the outliers with identified causes from the data set.
2. If all the outliers can be explained, analyze the data as usual.
3. If any outliers cannot be explained, analyze the data twice, once including and once excluding those outliers, and see whether and how the results differ.

[Slide 71] Process capability analysis

[Slide 72] When a process begins to vary, the shape of its statistical distribution also begins to change. The change is usually a combination of the following three basic situations: 整