《应用多元统计分析》 (Applied Multivariate Statistical Analysis), lecture slides
New Words

1. large sample problem  大样本问题
2. least-squares estimation  最小二乘估计
3. level of significance  显著性水平
4. likelihood function  似然函数
5. likelihood ratio  似然比
6. likelihood ratio test  似然比检验
7. linear relation  线性关系
8. linear trend  线性趋势
9. loading  载荷
10. logarithm  对数
11. lower limit  下限
12. Mahalanobis distance  马氏距离
13. matrix  矩阵
14. maximum  最大值
15. mean  均值
16. mean difference  均值差值
17. mean square  均方
18. mean sum of squares  均方和
19. measure  度量
20. median  中位数
21. midpoint  中值
22. negative correlation  负相关
23. nominal variable  名义变量
24. nonlinear correlation  非线性相关
25. nonlinear regression  非线性回归
26. nonparametric statistics  非参数统计
27. nonparametric test  非参数检验
28. normal distribution  正态分布
29. univariate (adj.)  单变量的

3 Moving to Higher Dimensions

We have seen in the previous chapters how very simple graphical devices can help in understanding the structure and dependency of data. The graphical tools were based on either univariate (bivariate) data representations or on "slick" transformations of multivariate information perceivable by the human eye. Most of the tools are extremely useful in a modelling step, but unfortunately they do not give the full picture of the data set. One reason for this is that the graphical tools presented capture only certain dimensions of the data and do not necessarily concentrate on those dimensions or subparts of the data under analysis that carry the maximum structural information. In Part III of this book, powerful tools for reducing the dimension of a data set will be presented. In this chapter, as a starting point, simple and basic tools are used to describe dependency. They are constructed from elementary facts of probability theory and introductory statistics (for example, the covariance and correlation between two variables).

Summary: covariance, correlation, and the data matrix

- The covariance is a measure of dependence. It measures only linear dependence and is scale dependent.
- There are nonlinear dependencies that have zero covariance, so zero covariance does not imply independence; independence, however, implies zero covariance.
- Negative covariance corresponds to downward-sloping scatterplots; positive covariance corresponds to upward-sloping scatterplots.
- The covariance of a variable with itself is its variance: $\mathrm{Cov}(X,X) = \sigma_{XX} = \sigma_X^2$.
- For small $n$, we should replace the factor $1/n$ in the computation of the covariance by $1/(n-1)$.
- The correlation is a standardized measure of dependence; its absolute value never exceeds one.
- Correlation measures only linear dependence: there are nonlinear dependencies that have zero correlation, so zero correlation does not imply independence, while independence implies zero correlation.
- Negative correlation corresponds to downward-sloping scatterplots; positive correlation corresponds to upward-sloping scatterplots.
- Fisher's Z-transform helps us in testing hypotheses on correlation (a sketch follows below). For small samples, Fisher's Z-transform can be improved by the transformation $W^* = W - \dfrac{3W + \tanh(W)}{4(n-1)}$.
- The center of gravity of a data matrix is given by its mean vector $\bar{x} = n^{-1} X^\top 1_n$.
- The dispersion of the observations in a data matrix is given by the empirical covariance matrix $S = n^{-1} X^\top H X$, where $H = I_n - n^{-1} 1_n 1_n^\top$ is the centering matrix.
- The empirical correlation matrix is given by $R = D^{-1/2} S D^{-1/2}$, where $D$ is the diagonal matrix of the variances.
- A linear transformation $Y = XA^\top$ of a data matrix $X$ has mean vector $A\bar{x}$ and empirical covariance matrix $A S_X A^\top$.
- The Mahalanobis transformation $z_i = S^{-1/2}(x_i - \bar{x})$ is a linear transformation which gives a standardized, uncorrelated data matrix $Z$ (see the second sketch below).
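As a concrete illustration of the Fisher Z-transform point above, here is a minimal Python sketch. It is not part of the slides: the function name `fisher_z_test` and the simulated data are ours, chosen only for illustration. It tests $H_0\colon \rho = \rho_0$ using the fact that $W = \operatorname{artanh}(r)$ is approximately $N(\operatorname{artanh}(\rho_0),\, 1/(n-3))$.

```python
import numpy as np
from scipy import stats

def fisher_z_test(x, y, rho0=0.0):
    """Two-sided test of H0: rho = rho0 via Fisher's Z-transform."""
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]                 # empirical correlation
    w = np.arctanh(r)                           # Fisher's Z-transform W = artanh(r)
    # Small-sample refinement from the summary above (optional):
    # w = w - (3 * w + np.tanh(w)) / (4 * (n - 1))
    z = (w - np.arctanh(rho0)) * np.sqrt(n - 3) # approx. N(0, 1) under H0
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))
    return r, z, p_value

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.6 * x + rng.normal(size=50)               # linearly dependent sample
print(fisher_z_test(x, y))                      # small p-value: reject rho = 0
```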
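The matrix formulas in the summary translate directly into code. The following sketch, again ours rather than from the slides, computes the mean vector, the empirical covariance matrix via the centering matrix $H$, the correlation matrix $R$, a linear transformation $Y = XA^\top$, and the Mahalanobis transformation, and then verifies that the transformed data matrix $Z$ has identity covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.multivariate_normal([0.0, 1.0, 2.0],
                            [[2.0, 0.5, 0.3],
                             [0.5, 1.0, 0.2],
                             [0.3, 0.2, 1.5]], size=n)

ones = np.ones((n, 1))
x_bar = (X.T @ ones / n).ravel()                # mean vector x_bar = n^{-1} X^T 1_n
H = np.eye(n) - ones @ ones.T / n               # centering matrix H
S = X.T @ H @ X / n                             # empirical covariance S = n^{-1} X^T H X
D_inv_sqrt = np.diag(1 / np.sqrt(np.diag(S)))   # D^{-1/2}
R = D_inv_sqrt @ S @ D_inv_sqrt                 # correlation matrix R = D^{-1/2} S D^{-1/2}

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])
Y = X @ A.T                                     # linear transformation Y = X A^T
print(np.allclose(Y.mean(axis=0), A @ x_bar))   # True: mean of Y equals A x_bar

# Mahalanobis transformation: z_i = S^{-1/2} (x_i - x_bar)
evals, evecs = np.linalg.eigh(S)
S_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
Z = (X - x_bar) @ S_inv_sqrt
print(np.round(np.cov(Z.T, bias=True), 2))      # identity: Z is standardized, uncorrelated
```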
Summary: the linear model and ANOVA

- Simple ANOVA models an output $Y$ as a function of one factor.
- The reduced model is the hypothesis of equal means; the full model is the alternative hypothesis of different means.
- The F-test is based on a comparison of the sums of squares under the full and the reduced models.
- The degrees of freedom are calculated as the number of observations minus the number of parameters.
- The F-test rejects the null hypothesis if the F-statistic is larger than the 95% quantile of the $F_{df(r)-df(f),\,df(f)}$ distribution.
- The F-test statistic for the slope of the linear regression model $y_i = \alpha + \beta x_i + \varepsilon_i$ is the square of the t-test statistic (verified numerically in a sketch below).

4 Multivariate Distributions

The preceding chapter showed that by using the first two moments of a multivariate distribution (the mean and the covariance matrix), a lot of information on the relationship between the variables can be made available. Only basic statistical theory was used to derive tests of independence or of linear relationships. In this chapter we give an introduction to the basic probability tools useful in statistical multivariate analysis. Means and covariances share many interesting and useful properties, but they represent only part of the information on a multivariate distribution. Section 4.1 presents the basic probability tools used to describe a multivariate random variable, including marginal and conditional distributions and the concept of independence. In Section 4.2, basic properties of means and covariances (marginal and conditional ones) are derived.

4.1 Distribution and Density Function
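As a numerical check of the last point in the Chapter 3 summary above, here is a minimal sketch, ours and with simulated data, that fits $y_i = \alpha + \beta x_i + \varepsilon_i$, forms the F-statistic from the residual sums of squares of the full and reduced models, and confirms that it equals the squared t-statistic for the slope.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 40
x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)      # y_i = alpha + beta * x_i + eps_i

res = stats.linregress(x, y)
t_stat = res.slope / res.stderr              # t-test statistic for the slope

y_hat = res.intercept + res.slope * x
rss_full = np.sum((y - y_hat) ** 2)          # full model, df(f) = n - 2
rss_red = np.sum((y - y.mean()) ** 2)        # reduced model (beta = 0), df(r) = n - 1
f_stat = (rss_red - rss_full) / (rss_full / (n - 2))

print(np.isclose(f_stat, t_stat ** 2))       # True: F = t^2
print(f_stat > stats.f.ppf(0.95, 1, n - 2))  # reject H0: beta = 0 at the 5% level
```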
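Section 4.1 introduces marginal and conditional distributions and the concept of independence. As a bridge from the Chapter 3 summary, the following sketch (the joint pmf is invented purely for illustration) computes marginal distributions from a discrete joint distribution, checks independence, and exhibits a dependent pair with zero covariance, showing again that zero covariance does not imply independence.

```python
import numpy as np

# Joint pmf of (X, Y) with X in {-1, 0, 1} and Y in {0, 1}.
# Here Y = X**2 with probability one, so X and Y are dependent.
x_vals = np.array([-1.0, 0.0, 1.0])
y_vals = np.array([0.0, 1.0])
pmf = np.array([[0.00, 0.25],    # P(X=-1, Y=0), P(X=-1, Y=1)
                [0.50, 0.00],    # P(X= 0, Y=0), P(X= 0, Y=1)
                [0.00, 0.25]])   # P(X= 1, Y=0), P(X= 1, Y=1)

p_x = pmf.sum(axis=1)            # marginal distribution of X
p_y = pmf.sum(axis=0)            # marginal distribution of Y

# Independence would mean the joint pmf factorizes into its marginals:
print("independent:", np.allclose(pmf, np.outer(p_x, p_y)))   # False

# Covariance: Cov(X, Y) = E[XY] - E[X] E[Y]
e_x = (x_vals * p_x).sum()
e_y = (y_vals * p_y).sum()
e_xy = (np.outer(x_vals, y_vals) * pmf).sum()
print("covariance:", e_xy - e_x * e_y)       # 0.0 despite the dependence
```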