Peking University Summer Course "Regression Analysis" (Linear-Regression-Analysis) Lecture Notes PKU510978.pdf





Class 5: ANOVA (Analysis of Variance) and F-tests

I. What is ANOVA

What is ANOVA? ANOVA is the short name for the Analysis of Variance. The essence of ANOVA is to decompose the total variance of the dependent variable into two additive components of a regression, one for the structural part and the other for the stochastic part. Today we are going to examine the easiest case.

II. ANOVA: An Introduction

Let the model be $y = X\beta + \varepsilon$. Assuming $x_i$ is a column vector (of length $p$) of independent variable values for the $i$th observation,

$$y_i = x_i'\beta + \varepsilon_i.$$

Then $b'x_i$ is the predicted value.

Sum of squares total:

$$\begin{aligned}
SST &= \sum (y_i - \bar{Y})^2 \\
    &= \sum \left[(y_i - b'x_i) + (b'x_i - \bar{Y})\right]^2 \\
    &= \sum (y_i - b'x_i)^2 + \sum (b'x_i - \bar{Y})^2 + 2\sum (y_i - b'x_i)(b'x_i - \bar{Y}) \\
    &= \sum e_i^2 + \sum (b'x_i - \bar{Y})^2 \\
    &= SSE + SSR
\end{aligned}$$

because $\sum (y_i - b'x_i)(b'x_i - \bar{Y}) = \sum e_i (b'x_i - \bar{Y}) = 0$. This is always true by OLS when the model includes an intercept.

Important: the total variance of the dependent variable is decomposed into two additive parts: SSE, which is due to errors, and SSR, which is due to regression.

Geometric interpretation: on the blackboard.

Decomposition of Variance

If we treat X as a random variable, we can decompose the total variance into the between-group portion and the within-group portion in any population:

$$V(y_i) = V(x_i'\beta) + V(\varepsilon_i) \tag{1}$$

Proof:

$$\begin{aligned}
V(y_i) &= V(x_i'\beta + \varepsilon_i) \\
       &= V(x_i'\beta) + V(\varepsilon_i) + 2\,\mathrm{Cov}(x_i'\beta, \varepsilon_i) \\
       &= V(x_i'\beta) + V(\varepsilon_i)
\end{aligned}$$

(by the assumption that $\mathrm{Cov}(x_k, \varepsilon) = 0$ for all possible $k$).

The ANOVA table estimates the three quantities of equation (1) from the sample. As the sample size gets larger and larger, the ANOVA table approaches the equation more and more closely. In a sample, the decomposition of estimated variance is not strictly true. We thus need to decompose sums of squares and degrees of freedom separately. Is ANOVA a misnomer?

III. ANOVA in Matrix Form

I will try to give a simplified representation of ANOVA as follows:

$$\begin{aligned}
SST &= \sum (y_i - \bar{Y})^2 \\
    &= \sum \left(y_i^2 - 2\bar{Y} y_i + \bar{Y}^2\right) \\
    &= \sum y_i^2 - 2\bar{Y} \sum y_i + n\bar{Y}^2 \\
    &= \sum y_i^2 - 2n\bar{Y}^2 + n\bar{Y}^2 \quad (\text{because } \textstyle\sum y_i = n\bar{Y}) \\
    &= \sum y_i^2 - n\bar{Y}^2 \\
    &= y'y - n\bar{Y}^2 \\
    &= y'y - \tfrac{1}{n}\, y'Jy
\end{aligned}$$

where $J$ is the $n \times n$ matrix of ones (this is the monster-looking form in your textbook).

$$SSE = e'e$$

$$\begin{aligned}
SSR &= \sum (b'x_i - \bar{Y})^2 \\
    &= \sum \left[(b'x_i)^2 - 2\bar{Y}\, b'x_i + \bar{Y}^2\right] \\
    &= \sum (b'x_i)^2 + n\bar{Y}^2 - 2\bar{Y} \sum b'x_i \\
    &= \sum (b'x_i)^2 + n\bar{Y}^2 - 2\bar{Y} \sum (y_i - e_i) \\
    &= \sum (b'x_i)^2 + n\bar{Y}^2 - 2n\bar{Y}^2 \quad (\text{because } \textstyle\sum e_i = 0,\ \sum y_i = n\bar{Y}, \text{ as always}) \\
    &= \sum (b'x_i)^2 - n\bar{Y}^2 \\
    &= (Xb)'(Xb) - n\bar{Y}^2 \\
    &= b'X'y - \tfrac{1}{n}\, y'Jy
\end{aligned}$$

The last step uses the normal equations $X'Xb = X'y$ (again, the monster-looking form in your textbook).

IV. ANOVA Table

SOURCE        SS     DF       MS     F          with
Regression    SSR    DF(R)    MSR    MSR/MSE    DF(R), DF(E)
Error         SSE    DF(E)    MSE
Total         SST    DF(T)

Let us use a real example. Assume that we have a regression estimated to be

$$y = -1.70 + 0.840\,x$$

ANOVA Table

SOURCE        SS      DF    MS      F                    with
Regression    6.44    1     6.44    6.44/0.19 = 33.89    1, 18
Error         3.40    18    0.19
Total         9.84    19

We know $\sum x_i = 100$, $\sum y_i = 50$, $\sum x_i^2 = 509.12$, $\sum y_i^2 = 134.84$, $\sum x_i y_i = 257.66$.

If we know that the DF for SST is 19, what is $n$? $n = 20$.

$$\bar{Y} = 50/20 = 2.5$$

$$SST = \sum y_i^2 - n\bar{Y}^2 = 134.84 - 20 \times 2.5 \times 2.5 = 9.84$$

$$\begin{aligned}
SSR &= \sum (-1.7 + 0.84\,x_i)^2 - 125.0 \\
    &= \sum \left[1.7 \times 1.7 + 0.84 \times 0.84\, x_i^2 - 2 \times 1.7 \times 0.84\, x_i\right] - 125.0 \\
    &= 20 \times 1.7 \times 1.7 + 0.84 \times 0.84 \times 509.12 - 2 \times 1.7 \times 0.84 \times 100 - 125.0 \\
    &= 6.44
\end{aligned}$$

$$SSE = SST - SSR = 9.84 - 6.44 = 3.40$$

DF (degrees of freedom): demonstration. Note: the intercept is discounted when calculating the DF for SST, so DF(T) = n - 1 = 19.

MS = SS/DF

p = 0.000 (ask students). What does the p-value say?

V. F-Tests

F-tests are more general than t-tests; t-tests can be seen as a special case of F-tests. If you have difficulty with F-tests, please ask your GSIs to review F-tests in the lab.

An F-test takes the form of a ratio of two MSs:

$$F_{df_1,\,df_2} = MSR/MSE$$

An F statistic has two degrees of freedom associated with it: the degrees of freedom in the numerator and the degrees of freedom in the denominator.

An F statistic is usually larger than 1. An F statistic tests whether the variance explained under the alternative hypothesis is due to chance. In other words, the null hypothesis is that the explained variance is due to chance, or that all the coefficients are zero.

The larger the F statistic, the stronger the evidence against the null hypothesis. There is a table in the back of your book from which you can find exact probability values.

In our example, the F is 34, which is highly significant.

VI. R2

R2 = SSR/SST: the proportion of variance explained by the model.

In our example, R2 = 65.4%.

VII. What happens if we add more independent variables

1. SST stays the same.
2. SSR always increases (or, at the limit, stays the same).
3. SSE always decreases (or, at the limit, stays the same).
4. R2 always increases (or, at the limit, stays the same).
5. MSR usually increases.
6. MSE usually decreases.
7. The F statistic usually increases.

Exceptions to 5 and 7: irrelevant variables may not explain the variance but still take up degrees of freedom. We really need to look at the results.

VIII. Important: General Ways of Hypothesis Testing with F-Statistics

All tests in linear regression can be performed with F-test statistics. The trick is to run nested models.

Two models are nested if the independent variables in one model are a subset, or linear combinations of a subset, of the independent variables in the other model.

That is to say, if model A has independent variables $(1, x_1, x_2)$ and model B has independent variables $(1, x_1, x_2, x_3)$, then A and B are nested. A is called the restricted model; B is called the less restricted or unrestricted model. We call A restricted because A implies that $\beta_3 = 0$. This is a restriction.

Another example: C has independent variables $(1, x_1, x_2 + x_3)$, and D has $(1, x_2 + x_3)$.

C and A are not nested.
C and B are nested. One restriction in C: $\beta_2 = \beta_3$.
C and D are nested. One restriction in D: $\beta_1 = 0$.
D and A are not nested.
D and B are nested. Two restrictions in D: $\beta_2 = \beta_3$; $\beta_1 = 0$.

We can always test the hypotheses implied by the restricted models. Steps: for each hypothesis, run two regressions, one for the restricted model and one for the unrestricted model. The SST should be the same across the two models. What is differen
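The worked example in Section IV can be checked numerically. The sketch below is not part of the original notes; the function name `anova_from_sums` and its argument names are made up for illustration. It assumes a single regressor plus an intercept, and it plugs in the rounded estimates quoted in the notes (a = -1.70, b = 0.840) rather than re-estimating them, so $\sum x_i y_i$ is not needed.

```python
def anova_from_sums(n, sum_x, sum_x2, sum_y, sum_y2, a, b):
    """ANOVA table entries for the fitted line y = a + b*x (one regressor).

    Only summary sums are needed. `a` and `b` are taken as given
    (here: the rounded estimates from the notes), not re-estimated.
    """
    y_bar = sum_y / n
    # SST = sum(y_i^2) - n * Ybar^2
    sst = sum_y2 - n * y_bar ** 2
    # SSR = sum((a + b*x_i)^2) - n * Ybar^2, with the square expanded
    ssr = n * a ** 2 + b ** 2 * sum_x2 + 2 * a * b * sum_x - n * y_bar ** 2
    sse = sst - ssr
    df_r = 1        # one slope coefficient
    df_e = n - 2    # n observations minus intercept and slope
    msr = ssr / df_r
    mse = sse / df_e
    return {
        "SST": sst, "SSR": ssr, "SSE": sse,
        "DF": (df_r, df_e, n - 1),
        "MSR": msr, "MSE": mse,
        "F": msr / mse,
        "R2": ssr / sst,
    }


# Summary statistics from the example in Section IV.
table = anova_from_sums(n=20, sum_x=100, sum_x2=509.12,
                        sum_y=50, sum_y2=134.84, a=-1.70, b=0.840)
for name in ("SST", "SSR", "SSE", "MSR", "MSE", "F", "R2"):
    print(f"{name:>3} = {table[name]:.2f}")
```

Because the code divides the unrounded mean squares, its F comes out at about 34.0, which agrees with the "F is 34" quoted in Section V. The 33.89 in the table comes from dividing the rounded values 6.44 and 0.19.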