校级精品课程汇报.ppt
Slide 1
Introduction to Logistic Regression
宇传华 (yuchua163.com)
Department of Epidemiology and Health Statistics, School of Public Health, Wuhan University
May 31, 2011

Slide 2: New Words
  LPM: linear probability model (线性概率模型)
  Odds ratio (优势比)
  Nominal variables (名义变量)
  Dummy variable (哑变量)
  Multiple logistic regression (多重Logistic回归)

Slide 3: Contents
  1. Review the Types of Variables
  2. Variables in Logistic Regression
  3. Why Can't We Use Linear Regression for a Categorical Response?
  4. Logistic Regression Model
  5. What Is an Odds Ratio?
  6. Multiple Logistic Regression

Slide 4: 1. Review the Types of Variables

Slide 5: Choosing the Scale of Measurement
  Before analyzing, select the measurement scale for each variable.

Slide 6
  Categorical (qualitative) variables: nominal variables, ordinal variables
  Numerical (quantitative) variables: discrete variables, continuous variables

Slide 7: Nominal Variables

Slide 8: Ordinal Variables

Slide 9: Binomial Variables
  Weather: good or bad?
  Male or female?

Slide 10: Continuous Variables

Slide 11: 2. Variables in Logistic Regression

Slide 12
  Predicted, Outcome, Dependent variable (response variable)

Slide 13: Types of Logistic Regression
  1. Binary logistic regression
  2. Multinomial logistic regression
  3. Ordinal logistic regression

Slide 14: What Does Logistic Regression Do?
  It predicts the probability of specific outcomes.

  Independent variables      Binary dependent variable
  ---------------------      -------------------------
  Predictor variables        Predicted variable
  Explanatory variables      Response variable
  Covariables                Outcome variable
  Independent variables      Dependent variable

Slide 15: Independent Variables of Logistic Regression
  Continuous variables
  Dummy variables for nominal variables

Slide 16: 3. Why Can't We Use Linear Regression for a Categorical Response?

Slide 17: Example: Failing or Passing an Exam
  Let us define a variable Outcome:
    Outcome = 0 if the individual fails the exam
            = 1 if the individual passes the exam
  Predictor variable: the number of hours spent studying
  Linear Probability Model (LPM):
    $\text{Prob}(\text{Outcome} = 1) = \alpha + \beta \times \text{hours of study}$

Slide 18: Linear Probability Models (LPM)
  ?
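The difficulty with the LPM in the exam example can be seen numerically. The sketch below fits a straight line for Prob(Outcome = 1) against hours of study by ordinary least squares and evaluates it at a few study times. The data are hypothetical, invented here for illustration (they do not come from the slides), and `fit_lpm` is an illustrative helper name.

```python
# Hypothetical data, invented for illustration only:
# hours of study and exam outcome (1 = pass, 0 = fail).
hours = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
outcome = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]


def fit_lpm(x, y):
    """Ordinary least squares for y = alpha + beta * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


alpha, beta = fit_lpm(hours, outcome)

# Evaluate the fitted line at 0, 12 and 30 hours of study.
predictions = {h: alpha + beta * h for h in (0, 12, 30)}
for h, p in predictions.items():
    print(f"hours={h:>2}  LPM 'probability'={p:.3f}")
```

With these made-up data the fitted line gives a negative value at 0 hours and a value above 1 at 30 hours, and neither is a valid probability.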
Slide 19: 4. Logistic Regression Model

Slide 20: Logistic Regression Curve
  [Figure: logistic regression curve; horizontal axis x (1 to 21), vertical axis Probability (0.0 to 1.0)]

Slide 21: Logit Transformation
  Logistic regression models transformed probabilities, called logits:
    $\operatorname{logit}(p_i) = \ln\left(\frac{p_i}{1 - p_i}\right)$
  where
    $i$ indexes all cases (observations),
    $p_i$ is the probability that the event (a sale, for example) occurs in the $i$th case,
    $\ln$ is the natural log (to the base e).

Slide 22: Assumption
  [Figure: axis labels 1 and 0]

Slide 23: Logistic Regression Model
    $\operatorname{logit}(p_i) = b_0 + b_1 X_1$
  where
    $\operatorname{logit}(p_i)$ is the logit transformation of the probability of the event,
    $b_0$ is the intercept of the regression line,
    $b_1$ is the slope of the regression line.
  The logit has a linear relationship with the predictor.

Slide 24: LOGISTIC Procedure
  SAS:
    PROC LOGISTIC DATA=SAS-data-set ;
      CLASS variables ;
      MODEL response=predictors ;
      OUTPUT OUT=SAS-data-set keyword=name ;
    RUN;
  SPSS:
    Analyze > Regression > Binary Logistic
    Dependent: y
    Covariates: x
    Method: Forward: Wald
    Save: Predicted Values: Probabilities, Group membership
    Options: CI for exp(B): 95%; Probability for Stepwise: Entry 0.1, Removal 0.15
  Maximum likelihood estimation is a statistical method for estimating the coefficients of a model.
  The likelihood function:
    $L = \text{Prob}(p_1 \times p_2 \times \cdots \times p_n)$

Slide 25: SPSS Output Result
  Odds Ratio

Slide 26: LPM and Logistic Regression Models

Slide 27: Comparing LPM and the Logistic Curve
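The logit transformation, its inverse (the logistic curve), and the likelihood that maximum likelihood estimation maximizes can be sketched in a few lines. This is a minimal illustration, not the SAS or SPSS procedure: the data and both coefficient pairs are hypothetical, and `logit`, `inv_logit`, and `log_likelihood` are illustrative names. The log of the likelihood is used because a sum of logs is numerically easier to handle than a product of probabilities.

```python
import math


def logit(p):
    """Logit transformation: ln(p / (1 - p)), for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Inverse of the logit: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    """Log-likelihood of the model logit(p_i) = b0 + b1 * x_i for 0/1 outcomes."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = inv_logit(b0 + b1 * x)
        # Each case contributes p if the event occurred, 1 - p otherwise.
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


# Hypothetical data, invented for illustration only.
hours = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
outcome = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

# Two candidate coefficient pairs (both hypothetical).
flat = log_likelihood(0.0, 0.0, hours, outcome)     # every p_i = 0.5
sloped = log_likelihood(-6.5, 0.5, hours, outcome)  # p_i rises with hours

print(f"log-likelihood, flat model:   {flat:.3f}")
print(f"log-likelihood, sloped model: {sloped:.3f}")
```

On these made-up data the sloped model has the higher log-likelihood. Maximum likelihood estimation searches for the coefficients that make this quantity as large as possible.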
Slide 28: 5. What Is an Odds Ratio?
  An odds ratio indicates how much more likely, with respect to odds, a certain event is to occur in one group than in another group.

Slide 29: Probabilities from Odds
  The odds, calculated as
    $\text{odds} = \frac{p}{1 - p}$
  can be rearranged to express the probability of an event in terms of the odds:
    $p = \frac{\text{odds}}{1 + \text{odds}}$

Slide 30: Probabilities and Odds

Slide 31: Probability of Outcome

Slide 32: Odds

Slide 33: Odds Ratio

Slide 34: Properties of the Odds Ratio
                              Group B more likely   No association   Group A more likely
  Odds ratio                  0 to 1                1                1 to ∞
  Regression coefficient b    -∞ to 0               0                0 to +∞

Slide 35: Odds Ratio from a Logistic Regression Model
  Estimated logistic regression model:
    $\operatorname{logit}(\hat{p}) = -8.469 + 0.495 \times \text{Study Hours}$
  Estimated odds ratio (for each additional hour of study):
    $\text{odds ratio} = \frac{e^{-8.469 + 0.495(a+1)}}{e^{-8.469 + 0.495a}} = e^{b_1} = e^{0.495} = 1.640$

Slide 36: 6. Multiple Logistic Regression
    $\operatorname{logit}(p_i) = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3$

Slide 37: Backward Elimination Method

Slide 38: Adjusted Odds Ratio

Slide 39: Interaction in Multiple Logistic Regression

Slide 40: Interaction Plot
  [Figure: predicted logit by income level (Low, Medium, High), with separate lines for males and females]

Slide 41: Backward Elimination Method
  ...

Slide 42: Multicollinearity in Multiple Logistic Regression
  Multicollinearity does not bias the coefficients, but it inflates their standard errors.
  If a variable that you think should be statistically significant is not, check the correlation coefficients.
  If two variables are correlated at more than about .6, .7, or .8, try dropping the less theoretically important of the two.

Slide 43: Sample Sizes
  Sample size = 15 to 20 times the number of variables

Slide 44
  Thanks for your attention!
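As a supplementary check, the odds-ratio arithmetic on the slide "Odds Ratio from a Logistic Regression Model" can be reproduced directly. The coefficients -8.469 and 0.495 are taken from that slide; the starting value `a = 15.0` is an arbitrary assumption (the ratio does not depend on it), and the function names are illustrative.

```python
import math

# Coefficients from the slide "Odds Ratio from a Logistic Regression Model".
B0 = -8.469
B1 = 0.495


def odds_from_prob(p):
    """odds = p / (1 - p)"""
    return p / (1.0 - p)


def prob_from_odds(odds):
    """p = odds / (1 + odds), the rearranged form."""
    return odds / (1.0 + odds)


def model_odds(study_hours):
    """Odds of the event under logit(p) = B0 + B1 * study_hours."""
    return math.exp(B0 + B1 * study_hours)


a = 15.0  # arbitrary number of study hours (assumption for illustration)
odds_ratio = model_odds(a + 1) / model_odds(a)

print(f"odds ratio for one more study hour: {odds_ratio:.3f}")
print(f"exp(b1):                            {math.exp(B1):.3f}")
print(f"modelled probability at {a:g} hours:  {prob_from_odds(model_odds(a)):.3f}")
```

The intercept cancels in the ratio, so the result equals exp(0.495) whatever value of `a` is chosen.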