Environmental Data Analysis Courseware (26).pdf
Chapter 8 Analysis of Correlation

Correlation depends only on participants as we assumed?

Contents
    Analysis of correlation
    The methods for the analysis of correlation: bivariate correlation, partial correlation, distance correlation

The Methods for the Analysis of Correlation

Pearson's correlation coefficient (r value)

Two variables with n paired observations, each assumed to be normally distributed:

$$X = (x_1, x_2, x_3, \ldots, x_n), \qquad Y = (y_1, y_2, y_3, \ldots, y_n)$$

$$X \sim N(\mu_x, \sigma_x^2), \qquad Y \sim N(\mu_y, \sigma_y^2)$$

$$r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}} = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{S_X}\right)\left(\frac{y_i-\bar{y}}{S_Y}\right) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \, \sum_{i=1}^{n}(y_i-\bar{y})^2}} = \frac{\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i / n}{\sqrt{\left[\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n\right]\left[\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2 / n\right]}}$$

where $S_X = \sqrt{SS_X}$, $S_Y = \sqrt{SS_Y}$, and

$$SS_X = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2, \qquad SS_Y = \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2$$

Covariance is a measure of the relationship between two random variables: the extent to which they change together.

$$SS_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$$

Example: PM2.5 and PM10

No.   PM2.5   PM10
1     0.11    0.14
2     0.25    0.25
3     0.23    0.28
4     0.24    0.25
5     0.26    0.28
6     0.09    0.10
7     0.25    0.27
8     0.06    0.09
9     0.23    0.24
10    0.33    0.30
11    0.15    0.16
12    0.04    0.05
13    0.20    0.20
14    0.34    0.32
15    0.22    0.24

First step: scatter plot.

Continue the correlation analysis (X = PM2.5, Y = PM10).

No.   X      Y      XY       (X-Xbar)^2   X^2      (Y-Ybar)^2   Y^2      (X-Xbar)(Y-Ybar)
1     0.11   0.14   0.0154   0.0081       0.0121   0.00509      0.0196   0.00642
2     0.25   0.25   0.0625   0.0025       0.0625   0.00150      0.0625   0.00193
3     0.23   0.28   0.0644   0.0009       0.0529   0.00472      0.0784   0.00206
4     0.24   0.25   0.06     0.0016       0.0576   0.00150      0.0625   0.00155
5     0.26   0.28   0.0728   0.0036       0.0676   0.00472      0.0784   0.00412
6     0.09   0.10   0.009    0.0121       0.0081   0.01240      0.01     0.01225
7     0.25   0.27   0.0675   0.0025       0.0625   0.00344      0.0729   0.00293
8     0.06   0.09   0.0054   0.0196       0.0036   0.01472      0.0081   0.01699
9     0.23   0.27   0.0621   0.0009       0.0529   0.00344      0.0729   0.00176
10    0.33   0.30   0.099    0.0169       0.1089   0.00786      0.09     0.01153
11    0.15   0.16   0.024    0.0025       0.0225   0.00264      0.0256   0.00257
12    0.04   0.05   0.002    0.0256       0.0016   0.02603      0.0025   0.02581
13    0.24   0.20   0.048    0.0016       0.0576   0.00013      0.04     -0.00045
14    0.34   0.32   0.1088   0.0196       0.1156   0.01181      0.1024   0.01521
15    0.22   0.24   0.0528   0.0004       0.0484   0.00082      0.0576   0.00057
Sum   3.04   3.20   0.7537   0.12         0.7344   0.10         0.7834   0.11

Rows 9 and 13 differ between the two tables (PM10 = 0.24 vs 0.27, PM2.5 = 0.20 vs 0.24); the column sums follow the calculation table.
Correlation coefficient for the example (n = 15):

$$\bar{x} = 0.2027, \qquad \bar{y} = 0.2133$$

The deviation-column sums in the table are rounded to two decimals (0.11, 0.12, 0.10), so the computational form is used to avoid rounding error:

$$r = \frac{\sum x_i y_i - \sum x_i \sum y_i / n}{\sqrt{\left[\sum x_i^2 - (\sum x_i)^2/n\right]\left[\sum y_i^2 - (\sum y_i)^2/n\right]}} = \frac{0.7537 - 3.04 \times 3.20/15}{\sqrt{(0.7344 - 3.04^2/15)(0.7834 - 3.20^2/15)}} = \frac{0.1052}{\sqrt{0.1183 \times 0.1007}} \approx 0.963$$

The value of r alone is a guideline only. A statistical test is needed.

Hypothesis test for r (t test)

When X and Y are normally distributed, a t test determines the significance of r.

    H0: $\rho = 0$. There is no correlation between the two variables.
    H1: $\rho \neq 0$. There is a correlation between the two variables.

Here $\rho$ is the population correlation.

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \qquad df = n-2$$

    If $t > t_{\alpha}$, then $p < \alpha$: the correlation is statistically significant.
    If $t < t_{\alpha}$, then $p > \alpha$: the correlation is weak (not significant).

Example

$$t = \frac{0.963\sqrt{15-2}}{\sqrt{1-0.963^2}} \approx 12.9$$

$$df = n - 2 = 13, \qquad t_{0.01} = 3.01, \qquad t > t_{0.01}, \qquad p \approx 0.000 < 0.01$$

The correlation between PM2.5 and PM10 is significant at the 0.01 level.

Correlation coefficients
    Pearson correlation (parametric)
    Spearman correlation (nonparametric)
    Kendall rank correlation (nonparametric)

Variables
    All continuous variables
        Small samples, normal distribution
        Large samples or no normal distribution
    All ordinal categorical variables
    One ordinal categorical and one continuous variable

1. Pearson's correlation coefficient
    The most widely used.
    For linearly related variables.

2. Spearman's rho / rank correlation coefficient
    For monotonic relationships (whether linear or not).

$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}, \qquad -1 \le \rho \le +1$$

    $\rho$ = Spearman rank correlation coefficient
    $d_i$ = the difference between the ranks of corresponding values $X_i$ and $Y_i$
    n = number of values in each data set

Year   Number of habitats   Endangered X   Rank (habitats)   Rank (X)   d_i^2
2012   20                   20             4                 7          9
2013   23                   25             2                 2          0
2014   8                    11             10                10         0
2015   29                   24             1                 3          4
2016   14                   23             7                 4          9
2017   12                   16             8                 8          0
2018   11                   12             9                 9          0
2019   21                   21             3                 6          9
2020   17                   22             6                 5          1
2021   18                   26             5                 1          16
Sum                                                                     48

$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)} = 1 - \frac{6 \times 48}{10 \times 99} = 0.709$$

This is a strong positive relationship. Pearson's formula applied to the ranks gives the same coefficient, so the two techniques are approximately equal.
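The Pearson, t test, and Spearman calculations above can be sketched in Python with the standard library only. This is a minimal illustration, not the method used to prepare the slides: the function names (`pearson_r`, `t_statistic`, `ranks_desc`, `spearman_rho`) are made up for this sketch, the PM values are the X and Y columns of the calculation table, and the ranking helper assumes there are no ties and gives rank 1 to the largest value, as in the habitat table.

```python
import math

# X (PM2.5) and Y (PM10) columns of the calculation table.
x = [0.11, 0.25, 0.23, 0.24, 0.26, 0.09, 0.25, 0.06,
     0.23, 0.33, 0.15, 0.04, 0.24, 0.34, 0.22]
y = [0.14, 0.25, 0.28, 0.25, 0.28, 0.10, 0.27, 0.09,
     0.27, 0.30, 0.16, 0.05, 0.20, 0.32, 0.24]

# Habitat example used for Spearman's rho.
habitats = [20, 23, 8, 29, 14, 12, 11, 21, 17, 18]
endangered = [20, 25, 11, 24, 23, 16, 12, 21, 22, 26]


def pearson_r(xs, ys):
    """Pearson r from the computational (sum-based) form."""
    n = len(xs)
    sxy = sum(a * b for a, b in zip(xs, ys)) - sum(xs) * sum(ys) / n
    sxx = sum(a * a for a in xs) - sum(xs) ** 2 / n
    syy = sum(b * b for b in ys) - sum(ys) ** 2 / n
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def ranks_desc(values):
    """Rank 1 = largest value. Assumes no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_desc(xs), ranks_desc(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


r = pearson_r(x, y)
t = t_statistic(r, len(x))
rho = spearman_rho(habitats, endangered)

T_CRITICAL = 3.01  # t(0.01, df = 13), the critical value quoted in the example

print(f"r = {r:.3f}, t = {t:.2f}, significant: {t > T_CRITICAL}")
print(f"rho = {rho:.3f}")
```

Calling `pearson_r(ranks_desc(habitats), ranks_desc(endangered))` returns the same value as `spearman_rho`, which is the sense in which the two techniques agree when there are no ties.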
For the ranks in the Spearman example, Pearson's formula gives

$$r = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2 \sum (y_i-\bar{y})^2}} = \frac{58.5}{\sqrt{82.5 \times 82.5}} = 0.709$$

3. Kendall's tau / rank correlation coefficient

Kendall's tau is a correlation suitable for quantitative and ordinal variables. It indicates how strongly two variables are monotonically related, and it serves the same purpose as the Spearman rank correlation.

Gross error sensitivity (GES) and asymptotic variance (AV)

Given the paired observations

$$(x_1, \ldots, x_i, \ldots, x_j, \ldots, x_n), \qquad (y_1, \ldots, y_i, \ldots, y_j, \ldots, y_n)$$

there are $C(n, 2)$ pairs $(i, j)$ with $i < j$. Each pair is

    Concordant (C) if ($x_i > x_j$ and $y_i > y_j$) or ($x_i < x_j$ and $y_i < y_j$)
    Discordant (D) if ($x_i > x_j$ and $y_i < y_j$) or ($x_i < x_j$ and $y_i > y_j$)
    Neither if $x_i = x_j$ or $y_i = y_j$ (i.e. ties are not counted)

$$\tau = \frac{n_c - n_d}{0.5\,n(n-1)}, \qquad -1 \le \tau \le +1$$

Where:
    $\tau$ = Kendall rank correlation coefficient
    $n_c$ = number of concordant pairs (ordered in the same way)
    $n_d$ = number of discordant pairs (ordered differently)

Set ranks to the data:

No.   X (No.)   Rank (X)   Y (%)   Rank (Y)
1     11        5          9.3     6
2     17        10         10.9    9
3     14        8          12.5    10
4     9         4          8.2     3
5     12        6          8.8     4
6     7         3          7.5     1
7     4         1          7.8     2
8     6         2          9.2     5
9     13        7          9.6     7
10    17        9          10.1    8

Arrange by the rank of X and, for each row, count the rows below it whose Y rank is larger (C) or smaller (D):

No.   X    Y    C   D
7     1    2    8   1
8     2    5    5   3
6     3    1    7   0
4     4    3    6   0
1     5    6    4   1
5     6    4    4   0
9     7    7    3   0
3     8    10   0   2
10    9    8    1   0
2     10   9    0   0
Sum             38  7

$$\tau = \frac{n_c - n_d}{0.5\,n(n-1)} = \frac{38 - 7}{0.5 \times 10 \times 9} = 0.69$$

Example: we calculate tau for the example we had for Spearman's rho.

Set ranks to the data:

Habitats   X    Rank (habitats)   Rank (X)
20         20   4                 7
23         25   2                 2
8          11   10                10
29         24   1                 3
14         23   7                 4
12         16   8                 8
11         12   9                 9
21         21   3                 6
17         22   6                 5
18         26   5                 1

Arrange by rank (habitats) and calculate the number of concordant (C) and discordant (D) pairs:

Rank (habitats)   Rank (X)   C   D
1                 3          7   2
2                 2          7   1
3                 6          4   3
4                 7          3   3
5                 1          5   0
6                 5          3   1
7                 4          3   0
8                 8          2   0
9                 9          1   0
10                10         0   0
Sum                          35  10

$$\tau = \frac{n_c - n_d}{0.5\,n(n-1)} = \frac{35 - 10}{0.5 \times 10 \times 9} = 0.556$$

Moderate positive relationship.
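The pair counting behind Kendall's tau can be sketched as follows. This is a minimal illustration under the definitions above: ties count as neither concordant nor discordant, and the denominator is 0.5 n (n - 1). The function name `kendall_tau` is made up for this sketch, and the inputs are the rank columns from the two worked examples.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Return (n_c, n_d, tau) by comparing every pair of observations."""
    nc = nd = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:      # both variables move in the same direction
            nc += 1
        elif sign < 0:    # the variables move in opposite directions
            nd += 1
        # sign == 0 means a tie in x or y: counted as neither
    n = len(xs)
    return nc, nd, (nc - nd) / (0.5 * n * (n - 1))


# First example: ranks of X (already arranged 1..10) and ranks of Y.
rank_x = list(range(1, 11))
rank_y = [2, 5, 1, 3, 6, 4, 7, 10, 8, 9]
nc, nd, tau = kendall_tau(rank_x, rank_y)
print(nc, nd, round(tau, 3))   # 38 7 0.689

# Habitat example: rank (habitats) and rank (X), in year order.
rank_habitats = [4, 2, 10, 1, 7, 8, 9, 3, 6, 5]
rank_endangered = [7, 2, 10, 3, 4, 8, 9, 6, 5, 1]
nc2, nd2, tau2 = kendall_tau(rank_habitats, rank_endangered)
print(nc2, nd2, round(tau2, 3))   # 35 10 0.556
```

Because every pair is compared, the data do not need to be sorted first. Arranging the rows by the rank of X is only a convenience for counting by hand.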