
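Regression Term Project: Regression Analysis of the Factors Affecting Domestic Tourism Spending

I. Problem Statement

China's tertiary sector has grown rapidly and accounted for 43.14% of GDP in 2010. Tourism holds an important place within it and is closely tied to catering, accommodation, leisure, and transport. This analysis therefore examines what drives domestic tourism spending and builds a regression model for it.

II. Model Design

A multiple linear model is fitted first. If the fit is not significant, a log or square-root transformation, a polynomial fit, or another model is tried instead.

1. Correlation analysis: identify the variables that are correlated with the response.
2. Fit the full multiple linear regression. If the overall F test fails, look for the cause and change the model. If some coefficients fail their tests, carry out variable selection (step 3) and drop variables before continuing. If every test passes, the model is provisionally accepted and step 3 is skipped.
3. Screen variables by stepwise regression and run t tests. If the results are significant, the multiple linear model is provisionally accepted. If some variables still fail, screen them one at a time, using the AIC criterion among others to decide which to drop, until every remaining variable passes its t test.
4. Regression diagnostics. Analyse the residuals: test whether they are normally distributed and whether they are autocorrelated, and check for outliers and influential observations, heteroscedasticity, and multicollinearity. If any of these problems is present, revise the model, reselect variables, or add or remove observations.
5. Settle on the final model.

III. Data

(Values shown with "…" lost digits when the document was extracted.)

    year  income    number  expense  level    road    rail
    1994  48108.5      524    195.3   320.0  111.78   5.90
    1995  59810.5      629    218.7   345.1  115.70   6.24
    1996  70142.5      640    256.2   377.6  118.58   6.49
    1997  78060.9      644    328.1   394.6  122.64   6.60
    1998  83024.3      695    345.0   417.8  127.85   6.64
    1999  88479.2      719    394.0   452.3  135.17   6.74
    2000  98000.5      744    426.6   491.0  140.27   6.87
    2001     ….2       784    449.5   521.2  169.80   7.01
    2002     ….7       878    441.8   557.6  176.52   7.19
    2003     ….0       870    395.7   596.9  180.98   7.30
    2004     ….8      1102    427.5   645.3  187.07   7.44
    2005     ….5      1212    436.1   695.2  334.52   7.54
    2006     ….9      1394    446.9   761.9  345.70   7.71
    2007     ….0      1610    482.6   843.4  358.37   7.80
    2008     ….7      1712    511.0   916.8  373.02   7.97
    2009     ….5      1902    535.4  1001.6  386.08   8.55
    2010     ….0      2103    598.2  1062.6  400.82   9.12

    year     air  railtran  roadtran  shiptran  airtran   travel
    1994  104.56         …         …     26165     4039   1023.5
    1995  112.90         …         …     23924     5117   1375.7
    1996  116.65     94797         …     22895     5555   1638.4
    1997  142.50     93308         …     22573     5630   2112.7
    1998  150.58     95085         …     20545     5755   2391.2
    1999  152.22         …         …     19151     6094   2831.9
    2000  150.29         …         …     19386     6722   3175.5
    2001  155.36         …         …     18645     7524   3522.4
    2002  163.77         …         …     18693     8594   3878.4
    2003  174.95     97260         …     17142     8759   3442.3
    2004  204.94         …         …     19040    12123   4710.7
    2005  199.85         …         …     20227    13827   5285.9
    2006  211.35         …         …     22047    15968   6229.7
    2007  234.30         …         …     22835    18576   7770.6
    2008  246.18         …         …     20334    19251   8749.3
    2009  234.51         …         …     22314    23052  10183.7
    2010  276.51         …         …     22392    26769  12579.8

Source: China Statistical Yearbook 2011.

Notes on the variables. Year: year. Income: gross national income, in 100 million yuan. Number: number of tourists. Expense: tourism spending per person, in yuan. Level: household consumption level index, with 1978 as the base year. Road: length of highways, in 10,000 km. Rail: length of railways, in 10,000 km. Air: length of civil aviation routes, in 10,000 km. Roadtran: highway passenger traffic, in 10,000 persons. Railtran: railway passenger traffic, in 10,000 persons. Shiptran: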
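waterway passenger traffic, in 10,000 persons. Airtran: civil aviation passenger traffic, in 10,000 persons. Travel: total domestic tourism spending, in 100 million yuan.

IV. Regression Analysis

1. Correlation

The correlations are examined first by plotting the scatterplot matrix.

(Figure: scatterplot matrix of all variables.)

The plot shows fairly directly that travel is strongly correlated with every variable except road and shiptran. Correlation tests on these two show that travel and road are in fact linearly correlated (correlation coefficient 0.93, p-value = 4.563e-08), whereas travel and shiptran are not (p-value = 0.9983). shiptran is therefore excluded before the regression is fitted.

2. Full regression model

Fitting the multiple regression on all remaining variables gives:

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -5.972e+03  3.193e+03  -1.870  0.…
    income       2.151e-02  4.779e-03   4.501  0.…      *
    number       1.039e+00  1.446e+00   0.719  0.…
    expense      6.805e+00  1.124e+00   6.052  0.…      *
    level       -5.815e+00  1.261e+00  -4.610  0.…      *
    road        -1.468e+00  1.019e+00  -1.441  0.…
    rail         6.274e+02  4.462e+02   1.406  0.…
    air         -4.155e+00  2.790e+00  -1.490  0.…
    railtran     2.524e-02  8.492e-03   2.972  0.…      *
    roadtran    -4.093e-04  4.554e-04  -0.899  0.…
    airtran      1.058e-01  1.272e-01   0.832  0.…
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 84.55 on 6 degrees of freedom
    Multiple R-squared: 0.9998, Adjusted R-squared: 0.9994
    F-statistic: 2462 on 10 and 6 DF, p-value: 5.061e-10

Here R² = 0.9998 and the F test has p-value 5.061e-10, so the regression as a whole is significant. Not every coefficient passes its own test, however, so variable selection is needed.

3. Variable selection

Stepwise regression is run first, with AIC as the main criterion. It removes roadtran and number, after which the AIC reaches its minimum of 153.73. Checking the resulting model:

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -4.393e+03  2.102e+03  -2.090  0.…      .
    income       1.898e-02  2.320e-03   8.179  3.72e-05 ***
    expense      7.038e+00  9.369e-01   7.512  6.85e-05 ***
    level       -5.427e+00  1.057e+00  -5.133  0.…      *
    road        -1.460e+00  9.339e-01  -1.564  0.…
    rail         3.697e+02  2.865e+02   1.290  0.…
    air         -3.589e+00  2.496e+00  -1.438  0.…
    railtran     2.166e-02  6.843e-03   3.165  0.…      *
    airtran      2.032e-01  5.464e-02   3.719  0.…      *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 78.95 on 8 degrees of freedom
    Multiple R-squared: 0.9997, Adjusted R-squared: 0.9994
    F-statistic: 3529 on 8 and 8 DF, p-value: 2.252e-13

The model has improved. The adjusted coefficient of determination rises slightly (it still prints as 0.9994, while the residual standard error falls from 84.55 to 78.95), which agrees with the AIC criterion. The coefficient tests also look better, but road, rail, and air still fail.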
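The AIC values quoted above can be reproduced from the residual sums of squares. For `lm` fits, R's `step()` ranks candidate models with `extractAIC()`, which evaluates n·log(RSS/n) + 2p, where p counts the estimated coefficients including the intercept. The sketch below is only a check of that arithmetic: the helper name `step_aic` is introduced here for illustration, n = 17 is the number of yearly observations, and the two RSS values are the ones printed by `step()` in this report.

```r
# AIC as used by step()/drop1() for lm fits: n * log(RSS / n) + 2 * p,
# where p is the number of estimated coefficients (intercept included).
# `step_aic` is an illustrative helper, not a base R function.
step_aic <- function(rss, n, p) n * log(rss / n) + 2 * p

n <- 17  # yearly observations, 1994-2010

# Full model: 10 predictors + intercept, RSS = 42897 (from the step() trace)
aic_full <- step_aic(42897, n, 11)

# After dropping number and roadtran: 8 predictors + intercept, RSS = 49866
aic_step <- step_aic(49866, n, 9)

print(round(c(full = aic_full, stepwise = aic_step), 2))
```

The two results should agree with the 155.17 and 153.73 reported by `step()`, which is why a smaller RSS alone does not win: each extra coefficient costs 2 units of AIC.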

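The Pr(>|t|) column in these summaries is the two-sided tail probability of a t distribution with the residual degrees of freedom. As a minimal sketch, the helper `two_sided_p` (a name introduced here, not part of the report's code) recomputes it from a printed t value. Because the printed t values are rounded, the results are approximate.

```r
# Two-sided p-value for a coefficient t statistic, given residual df.
# `two_sided_p` is an illustrative helper, not part of base R.
two_sided_p <- function(t_value, df) 2 * pt(-abs(t_value), df)

# income in the stepwise model: t = 8.179 on 8 residual degrees of freedom
p_income <- two_sided_p(8.179, 8)

# road in the same model: t = -1.564 on 8 residual degrees of freedom
p_road <- two_sided_p(-1.564, 8)

print(signif(c(income = p_income, road = p_road), 3))
print(p_road > 0.05)  # TRUE means road is not significant at the 5% level
```

For income the result should match the printed 3.72e-05 up to rounding, and for road it exceeds 0.05, consistent with road failing its test.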
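Dropping one variable at a time from this model gives:

              Df Sum of Sq    RSS    AIC
    <none>                  49866 153.73
    income     1         …      … 189.75
    expense    1         …      … 187.19
    level      1         …      … 176.50
    road       1     15241  65107 156.26
    rail       1     10380  60246 154.94
    air        1     12886  62752 155.63
    railtran   1     62438      … 165.53
    airtran    1     86215      … 168.79

Removing rail produces the smallest increase in AIC and also the smallest increase in RSS. The coefficient tests after removing it are:

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -1.773e+03  5.648e+02  -3.140  0.…      *
    income       1.935e-02  2.386e-03   8.112  1.98e-05 ***
    expense      7.977e+00  6.116e-01  13.043  3.77e-07 ***
    level       -5.126e+00  1.069e+00  -4.797  0.…      *
    road        -2.214e+00  7.550e-01  -2.933  0.…      *
    air         -5.129e+00  2.272e+00  -2.257  0.…      .
    railtran     1.495e-02  4.613e-03   3.241  0.…      *
    airtran      2.603e-01  3.323e-02   7.832  2.62e-05 ***

Only air fails the test at α = 0.05. With air removed as well:

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -2.450e+03  5.683e+02  -4.310  0.00154  **
    income       1.834e-02  2.782e-03   6.593  6.13e-05 ***
    expense      7.465e+00  6.742e-01  11.072  6.21e-07 ***
    level       -5.389e+00  1.261e+00  -4.273  0.00163  **
    road        -2.381e+00  8.921e-01  -2.669  0.02355  *
    railtran     1.933e-02  4.970e-03   3.889  0.00301  **
    airtran      2.451e-01  3.864e-02   6.343  8.42e-05 ***

All coefficients now pass, and the regression model is provisionally accepted.

4. Regression diagnostics

The residuals are computed and tested with the Shapiro-Wilk W test, which gives p-value = 0.9066, so normality cannot be rejected.

(Figure: standardized residuals plotted against fitted values.)

The plot also shows the residuals scattered evenly and without pattern, so the basic assumptions of linear regression hold and there is no sign of autocorrelation.

(Figures: four diagnostic plots of the model.)

Taken together, the four plots suggest that observations 11 and 15 may be influential. The cause still needs to be investigated; it could lie in how the statistics were collected or in the method of analysis. Removing them weakens the regression, so they are kept for now.

Multicollinearity is checked next. kappa = 1346.411 > 1000 (the program listing prints 1366.411), so multicollinearity is present. The eigenvalues close to zero and their eigenvectors are:

    eigenvalue 0.…, eigenvector [,6]:
    [1,]  0.…  [2,]  0.…  [3,] -0.…  [4,]  0.…  [5,] -0.…  [6,] -0.…

    eigenvalue 0.…, eigenvector [,5]:
    [1,] -0.…  [2,]  0.…  [3,] -0.…  [4,]  0.…  [5,] -0.…  [6,]  0.…

Variables 1, 3, and 6, that is income, level, and airtran, may therefore be severely collinear, most likely income and level. This also makes economic sense: the higher national income is, the higher the consumption level, and the more people travel by air, with the first two being the most directly related. The likely cause is a redundant predictor, so income, level, and airtran are removed in turn, the regression is refitted, and kappa is recomputed. Whichever variable is removed, kappa falls by roughly half, but only when level is removed is the regression almost unaffected, with every remaining coefficient still significant:

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -3.824e+03  7.511e+02  -5.091  0.…      *
    income       1.217e-02  3.811e-03   3.194  0.…      *
    expense      5.483e+00  7.843e-01   6.991  2.3e-05  ***
    road        -4.247e+00  1.247e+00  -3.407  0.…      *
    railtran     2.708e-02  7.416e-03   3.651  0.…      *
    airtran      1.929e-01  5.876e-02   3.284  0.…      *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 155.7 on 11 degrees of freedom
    Multiple R-squared: 0.9985, Adjusted R-squared: 0.9978
    F-statistic: 1450 on 5 and 11 DF, p-value: 4.078e-15

level can therefore be dropped.

Heteroscedasticity is then tested with the rank correlation method. The rank correlation coefficients between the absolute residuals and the predictors are 0.…, 0.…, 0.…, 0, and 0.…. None indicates correlation, so the model fits well.

5. Final model

    Travel = -3.824e+03 + 1.217e-02*income + 5.483*expense - 4.247*road + 2.708e-02*railtran + 1.929e-01*airtran

V. Remarks on the Model

According to the model, domestic tourism spending can be modelled and predicted from national income, tourism spending per person, railway passenger traffic, civil aviation passenger traffic, and highway length, which agrees with practical intuition. The first two can be summarized as living standards; the latter three describe national transport infrastructure and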
happen to cover exactly its three modes: road, rail, and air. The fitted equation is therefore broadly consistent with its practical meaning, and the main influencing factors are essentially identified. The result does depend on the initial choice of candidate predictors, however, so important variables may have been left out.

VI. Program Code and Output (language: R)

    > x <- read.csv("數據.csv", header = TRUE)
    > attach(x)   # columns are referenced by bare name below
    > a <- x[, 2:13]
    > plot(a)
    > cor.test(road, travel)   # correlation test

            Pearson's product-moment correlation

    data:  road and travel
    t = 10.0692, df = 15, p-value = 4.563e-08
    alternative hypothesis: true correlation is not equal to 0
    95 percent confidence interval:
     0.… 0.…
    sample estimates:
    cor
    0.…

    > cor.test(shiptran, travel)

            Pearson's product-moment correlation

    data:  shiptran and travel
    t = 0.0021, df = 15, p-value = 0.9983
    alternative hypothesis: true correlation is not equal to 0
    95 percent confidence interval:
     -0.… 0.…
    sample estimates:
    cor
    0.…

    > model <- lm(travel ~ income + number + expense + level + road + rail + air + railtran + roadtran + airtran)
    > summary(model)   # full regression model

    Call:
    lm(formula = travel ~ income + number + expense + level + road +
        rail + air + railtran + roadtran + airtran)

    Residuals:
        Min      1Q  Median      3Q     Max
    -72.549 -44.860   3.562  44.806  90.603

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -5.972e+03  3.193e+03  -1.870  0.…
    income       2.151e-02  4.779e-03   4.501  0.…      *
    number       1.039e+00  1.446e+00   0.719  0.…
    expense      6.805e+00  1.124e+00   6.052  0.…      *
    level       -5.815e+00  1.261e+00  -4.610  0.…      *
    road        -1.468e+00  1.019e+00  -1.441  0.…
    rail         6.274e+02  4.462e+02   1.406  0.…
    air         -4.155e+00  2.790e+00  -1.490  0.…
    railtran     2.524e-02  8.492e-03   2.972  0.…      *
    roadtran    -4.093e-04  4.554e-04  -0.899  0.…
    airtran      1.058e-01  1.272e-01   0.832  0.…
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 84.55 on 6 degrees of freedom
    Multiple R-squared: 0.9998, Adjusted R-squared: 0.9994
    F-statistic: 2462 on 10 and 6 DF, p-value: 5.061e-10

    > model1 <- step(model)   # stepwise regression
    Start:  AIC=155.17
    travel ~ income + number + expense + level + road + rail + air +
        railtran + roadtran + airtran

                Df Sum of Sq    RSS    AIC
    - number     1      3693  46589 154.57
    - airtran    1      4948  47844 155.02
    <none>                    42897 155.17
    - roadtran   1      5775  48671 155.31
    - rail       1     14137  57033 158.01
    - road       1     14850  57746 158.22
    - air        1     15862  58758 158.52
    - railtran   1     63136      … 168.55
    - income     1         …      … 178.26
    - level      1         …      … 178.90
    - expense    1         …      … 186.50

    Step:  AIC=154.57
    travel ~ income + expense + level + road + rail + air + railtran +
        roadtran + airtran

                Df Sum of Sq    RSS    AIC
    - roadtran   1      3276  49866 153.73
    <none>                    46589 154.57
    - rail       1     11735  58325 156.39
    - air        1     15657  62246 157.50
    - road       1     17009  63598 157.86
    - airtran    1     58169      … 166.34
    - railtran   1     64855      … 167.40
    - income     1         …      … 176.91
    - level      1         …      … 178.18
    - expense    1         …      … 189.12

    Step:  AIC=153.73
    travel ~ income + expense + level + road + rail + air + railtran +
        airtran

                Df Sum of Sq    RSS    AIC
    <none>                    49866 153.73
    - rail       1     10380  60246 154.94
    - air        1     12886  62752 155.63
    - road       1     15241  65107 156.26
    - railtran   1     62438      … 165.53
    - airtran    1     86215      … 168.79
    - level      1         …      … 176.50
    - expense    1         …      … 187.19
    - income     1         …      … 189.75

    > summary(model1)

    Call:
    lm(formula = travel ~ income + expense + level + road + rail +
        air + railtran + airtran)

    Residuals:
        Min      1Q  Median      3Q     Max
    -66.673 -57.766   2.796  46.749  91.039

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -4.393e+03  2.102e+03  -2.090  0.…      .
    income       1.898e-02  2.320e-03   8.179  3.72e-05 ***
    expense      7.038e+00  9.369e-01   7.512  6.85e-05 ***
    level       -5.427e+00  1.057e+00  -5.133  0.…      *
    road        -1.460e+00  9.339e-01  -1.564  0.…
    rail         3.697e+02  2.865e+02   1.290  0.…
    air         -3.589e+00  2.496e+00  -1.438  0.…
    railtran     2.166e-02  6.843e-03   3.165  0.…      *
    airtran      2.032e-01  5.464e-02   3.719  0.…      *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 78.95 on 8 degrees of freedom
    Multiple R-squared: 0.9997, Adjusted R-squared: 0.9994
    F-statistic: 3529 on 8 and 8 DF, p-value: 2.252e-13

    > model2 <- drop1(model1)   # refit with one variable removed at a time
    > model2
    Single term deletions

    Model:
    travel ~ income + expense + level + road + rail + air + railtran +
        airtran
              Df Sum of Sq    RSS    AIC
    <none>                  49866 153.73
    income     1         …      … 189.75
    expense    1         …      … 187.19
    level      1         …      … 176.50
    road       1     15241  65107 156.26
    rail       1     10380  60246 154.94
    air        1     12886  62752 155.63
    railtran   1     62438      … 165.53
    airtran    1     86215      … 168.79

    > model3 <- update(model1, . ~ . - rail)   # remove rail
    > summary(model3)

    Call:
    lm(formula = travel ~ income + expense + level + road + air +
        railtran + airtran)

    Residuals:
        Min      1Q  Median      3Q     Max
    -77.120 -62.739  -7.682  57.073  96.157

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -1.773e+03  5.648e+02  -3.140  0.…      *
    income       1.935e-02  2.386e-03   8.112  1.98e-05 ***
    expense      7.977e+00  6.116e-01  13.043  3.77e-07 ***
    level       -5.126e+00  1.069e+00  -4.797  0.…      *
    road        -2.214e+00  7.550e-01  -2.933  0.…      *
    air         -5.129e+00  2.272e+00  -2.257  0.…      .
    railtran     1.495e-02  4.613e-03   3.241  0.…      *
    airtran      2.603e-01  3.323e-02   7.832  2.62e-05 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 81.82 on 9 degrees of freedom
    Multiple R-squared: 0.9997, Adjusted R-squared: 0.9994
    F-statistic: 3756 on 7 and 9 DF, p-value: 7.348e-15

    > model4 <- update(model3, . ~ . - air)
    > summary(model4)

    Call:
    lm(formula = travel ~ income + expense + level + road + railtran +
        airtran)

    Residuals:
        Min      1Q  Median      3Q     Max
    -165.78  -44.43   12.86   49.24  123.92

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -2.450e+03  5.683e+02  -4.310  0.00154  **
    income       1.834e-02  2.782e-03   6.593  6.13e-05 ***
    expense      7.465e+00  6.742e-01  11.072  6.21e-07 ***
    level       -5.389e+00  1.261e+00  -4.273  0.00163  **
    road        -2.381e+00  8.921e-01  -2.669  0.02355  *
    railtran     1.933e-02  4.970e-03   3.889  0.00301  **
    airtran      2.451e-01  3.864e-02   6.343  8.42e-05 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 97.14 on 10 degrees of freedom
    Multiple R-squared: 0.9995, Adjusted R-squared: 0.9991
    F-statistic: 3108 on 6 and 10 DF, p-value: 9.282e-16

    > resid <- resid(model4)
    > resid
         1      2      3      4      5      6
      32.…   -8.…   12.…  -83.…   50.…   47.…
         7      8      9     10     11     12
     -54.…  -28.…  123.…   80.… -165.…   33.…
        13     14     15     16     17
     -28.…  -44.… -112.…   96.…   49.…
    > shapiro.test(resid)   # Shapiro-Wilk W test of normality

            Shapiro-Wilk normality test

    data:  resid
    W = 0.9756, p-value = 0.9066

    > y <- predict(model4)
    > rstandard <- rstandard(model4)
    > plot(y, rstandard)
    > plot(model4, 1)
    > plot(model4, 2)
    > plot(model4, 3)
    > plot(model4, 4)
    > aa <- data.frame(travel, income, expense, level, road, railtran, airtran)
    > b <- aa[, 2:7]

    > bb <- cor(b)
    > kappa(bb, exact = TRUE)   # condition number (kappa)
    [1] 1366.411
    > eigen(bb)   # eigenvalues and eigenvectors of the correlation matrix
    $values
    [1] 5.… 0.… 0.… 0.… 0.…
    [6] 0.…

    $vectors
          [,1]  [,2]  [,3]  [,4]  [,5]
    [1,]  -0.…  -0.…   0.…   0.…  -0.…
    [2,]  -0.…   0.…   0.…  -0.…   0.…
    [3,]  -0.…   0.…  -0.…   0.…  -0.…
    [4,]  -0.…  -0.…  -0.…  -0.…   0.…
    [5,]  -0.…  -0.…   0.…  -0.…  -0.…
    [6,]  -0.…  -0.…   0.…   0.…   0.…
          [,6]
    [1,]   0.…
    [2,]   0.…
    [3,]  -0.…
    [4,]   0.…
    [5,]  -0.…
    [6,]  -0.…

    > bbb <- kappa(cor(b[, colnames(b) != "level"]))   # kappa with level removed
    > bbb
    [1] 529.9542
    > kappa(cor(b[, colnames(b) != "income"]))
    [1] 537.9962
    > kappa(cor(b[, colnames(b) != "airtran"]))
    [1] 624.6458
    > summary(update(model4, . ~ . - level))

    Call:
    lm(formula = travel ~ income + expense + road + railtran + airtran)

    Residuals:
        Min      1Q  Median      3Q     Max
    -322.63  -58.04  -11.62   95.45  214.69

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -3.824e+03  7.511e+02  -5.091  0.…      *
    income       1.217e-02  3.811e-03   3.194  0.…      *
    expense      5.483e+00  7.843e-01   6.991  2.3e-05  ***
    road        -4.247e+00  1.247e+00  -3.407  0.…      *
    railtran     2.708e-02  7.416e-03   3.651  0.…      *
    airtran      1.929e-01  5.876e-02   3.284  0.…      *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 155.7 on 11 degrees of freedom
    Multiple R-squared: 0.9985, Adjusted R-squared: 0.9978
    F-statistic: 1450 on 5 and 11 DF, p-value: 4.078e-15

    > summary(update(model4, . ~ . - income))

    Call:
    lm(formula = travel ~ expense + level + road + railtran + airtran)

    Residuals:
        Min      1Q  Median      3Q     Max
    -466.59  -84.23  -15.25  150.26  246.75

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -4.844e+03  9.639e+02  -5.025  0.…      *
    expense      7.056e+00  1.480e+00   4.767  0.…      *
    level       -1.075e+00  2.377e+00  -0.452  0.…
    road        -4.336e+00  1.855e+00  -2.337  0.…      *
    railtran     3.611e-02  9.409e-03   3.838  0.…      *
    airtran      3.690e-01  7.443e-02   4.958  0.…      *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 214.1 on 11 degrees of freedom
    Multiple R-squared: 0.9971, Adjusted R-squared: 0.9958
    F-statistic: 765.5 on 5 and 11 DF, p-value: 1.357e-13

    > summary(update(model4, . ~ . - airtran))

    Call:
    lm(formula = travel ~ income + expense + level + road + railtran)

    Residuals:
        Min      1Q  Median      3Q     Max
    -556.44  -72.17   47.38   95.54  239.22

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -3.239e+03  1.185e+03  -2.733  0.…      *
    income       2.692e-02  5.194e-03   5.183  0.…      *
    expense      6.359e+00  1.392e+00   4.569  0.…      *
    level       -2.861e+00  2.557e+00  -1.11
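The multicollinearity check above rests on the condition number of the predictors' correlation matrix. The sketch below uses simulated data, because the report's full data set is not reproduced here; the variables `x1`, `x2`, and `x3` are hypothetical. It illustrates that `kappa(R, exact = TRUE)` equals the ratio of the largest to the smallest eigenvalue of a correlation matrix R, and that a nearly redundant predictor inflates it.

```r
set.seed(1)
n  <- 17
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)  # nearly a copy of x1: strong collinearity
x3 <- rnorm(n)

R  <- cor(cbind(x1, x2, x3))
ev <- eigen(R, symmetric = TRUE)$values

# Exact 2-norm condition number versus the eigenvalue ratio
kappa_exact <- kappa(R, exact = TRUE)
kappa_eigen <- max(ev) / min(ev)
print(c(kappa_exact = kappa_exact, kappa_eigen = kappa_eigen))

# Dropping the redundant predictor shrinks the condition number.
kappa_reduced <- kappa(cor(cbind(x1, x3)), exact = TRUE)
print(kappa_reduced)
```

Note that `kappa()` without `exact = TRUE` returns a cheaper estimate rather than the exact ratio, so values computed the two ways are not directly comparable.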

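The rank-correlation check for heteroscedasticity described in the regression diagnostics does not appear in the listing. A minimal sketch of that procedure, under stated assumptions, is given below: the data are simulated and the names `x`, `y`, and `fit` are hypothetical, so only the method (Spearman's correlation between the absolute residuals and a predictor) follows the report.

```r
set.seed(2)
n <- 17
x <- runif(n, 1, 10)
y <- 3 + 2 * x + rnorm(n)  # constant error variance by construction

fit <- lm(y ~ x)
abs_res <- abs(resid(fit))

# Spearman rank correlation between |residuals| and the predictor.
# A small p-value would suggest that the error spread changes with x.
rank_test <- cor.test(abs_res, x, method = "spearman")
print(rank_test$estimate)
print(rank_test$p.value)
```

With several predictors, the same test would be repeated once per predictor, as the report does for its five retained variables.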