R Linear Regression Case Study: Data Analysis and Visualization Report

We use data from the 30 Major League Baseball teams and examine which variable (if any) helps us best predict a team's runs scored in a season.

The data

    load("more/mlb11.RData")

1. Plot the relationship between runs and at_bats, using at_bats as the predictor. Does the relationship look linear? If you knew a team's at_bats, would you be comfortable using a linear model to predict the number of runs?

Scatterplot:

    plot(mlb11$at_bats, mlb11$runs, xlab = "at_bats", ylab = "runs",
         main = "at_bats", frame.plot = TRUE, col = "red")
    abline(lm(mlb11$runs ~ mlb11$at_bats))

If the relationship looks linear, we can quantify its strength with the correlation coefficient.

    cor(mlb11$runs, mlb11$at_bats)
    ## [1] 0.610627

Sum of squared residuals

Think back to the way we described the distribution of a single variable. It is also useful to be able to describe the relationship between two numerical variables, such as runs and at_bats above. Such a description should cover the strength of the relationship and any unusual observations.

    plot_ss(x = mlb11$at_bats, y = mlb11$runs)
    ## Click two points to make a line.
    ## Call:
    ## lm(formula = y ~ x, data = pts)
    ##
    ## Coefficients:
    ## (Intercept)            x
    ##  -2789.2429       0.6305
    ##
    ## Sum of Squares:  123721.9

After running this command, you'll be prompted to click two points on the plot to define a line. Once you've done that, the line you specified will be shown in black and the residuals in blue. Note that there are 30 residuals, one for each of the 30 observations. Recall that the residuals are the difference between the observed values and the values predicted by the line:

$e_i = y_i - \hat{y}_i$

The most common way to do linear regression is to select the line that minimizes the sum of squared residuals. To visualize the squared residuals, you can rerun the plot command and add the argument showSquares = TRUE.

    plot_ss(x = mlb11$at_bats, y = mlb11$runs, showSquares = TRUE)
    ## Click two points to make a line.
    ## Call:
    ## lm(formula = y ~ x, data = pts)
    ##
    ## Coefficients:
    ## (Intercept)            x
    ##  -2789.2429       0.6305
    ##
    ## Sum of Squares:  123721.9

Note that the output from the plot_ss function provides you with the slope and intercept of your line as well as the sum of squares.

3. Using plot_ss, choose a line that does a good job of minimizing the sum of squares. Run the function several times. What was the smallest sum of squares that you got? How does it compare to your neighbors?

Answer: The smallest sum of squares is 123721.9. It measures how far the observed runs are dispersed around the fitted line.

The linear model

It is rather cumbersome to try to get the correct least squares line, i.e. the line that minimizes the sum of squared residuals, through trial and error. Instead we can use the lm function in R to fit the linear model (a.k.a. regression line).

    m1 <- lm(runs ~ at_bats, data = mlb11)

The first argument in the function lm is a formula that takes the form y ~ x. Here it can be read that we want to make a linear model of runs as a function of at_bats. The second argument specifies that R should look in the mlb11 data frame to find the runs and at_bats variables.

The output of lm is an object that contains all of the information we need about the linear model that was just fit. We can access this information using the summary function.

    summary(m1)
    ##
    ## Call:
    ## lm(formula = runs ~ at_bats, data = mlb11)
    ##
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max
    ## -125.58  -47.05  -16.59   54.40  176.87
    ##
    ## Coefficients:
    ##               Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) -2789.2429   853.6957  -3.267 0.002871 **
    ## at_bats         0.6305     0.1545   4.080 0.000339 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 66.47 on 28 degrees of freedom
    ## Multiple R-squared:  0.3729, Adjusted R-squared:  0.3505
    ## F-statistic: 16.65 on 1 and 28 DF,  p-value: 0.0003388

Let's consider this output piece by piece. First, the formula used to describe the model is shown at the top. After the formula you find the five-number summary of the residuals. The "Coefficients" table shown next is key; its first column displays the linear model's y-intercept and the coefficient of at_bats. With this table, we can write down the least squares regression line for the linear model:

$\hat{y} = -2789.2429 + 0.6305 \times at\_bats$

One last piece of information we will discuss from the summary output is the Multiple R-squared, or more simply, $R^2$. The $R^2$ value represents the proportion of variability in the response variable that is explained by the explanatory variable. For this model, 37.3% of the variability in runs is explained by at-bats.

4. Fit a new model that uses homeruns to predict runs. Using the estimates from the R output, write the equation of the regression line. What does the slope tell us in the context of the relationship between success of a team and its home runs?

Answer: homeruns has a positive relationship with runs. The fitted line is $\hat{y} = 415.239 + 1.835 \times homeruns$, so each additional home run is associated with about 1.835 more runs on average.

    home.runs <- lm(runs ~ homeruns, data = mlb11)
    home.runs
    ##
    ## Call:
    ## lm(formula = runs ~ homeruns, data = mlb11)
    ##
    ## Coefficients:
    ## (Intercept)     homeruns
    ##     415.239        1.835

Prediction and prediction errors

Let's create a scatterplot with the least squares line laid on top.

    plot(mlb11$runs ~ mlb11$at_bats)
    abline(m1)

The function abline plots a line based on its slope and intercept. Here, we used a shortcut by providing the model m1, which contains both parameter estimates. This line can be used to predict y at any value of x. When predictions are made for values of x that are beyond the range of the observed data, it is referred to as extrapolation and is not usually recommended. However, predictions made within the range of the data are more reliable. They're also used to compute the residuals.

5. If a team manager saw the least squares regression line and not the actual data, how many runs would he or she predict for a team with 5,578 at-bats? Is this an overestimate or an underestimate, and by how much? In other words, what is the residual for this prediction?

Answer: The line predicts about 728 runs.

    pred <- -2789.2429 + 0.63058 * 5578
    pred
    ## [1] 728.1323

The residual is the observed number of runs minus this prediction. The value 0.63058 * 5578 = 3517.375 is only the slope term of the prediction, not a residual; computing the residual requires the observed runs of a team in mlb11 with about 5,578 at-bats.

Model diagnostics

To assess whether the linear model is reliable, we need to check for (1) linearity, (2) nearly normal residuals, and (3) constant variability.

Linearity: You already checked if the relationship between runs and at-bats is linear using a scatterplot. We should also verify this condition with a plot of the residuals vs. at-bats. Recall that any code following a # is intended to be a comment that helps understand the code but is ignored by R.

    plot(m1$residuals ~ mlb11$at_bats)
    abline(h = 0, lty = 3)  # adds a horizontal dashed line at y = 0

6. Is there any apparent pattern in the residuals plot? What does this indicate about the linearity of the relationship between runs and at-bats?

Answer: The residuals show no apparent pattern and scatter around zero, which is consistent with a linear relationship between runs and at-bats.

Nearly normal residuals: To check this condition, we can look at a histogram

    hist(m1$residuals)

and at a normal probability plot of the residuals.

    qqnorm(m1$residuals)
    qqline(m1$residuals)  # adds diagonal line to the normal prob plot

7. Based on the histogram and the normal probability plot, does the nearly normal residuals condition appear to be met?

Answer: Yes. The residuals are nearly normal.

Constant variability:

8. Based on the plot in (1), does the constant variability condition appear to be met?

Answer: Yes, the points are spread roughly evenly around the least squares line.

On your own

1. Choose another traditional variable from mlb11 that you think might be a good predictor of runs. Produce a scatterplot of the two variables and fit a linear model. At a glance, does there seem to be a linear relationship?

Answer: Yes, the scatterplot of runs against hits shows a linear relationship.

2. How does this relationship compare to the relationship between runs and at_bats? Use the $R^2$ values from the two model summaries to compare. Does your variable seem to predict runs better than at_bats? How can you tell?

    plot(mlb11$hits, mlb11$runs, xlab = "hits", ylab = "runs",
         main = "hits vs runs", frame.plot = TRUE, col = "red")
    abline(lm(mlb11$runs ~ mlb11$hits))

    m2 <- lm(runs ~ hits, data = mlb11)
    summary(m2)
    ##
    ## Call:
    ## lm(formula = runs ~ hits, data = mlb11)
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -103.718  -27.179   -5.233   19.322  140.693
    ##
    ## Coefficients:
    ##              Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) -375.5600   151.1806  -2.484   0.0192 *
    ## hits           0.7589     0.1071   7.085 1.04e-07 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 50.23 on 28 degrees of freedom
    ## Multiple R-squared:  0.6419, Adjusted R-squared:  0.6292
    ## F-statistic:  50.2 on 1 and 28 DF,  p-value: 1.043e-07

The model with hits has $R^2 = 0.6419$, compared with 0.3729 for at_bats, so hits explains more of the variability in runs and is the better predictor of the two.

3. Now that you can summarize the linear relationship between two variables, investigate the relationships between runs and each of the other five traditional variables. Which variable best predicts runs? Support your conclusion using the graphical and numerical methods we've discussed (for the sake of conciseness, only include output for the best variable, not all five).

Answer: new_obs is the best predictor of runs: its model has the smallest standard error, and the points lie on or very close to the line.

    par(mfrow = c(2, 3))
    plot(mlb11$hits, mlb11$runs, xlab = "hits", ylab = "runs",
         frame.plot = TRUE, col = "blue")
    plot(mlb11$bat_avg, mlb11$runs, xlab = "bat_avg", ylab = "runs",
         frame.plot = TRUE, col = "blue")
    plot(mlb11$new_slug, mlb11$runs, xlab = "new_slug", ylab = "runs",
         frame.plot = TRUE, col = "blue")
    plot(mlb11$new_onbase, mlb11$runs, xlab = "new_onbase", ylab = "runs",
         frame.plot = TRUE, col = "blue")
    plot(mlb11$new_obs, mlb11$runs, xlab = "new_obs", ylab = "runs",
         frame.plot = TRUE, col = "blue")

    ...1$new_obs)
    for (i in 1:5) {
      lms <- lm(l[, i] ~ l[, i + 1], data = l)
      print(summary(lms))
    }

Note that the formula l[, i] ~ l[, i + 1] regresses each column of l on the next column, so only the first summary below (which matches runs ~ hits) has runs as the response. Standard errors from models with different responses are on different scales and are not directly comparable; to compare predictors of runs, fit runs against each variable and compare the $R^2$ values.

    ##
    ## Call:
    ## lm(formula = l[, i] ~ l[, i + 1], data = l)
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -103.718  -27.179   -5.233   19.322  140.693
    ##
    ## Coefficients:
    ##              Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) -375.5600   151.1806  -2.484   0.0192 *
    ## l[, i + 1]     0.7589     0.1071   7.085 1.04e-07 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 50.23 on 28 degrees of freedom
    ## Multiple R-squared:  0.6419, Adjusted R-squared:  0.6292
    ## F-statistic:  50.2 on 1 and 28 DF,  p-value: 1.043e-07
    ##
    ##
    ## Call:
    ## lm(formula = l[, i] ~ l[, i + 1], data = l)
    ##
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max
    ## -27.855  -8.840   1.141  10.086  21.899
    ##
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)   -312.1       51.0   -6.12 1.32e-06 ***
    ## l[, i + 1]    6750.9      199.8   33.79  < 2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 13.71 on 28 degrees of freedom
    ## Multiple R-squared:  0.9761, Adjusted R-squared:  0.9752
    ## F-statistic:  1142 on 1 and 28 DF,  p-value: < 2.2e-16
    ##
    ##
    ## Call:
    ## lm(formula = l[, i] ~ l[, i + 1], data = l)
    ##
    ## Residuals:
    ##        Min         1Q     Median         3Q        Max
    ## -0.0120811 -0.0038072 -0.0007623  0.0050569  0.0142072
    ##
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)  0.11038    0.01851   5.962 2.02e-06 ***
    ## l[, i + 1]   0.36244    0.04630   7.828 1.58e-08 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.007263 on 28 degrees of freedom
    ## Multiple R-squared:  0.6864, Adjusted R-squared:  0.6752
    ## F-statistic: 61.28 on 1 and 28 DF,  p-value: 1.582e-08
    ##
    ##
    ## Call:
    ## lm(formula = l[, i] ~ l[, i + 1], data = l)
    ##
    ## Residuals:
    ##       Min        1Q    Median       Max
    ## -0.035295 -0.008481  0.000156  0.010515
    ##
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) -0.20671    0.06434  -3.213   0.0033 **
    ## l[, i + 1]   1.88957    0.20239   9.420 3.54e-10 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.01452 on 28 degrees of freedom
    ## Multiple R-squared:  0.7601, Adjusted R-squared:  0.7516
    ## F-statistic: 88.74 on 1 and 28 DF,  p-value: 3.538e-10
    ##
    ##
    ## Call:
    ## lm(formula = l[, i] ~ l[, i + 1], data = l)
    ##
    ## Residuals:
    ##        Min         1Q     Median         3Q        Max
    ## -0.0074684 -0.0042599  0.0009995          …  0.0127444
    ##
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)  0.10243    0.01535   6.674 3.05e-07 ***
    ## l[, i + 1]   0.30321    0.02131  14.229 2.42e-14 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.004768 on 28 degrees of freedom
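The workflow in this report (fit a least squares line, read off its coefficients, compute the sum of squared residuals, $R^2$, and a single residual) can be sketched end to end on made-up data. This is a minimal illustration, not the report's own analysis: the data frame `teams` and all of its values are simulated assumptions, not the mlb11 data, and the names `slope`, `intercept`, `ss_res`, `r_squared`, and `residual_1` are introduced here only for the example.

```r
# Simulated stand-in for mlb11 (assumption: these are NOT real team statistics).
set.seed(1)
at_bats <- round(runif(30, 5400, 5700))
runs    <- -2800 + 0.63 * at_bats + rnorm(30, sd = 65)
teams   <- data.frame(at_bats = at_bats, runs = runs)

# Fit the least squares line with lm(), as the report does for m1.
fit <- lm(runs ~ at_bats, data = teams)

# Closed-form least squares estimates, for comparison with coef(fit).
slope     <- cov(teams$at_bats, teams$runs) / var(teams$at_bats)
intercept <- mean(teams$runs) - slope * mean(teams$at_bats)

# Sum of squared residuals: the quantity that the fitted line minimizes.
ss_res <- sum(residuals(fit)^2)

# In simple linear regression, R-squared equals the squared correlation.
r_squared <- summary(fit)$r.squared

# Residual for one observed team: observed runs minus predicted runs.
first_team <- teams[1, ]
predicted  <- predict(fit, newdata = first_team)
residual_1 <- first_team$runs - predicted
```

With the real data loaded, the same steps should apply after replacing `teams` with `mlb11`; a positive residual means the line underestimates that team's runs, and a negative one means it overestimates them.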