Peking University Summer Course, Regression Analysis (Linear-Regression-Analysis), Lecture Notes PKU5

Class 5: ANOVA (Analysis of Variance) and F-tests

I. What is ANOVA

What is ANOVA? ANOVA is the short name for the Analysis of Variance. The essence of ANOVA is to decompose the total variance of the dependent variable into two additive components, one for the structural part, and the other for the stochastic part, of a regression. Today we are going to examine the easiest case.

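II. ANOVA: An Introduction

Let the model be

$y = X\beta + \varepsilon$.

Assuming $x_i$ is a column vector (of length p) of independent variable values for the i'th observation,

$y_i = x_i'\beta + \varepsilon_i$.

Then $x_i'b$ is the predicted value.

Sum of squares total:

$$
\begin{aligned}
SST &= \sum (y_i - \bar{Y})^2 \\
    &= \sum \left[(y_i - x_i'b) + (x_i'b - \bar{Y})\right]^2 \\
    &= \sum (y_i - x_i'b)^2 + \sum (x_i'b - \bar{Y})^2 + 2\sum (y_i - x_i'b)(x_i'b - \bar{Y}) \\
    &= \sum e_i^2 + \sum (x_i'b - \bar{Y})^2
\end{aligned}
$$

because $\sum (y_i - x_i'b)(x_i'b - \bar{Y}) = \sum e_i (x_i'b - \bar{Y}) = 0$.

This is always true by OLS.

$SST = SSE + SSR$

Important: the total variance of the dependent variable is decomposed into two additive parts: SSE, which is due to errors, and SSR, which is due to regression.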
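The decomposition can be checked numerically. The following is a minimal sketch, assuming a simple regression with an intercept fitted by OLS to simulated data; the sample size, coefficients, and seed are hypothetical and serve only as an illustration.

```python
import random

# Simulated data (hypothetical values, for illustration only).
random.seed(0)
n = 50
x = [random.uniform(0, 10) for _ in range(n)]
y = [1.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

# OLS estimates for a simple regression with an intercept.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# The three sums of squares and the cross-product term.
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum(e ** 2 for e in resid)
ssr = sum((fi - y_bar) ** 2 for fi in fitted)
cross = sum(e * (fi - y_bar) for e, fi in zip(resid, fitted))

print(f"SST = {sst:.6f}")
print(f"SSE + SSR = {sse + ssr:.6f}")
print(f"cross term = {cross:.2e}")
```

Up to floating-point error, the cross term is zero and SST equals SSE + SSR, whatever data are simulated, as long as the model contains an intercept.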

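Geometric interpretation: [blackboard]

Decomposition of Variance

If we treat X as a random variable, we can decompose total variance into the between-group portion and the within-group portion in any population:

$V(y_i) = V(x_i'\beta) + V(\varepsilon_i)$    (1)

Proof:

$$
\begin{aligned}
V(y_i) &= V(x_i'\beta + \varepsilon_i) \\
       &= V(x_i'\beta) + V(\varepsilon_i) + 2\,\mathrm{Cov}(x_i'\beta, \varepsilon_i) \\
       &= V(x_i'\beta) + V(\varepsilon_i)
\end{aligned}
$$

(by the assumption that $\mathrm{Cov}(x_k, \varepsilon) = 0$, for all possible k.)

The ANOVA table is to estimate the three quantities of equation (1) from the sample. As the sample size gets larger and larger, the ANOVA table will approach the equation closer and closer.

In a sample, decomposition of estimated variance is not strictly true. We thus need to separately decompose sums of squares and degrees of freedom. Is ANOVA a misnomer?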

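III. ANOVA in Matrix

I will try to give a simplified representation of ANOVA as follows:

$$
\begin{aligned}
SST &= \sum (y_i - \bar{Y})^2 = \sum y_i^2 - 2\bar{Y}\sum y_i + n\bar{Y}^2 \\
    &= \sum y_i^2 - n\bar{Y}^2 \quad \text{(because } \textstyle\sum y_i = n\bar{Y}\text{)} \\
    &= y'y - n\bar{Y}^2 \\
    &= y'y - (1/n)\, y'Jy \quad \text{(in your textbook, monster look)}
\end{aligned}
$$

where J is the n × n matrix of ones.

SSE = e'e

$$
\begin{aligned}
SSR &= \sum (x_i'b - \bar{Y})^2 = \sum (x_i'b)^2 - 2\bar{Y}\sum x_i'b + n\bar{Y}^2 \\
    &= \sum (x_i'b)^2 - n\bar{Y}^2 \quad \text{(because } \textstyle\sum x_i'b = n\bar{Y}\text{, as always)} \\
    &= b'X'Xb - n\bar{Y}^2 \\
    &= b'X'y - (1/n)\, y'Jy \quad \text{(in your textbook, monster look)}
\end{aligned}
$$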
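The matrix expressions can be compared with the summation forms directly. The sketch below assumes the textbook forms SST = y'y - (1/n)y'Jy and SSR = b'X'y - (1/n)y'Jy, with J the n × n matrix of ones; the simulated data and coefficient values are hypothetical.

```python
import numpy as np

# Simulated design matrix with an intercept column (hypothetical data).
rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimate b = (X'X)^{-1} X'y
e = y - X @ b                           # residual vector
J = np.ones((n, n))                     # n x n matrix of ones
correction = (y @ J @ y) / n            # (1/n) y'Jy, which equals n * Ybar^2

# Matrix forms.
sst_matrix = y @ y - correction
sse_matrix = e @ e
ssr_matrix = b @ X.T @ y - correction

# Summation forms, for comparison.
sst_sum = np.sum((y - y.mean()) ** 2)
ssr_sum = np.sum((X @ b - y.mean()) ** 2)

print(sst_matrix, sst_sum)
print(ssr_matrix, ssr_sum)
print(sse_matrix, sst_sum - ssr_sum)
```

Each pair of printed numbers should agree up to rounding error.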

IV. ANOVA Table

SOURCE       SS     DF      MS     F          with
Regression   SSR    DF(R)   MSR    MSR/MSE    DF(R), DF(E)
Error        SSE    DF(E)   MSE
Total        SST    DF(T)

Let us use a real example. Assume that we have a regression estimated to be

y = -1.70 + 0.840x

ANOVA Table

SOURCE       SS     DF   MS     F                   with
Regression   6.44   1    6.44   6.44/0.19 = 33.89   1, 18
Error        3.40   18   0.19
Total        9.84   19

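We know $\sum x_i$, $\sum y_i$, $\sum x_i^2$, $\sum y_i^2$, and $\sum x_i y_i$; the calculation here uses $\sum x_i = 100$, $\sum x_i^2 = 509.12$, and $n\bar{Y}^2 = 125.0$. If we know that DF for SST = 19, what is n?

n = 20

$SSR = \sum (-1.7 + 0.84 x_i)^2 - n\bar{Y}^2 = 20 \times 1.7 \times 1.7 + 0.84 \times 0.84 \times 509.12 - 2 \times 1.7 \times 0.84 \times 100 - 125.0$

$= 6.44$

SSE = SST - SSR = 9.84 - 6.44 = 3.40

DF (degrees of freedom): demonstration. Note: the intercept is discounted when calculating the DF for SST, so DF(T) = n - 1.

MS = SS/DF

p = 0.000 [ask students]. What does the p-value say?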
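The arithmetic of this example can be reproduced in a few lines. This is a minimal sketch that uses only the summary quantities appearing in the calculation (n = 20, the two coefficients, the sum of x equal to 100, the sum of squared x equal to 509.12, n times the squared mean of y equal to 125.0, and SST = 9.84); the variable names are my own.

```python
# Summary quantities taken from the worked example.
n = 20
b0, b1 = -1.70, 0.840
sum_x = 100.0
sum_x2 = 509.12
n_ybar_sq = 125.0     # n * Ybar^2
sst = 9.84

# SSR = sum((b0 + b1 * x_i)^2) - n * Ybar^2, expanded term by term.
ssr = n * b0 ** 2 + b1 ** 2 * sum_x2 + 2 * b0 * b1 * sum_x - n_ybar_sq
sse = sst - ssr

# Degrees of freedom: one slope, n - 2 for error, n - 1 in total.
df_r, df_e, df_t = 1, n - 2, n - 1
msr = ssr / df_r
mse = sse / df_e
f_stat = msr / mse

print(f"Regression  SS={ssr:.2f}  DF={df_r}   MS={msr:.2f}  F={f_stat:.2f}")
print(f"Error       SS={sse:.2f}  DF={df_e}  MS={mse:.2f}")
print(f"Total       SS={sst:.2f}  DF={df_t}")
```

Because the table rounds MSE to 0.19 before dividing, its F of 33.89 differs slightly from the unrounded ratio computed here; both are about 34.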

V. F-Tests

F-tests are more general than t-tests; t-tests can be seen as a special case of F-tests. If you have difficulty with F-tests, please ask your GSIs to review F-tests in the lab. An F-test takes the form of a ratio of two MS's.

An F statistic has two degrees of freedom associated with it: the degrees of freedom in the numerator, and the degrees of freedom in the denominator.

An F statistic is usually larger than 1. The F statistic asks whether the variance explained under the alternative hypothesis is due to chance. In other words, the null hypothesis is that the explained variance is due to chance, or that all the coefficients (other than the intercept) are zero.

The larger an F statistic, the stronger the evidence against the null hypothesis. There is a table in the back of your book from which you can find exact probability values.

In our example, the F is 34, which is highly significant.

VI. $R^2$

$R^2 = SSR/SST$

The proportion of variance explained by the model.

In our example,

R-sq = 65.4%

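VII. What happens if we add more independent variables?

1. SST stays the same.

2. SSR always increases (more precisely, it never decreases).

3. SSE always decreases (more precisely, it never increases).

4. $R^2$ always increases (more precisely, it never decreases).

5. MSR usually increases.

6. MSE usually decreases.

7. The F statistic usually increases.

Exceptions to 5 and 7: irrelevant variables may not explain the variance but take up degrees of freedom. We really need to look at the results.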
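The exception is easy to see in a simulation. The sketch below adds a pure-noise regressor to a model that already fits well; the helper `anova`, the data, and the coefficient values are hypothetical and only illustrate the pattern.

```python
import numpy as np

def anova(X, y):
    """ANOVA quantities for an OLS fit; X must include a column of ones."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    sst = np.sum((y - y.mean()) ** 2)
    sse = np.sum((y - X @ b) ** 2)
    ssr = sst - sse
    msr = ssr / (p - 1)          # regression DF = p - 1
    mse = sse / (n - p)          # error DF = n - p
    return {"SST": sst, "SSR": ssr, "SSE": sse, "R2": ssr / sst,
            "MSR": msr, "MSE": mse, "F": msr / mse}

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
irrelevant = rng.normal(size=n)              # unrelated to y by construction
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

ones = np.ones(n)
small = anova(np.column_stack([ones, x1]), y)
large = anova(np.column_stack([ones, x1, irrelevant]), y)

for key in small:
    print(f"{key:>3}: {small[key]:10.3f} -> {large[key]:10.3f}")
```

SST is unchanged and SSR and $R^2$ do not fall, but the extra regression degree of freedom roughly halves MSR here, so the F statistic drops.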

VIII. Important: General Ways of Hypothesis Testing with F-Statistics

All tests in linear regression can be performed with F-test statistics. The trick is to run "nested models."

Two models are nested if the independent variables in one model are a subset, or linear combinations of a subset, of the independent variables in the other model.

That is to say, if model A has independent variables $(1, x_1, x_2)$, and model B has independent variables $(1, x_1, x_2, x_3)$, A and B are nested. A is called the restricted model; B is called the less restricted or unrestricted model. We call A restricted because A implies that $\beta_3 = 0$. This is a restriction.

Another example: C has independent variables $(1, x_1, x_2 + x_3)$, D has $(1, x_2 + x_3)$.

C and A are not nested.

C and B are nested. One restriction in C: $\beta_2 = \beta_3$.

C and D are nested. One restriction in D: $\beta_1 = 0$.

D and A are not nested.

D and B are nested: two restrictions in D: $\beta_2 = \beta_3$; $\beta_1 = 0$.
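Nesting can also be checked mechanically: the restricted model is nested in the unrestricted one when every column of its design matrix lies in the column space of the other. The sketch below assumes the four models are A = (1, x1, x2), B = (1, x1, x2, x3), C = (1, x1, x2 + x3) and D = (1, x2 + x3); the helper `is_nested` and the simulated data are hypothetical.

```python
import numpy as np

def is_nested(X_restricted, X_unrestricted):
    """True if the columns of X_restricted lie in the column space of X_unrestricted."""
    combined = np.column_stack([X_unrestricted, X_restricted])
    return np.linalg.matrix_rank(combined) == np.linalg.matrix_rank(X_unrestricted)

rng = np.random.default_rng(2)
n = 40
one = np.ones(n)
x1, x2, x3 = rng.normal(size=(3, n))    # three simulated regressors

A = np.column_stack([one, x1, x2])
B = np.column_stack([one, x1, x2, x3])
C = np.column_stack([one, x1, x2 + x3])
D = np.column_stack([one, x2 + x3])

print("A in B:", is_nested(A, B))
print("C in B:", is_nested(C, B))
print("D in C:", is_nested(D, C))
print("D in B:", is_nested(D, B))
print("C and A:", is_nested(C, A) or is_nested(A, C))
print("D and A:", is_nested(D, A) or is_nested(A, D))
```

The number of restrictions is the difference in the number of columns between the two design matrices, for example 4 - 2 = 2 for D against B.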

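We can always test hypotheses implied in the restricted models. Steps: run two regressions for each hypothesis, one for the restricted model and one for the unrestricted model. The SST should be the same across the two models. What is different is SSE and SSR. That is, what is different is $R^2$. Let the subscript r refer to the restricted model and the subscript u to the unrestricted model, with $p_r$ and $p_u$ parameters respectively, so that

$df(SSE_r) = n - p_r$; $df(SSE_u) = n - p_u$.

Use the following formulas:

$$F_{\Delta df,\; df(SSE_u)} = \frac{(SSE_r - SSE_u)\,/\,[df(SSE_r) - df(SSE_u)]}{SSE_u \,/\, df(SSE_u)}$$

or

$$F_{\Delta df,\; df(SSE_u)} = \frac{(SSR_u - SSR_r)\,/\,[df(SSR_u) - df(SSR_r)]}{SSE_u \,/\, df(SSE_u)}$$

(proof: use SST = SSE + SSR)

Note, $df(SSE_r) - df(SSE_u) = df(SSR_u) - df(SSR_r) = \Delta df$,

where $\Delta df$ is the number of constraints (not the number of parameters) implied by the restricted model

or

$$F_{\Delta df,\; df(SSE_u)} = \frac{(R_u^2 - R_r^2)\,/\,\Delta df}{(1 - R_u^2)\,/\,df(SSE_u)}$$

Note that

$F_{1,\, df} = t_{df}^2$

That is, for 1 df tests, you can either do an F-test or a t-test. They yield the same result. Another way to look at it is that the t-test is a special case of the F-test, with the numerator DF being 1.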
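The SSE form and the $R^2$ form of the nested-model F statistic give the same number. As a minimal sketch, the code below applies both to the earlier ANOVA example (SST = 9.84 with 19 DF, SSE = 3.40 with 18 DF), treating the constant-only model as the restricted model so that its SSE equals SST; the function names are hypothetical.

```python
def nested_f_from_sse(sse_r, sse_u, df_r, df_u):
    """F statistic from the error sums of squares of two nested models."""
    return ((sse_r - sse_u) / (df_r - df_u)) / (sse_u / df_u)

def nested_f_from_r2(r2_r, r2_u, df_r, df_u):
    """The same F statistic written in terms of R-squared."""
    return ((r2_u - r2_r) / (df_r - df_u)) / ((1.0 - r2_u) / df_u)

# Restricted model: constant only, so SSE_r = SST with n - 1 = 19 DF.
sst = 9.84
sse_r, df_r = sst, 19
# Unrestricted model: the fitted simple regression.
sse_u, df_u = 3.40, 18

f_sse = nested_f_from_sse(sse_r, sse_u, df_r, df_u)
f_r2 = nested_f_from_r2(1 - sse_r / sst, 1 - sse_u / sst, df_r, df_u)

print(f"F from SSE: {f_sse:.3f}")
print(f"F from R2:  {f_r2:.3f}")
```

Both versions return roughly 34, the model F statistic of that example, with 1 and 18 degrees of freedom.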

IX. Assumptions of F-tests

What assumptions do we need to make an ANOVA table work?

Not much of an assumption. All we need is the assumption that (X'X) is not singular, so that the least squares estimate b exists.

The assumption of $\mathrm{Cov}(X, \varepsilon) = 0$ is needed if you want the ANOVA table to be an unbiased estimate of the true ANOVA (equation 1) in the population. Reason: we want b to be an unbiased estimator of $\beta$, and the covariance between b and $\varepsilon$ to disappear.

For reasons I discussed earlier, the assumptions of homoscedasticity and non-serial correlation are necessary for the estimation of $V(\varepsilon_i)$.

The normality assumption, that $\varepsilon_i$ is normally distributed, is needed for small samples.

X. The Concept of Increment

Every time you put one more independent variable into your model, you get an increase in $R^2$. We sometimes call the increase "incremental $R^2$." What it means is that more variance is explained, or SSR is increased and SSE is reduced. What you should understand is that the incremental $R^2$ attributed to a variable is always smaller than the $R^2$ when other variables are absent.

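XI. Consequences of Omitting Relevant Independent Variables

Say the true model is the following:

$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \varepsilon_i$.

But for some reason we only collect or consider data on $y$, $x_1$, and $x_2$. Therefore, we omit $x_3$ in the regression. That is, we omit $\beta_3 x_3$ in our model. We briefly discussed this problem before. The short story is that we are likely to have a bias due to the omission of a relevant variable in the model. This is so even though our primary interest is to estimate the effect of $x_1$ or $x_2$ on y.

Why? We will have a formal presentation of this problem.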
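Before the formal treatment, a simulation gives the intuition. The sketch below assumes a true model with three regressors in which the third is correlated with the first and is then left out; the coefficient values (2, 1, 3), the correlation (0.8), and the sample size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.8 * x1 + rng.normal(size=n)       # the omitted variable, correlated with x1
y = 1.0 + 2.0 * x1 + 1.0 * x2 + 3.0 * x3 + rng.normal(size=n)

def ols(X, y):
    """OLS coefficient vector for design matrix X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

ones = np.ones(n)
b_full = ols(np.column_stack([ones, x1, x2, x3]), y)   # correctly specified
b_short = ols(np.column_stack([ones, x1, x2]), y)      # x3 omitted

print("coefficient on x1, full model: ", round(b_full[1], 3))
print("coefficient on x1, short model:", round(b_short[1], 3))
```

In the full model the coefficient on x1 is close to its true value of 2. In the short model it absorbs part of the effect of x3 and is close to 2 + 3 × 0.8 = 4.4, while the coefficient on x2, which is unrelated to x3 here, stays near 1.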

XII. Measures of Goodness-of-Fit

There are different ways to assess the goodness-of-fit of a model.

A. $R^2$

$R^2$ is a heuristic measure for the overall goodness-of-fit. It does not have an associated test statistic.

$R^2$ measures the proportion of the variance in the dependent variable that is "explained" by the model:

$R^2 = SSR/SST = 1 - SSE/SST$

B. Model F-test

The model F-test tests the joint hypothesis that all the model coefficients except for the constant term are zero.

Degrees of freedom associated with the model F-test:

Numerator: p-1

Denominator: n-p.

C. t-tests for individual parameters

A t-test for an individual parameter tests the hypothesis that a particular coefficient is equal to a particular number (commonly zero).

$t_k = (b_k - \beta_{k0})/SE_k$, where $SE_k$ is the square root of the (k, k) element of $MSE\,(X'X)^{-1}$, with degrees of freedom = n-p.
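The t statistic can be computed directly from the matrix quantities, and its square matches the F statistic obtained by dropping the tested variable and comparing the two fits. This is a minimal sketch on simulated data; the coefficient values and the choice of the tested column are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.0, 0.3]) + rng.normal(size=n)
p = X.shape[1]

# Unrestricted fit.
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
sse_u = e @ e
mse = sse_u / (n - p)

# Standard errors: square roots of the diagonal of MSE * (X'X)^{-1}.
se = np.sqrt(np.diag(mse * XtX_inv))
k = 2                                   # test H0: beta_k = 0 for the last column
t_k = (b[k] - 0.0) / se[k]

# Restricted fit: drop column k, then form the nested-model F statistic.
X_r = np.delete(X, k, axis=1)
b_r, *_ = np.linalg.lstsq(X_r, y, rcond=None)
sse_r = np.sum((y - X_r @ b_r) ** 2)
f_stat = ((sse_r - sse_u) / 1) / (sse_u / (n - p))

print(f"t = {t_k:.4f}, t^2 = {t_k ** 2:.4f}, F = {f_stat:.4f}")
```

The equality of the squared t and the F holds for any data set, not just this simulated one, because the two models differ by a single parameter.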

D. Incremental $R^2$

Relative to a restricted model, the gain in $R^2$ for the unrestricted model:

$\Delta R^2 = R_u^2 - R_r^2$

E. F-tests for Nested Models

It is the most general form of F-tests and t-tests.

$$F_{\Delta df,\; df(SSE_u)} = \frac{(SSE_r - SSE_u)\,/\,[df(SSE_r) - df(SSE_u)]}{SSE_u \,/\, df(SSE_u)}$$

It is equal to a t-test if the unrestricted and restricted models differ only by one single parameter.

It is equal to the model F-test if we set the restricted model to the constant-only model.

[Ask students] What are SST, SSE, and SSR, and their associated degrees of freedom, for the constant-only model?

Numerical Example

A sociological study is interested in understanding the social determinants of mathematical achievement among high school students. You are now asked to answer a series of questions. The data are real but have been tailored for educational purposes. The total number of observations is 400. The variables are defined as:

y: math score

x1: father's education

x2: mother's education

x3: family's socioeconomic status

x4: number of siblings

x5: class rank

x6: parents' total education (note: x6 = x1 + x2)

For the following regression models, we know:

Table 1

Model                          SST     SSR     SSE      DF    R2
(1) y on (1 x1 x2 x3 x4)       34863   4201
(2) y on (1 x6 x3 x4)          34863                    396   .1065
(3) y on (1 x6 x3 x4 x5)       34863   10426   24437    395   .2991
(4) x5 on (1 x6 x3 x4)                         269753   396   .0210

1. Please fill in the missing cells in Table 1.

2. Test the hypothesis that the effects of father's education (x1) and mother's education (x2) on math score are the same after controlling for x3 and x4.

3. Test the hypothesis that x6, x3 and x4 in Model (2) all have a zero effect on y.
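One possible way to organize the calculations for these questions is sketched below. It relies only on the identities SST = SSE + SSR and $R^2$ = SSR/SST and on the nested-model F formula; treating the DF column as the error degrees of freedom n - p is my reading of the table, and the variable names are hypothetical.

```python
n = 400
sst_y = 34863.0

# Model (1): y on (1, x1, x2, x3, x4); SSR is given.
ssr1 = 4201.0
sse1 = sst_y - ssr1
df1 = n - 5                     # five parameters including the constant
r2_1 = ssr1 / sst_y

# Model (2): y on (1, x6, x3, x4); R2 and DF are given.
r2_2, df2 = 0.1065, 396
ssr2 = r2_2 * sst_y
sse2 = sst_y - ssr2

# Model (4): x5 on (1, x6, x3, x4); SSE and R2 are given.
sse4, r2_4 = 269753.0, 0.0210
sst4 = sse4 / (1.0 - r2_4)      # because SSE/SST = 1 - R2
ssr4 = sst4 - sse4

# Question 2: H0 is beta1 = beta2, so model (2) is the restricted model
# and model (1) the unrestricted one (one restriction).
f_q2 = ((sse2 - sse1) / (df2 - df1)) / (sse1 / df1)

# Question 3: H0 is that all three slopes in model (2) are zero,
# i.e. the model F-test against the constant-only model.
f_q3 = (ssr2 / 3) / (sse2 / df2)

print(f"Model (1): SSE={sse1:.0f}, DF={df1}, R2={r2_1:.4f}")
print(f"Model (2): SSR={ssr2:.1f}, SSE={sse2:.1f}")
print(f"Model (4): SST={sst4:.1f}, SSR={ssr4:.1f}")
print(f"Question 2: F(1, {df1}) = {f_q2:.2f}")
print(f"Question 3: F(3, {df2}) = {f_q3:.2f}")
```

The resulting F statistics would then be compared with critical values from the F table at the chosen significance level.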
