Quantitative Data Analysis: Statistics

Sherlock Holmes

"... while man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what any one man will do, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant. So says the statistician."

Overview
• General Statistics
• The Normal Distribution
• Z-Tests
• Confidence Intervals
• T-Tests

General Statistics

~ THE GOLDEN RULE ~
Statistics NEVER replace the judgment of the expert.

Approach to Statistical Research
• Formulate a hypothesis
• State predictions of the hypothesis
• Perform experiments or observations
• Interpret experiments or observations
• Evaluate results with respect to the hypothesis
• Refine the hypothesis and start again
(Basically the same as all other research)

Hypothesis Testing
• H0: Null Hypothesis, status quo
• HA: Alternative Hypothesis, research question
• So, either "The data does not support H0" or "We fail to reject H0"

Types of Data
• Continuous: height, age, time
• Discrete: # of days worked this week, # leaves on a tree
• Ordinal: {Good, O.K., Bad}
• Nominal: {Yes/No}, {Teacher/Chemist/Haberdasher}

Picturing The Data

Pie Charts
• Nominal/Ordinal
• Only suitable for data that adds up to 1
• Hard to compare values in the chart

Bar Charts
• Nominal/Ordinal
• Easier to compare values than in a pie chart
• Suitable for a wider range of data

Dot Plots
• Nominal/Ordinal
• Represents all the data
• Difficult to read

Box Plots
• Nominal/Ordinal
• 1 IQR, 3 IQR
• Outliers

Scatter Plots
• Excellent for examining association between two variables

Histograms
• Continuous data
• Divide data into ranges

Time-Series Plots
• Time-related data
• e.g. stock prices

Question 1
In a telephone survey of 68 households, when asked whether they have pets, the responses were:
• 16: No pets
• 28: Dogs
• 32: Cats
Draw the appropriate graphic to illustrate the results!

Question 1 - Solution
• Total number surveyed = 68
• Number with no pets = 16, so total with pets = 68 - 16 = 52
• But 28 dogs + 32 cats = 60, so some households have both cats and dogs
• How many? It must be 60 - 52 = 8 households

  No pets       16
  Dogs only     20
  Cats only     24
  Both           8
  ----------------
  Total         68

• Graphic: pie chart or bar chart

The Literary Digest Poll
• 1936 US Presidential Election
• Alf Landon (R) vs. Franklin D. Roosevelt (D)
• Literary Digest had been conducting successful presidential election polls since 1916
• They had correctly predicted the outcomes of the 1916, 1920, 1924, 1928, and 1932 elections by conducting polls
• These polls were a lucrative venture for the magazine: readers liked them; newspapers played them up; and each "ballot" included a subscription blank
• They sent out 10 million ballots to two groups of people:
  • prospective subscribers, "who were chiefly upper- and middle-income people"
  • a list designed to "correct for bias" from the first list, consisting of names selected from telephone books and motor vehicle registries
• Response rate: approximately 24%, or 2,376,523 responses
• Result: Landon in a landslide (Landon predicted 57% of the vote, Roosevelt predicted 40%)
• Election result: Roosevelt received approximately 60% of the vote

The Literary Digest Poll: Possible Causes of Error
• Selection bias: by taking names and addresses from telephone directories, the survey systematically excluded poor voters
  • Republicans were markedly overrepresented in 1936
  • Democrats did not have as many phones, were not as likely to drive cars, and did not read the Literary Digest
• The "sampling frame" is the actual population of individuals from which a sample is drawn; selection bias results when the sampling frame is not representative of the population of interest
• Non-response bias: because only about 24% of the 10 million people returned surveys, non-respondents may have had different preferences from respondents
  • Indeed, respondents favored Landon
  • Greater response rates reduce the odds of biased samples

Terminology
• Population: a set of entities concerning which statistical inferences are to be drawn
• Sample: a number of independent observations from the same probability distribution
• Parameter: a numerical value that identifies one member of a family of probability distributions; the members of the family are distinguished from each other by the values of a finite number of parameters
• Bias: a factor that causes a statistical sample of a population to have some members of the population less represented than others

Outliers (and their treatment)
• An "outlier" is an observation that does not fit the pattern in the rest of the data
• Check the data
• Check with the measurer
• If there is reason to believe it is NOT real, change it if possible; otherwise leave it out (but note it)
• If there is reason to believe it is real, leave it in and note it

The Mean
• The mean (arithmetic) is defined as the sum of all the elements, divided by the number of elements
• The statistical mean of a set of observations is the average of the measurements in a set of data

The Variance
• But there can be a lot of variation in individual elements, e.g. teacher salaries
  • Average = €22,000
  • Lowest = €12,000
  • Difference = 12,000 - 22,000 = -10,000
• The sum of (sample - average) over all samples is 0, because positive and negative differences cancel; thus we need to define the variance
• The variance of a set of data is the sum of the squared differences of all the data values from the mean, divided by the sample size minus one: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

Standard Deviation
• The standard deviation of a set of data is the positive square root of the variance: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

Question 2
Find the mean and variance of the following sample values: 36, 41, 43, 44, 46

Question 2 - Solution
Mean: (36 + 41 + 43 + 44 + 46) / 5 = 210 / 5 = 42
Variance:

  Difference        Square
  36 - 42 = -6      36
  41 - 42 = -1       1
  43 - 42 =  1       1
  44 - 42 =  2       4
  46 - 42 =  4      16
  ------------------------
  Sum               58

  58 / (5 - 1) = 58 / 4 = 14.5

The Normal Distribution

Density Curves: Properties

The Normal Distribution
• The graph has a single peak at the center; this peak occurs at the mean
• The graph is symmetrical about the mean
• The graph never touches the horizontal axis
• The area under the graph is equal to 1

Characterization
• A normal distribution is bell-shaped and symmetric
• The distribution is determined by the mean mu, μ, and the standard deviation sigma, σ
• The mean μ controls the center and σ controls the spread

The Normal Distribution
If a variable is normally distributed, then:
• within one standard deviation of the mean there will be approximately 68% of the data
• within two standard deviations of the mean there will be approximately 95% of the data
• within three standard deviations of the mean there will be approximately 99.7% of the data

Why?
• One reason the normal distribution is important is that many psychological and organisational variables are distributed approximately normally. Measures of reading ability, introversion, job satisfaction, and memory are among the many psychological variables approximately normally distributed. Although the distributions are only approximately normal, they are usually quite close.
• A second reason the normal distribution is so important is that it is easy for mathematical statisticians to work with. This means that many kinds of statistical tests can be derived for normal distributions. Almost all statistical tests discussed in this text assume normal distributions. Fortunately, these tests work very well even if the distribution is only approximately normal. Some tests work well even with very wide deviations from normality.

One Tail/Two Tail
• Imagine we undertook an experiment where we measured staff productivity before and after we introduced a computer system to help record solutions to common issues of work
• Average productivity before = 6.4
• Average productivity after = 9.2
• (Figure: Before = 6.4 and After = 9.2 marked on a scale from 0 to 10, with standard deviations σ marked along the axis)
• Is this a significant difference, or is it more likely a sampling variation?
• How many standard deviations from the mean is this, and is it statistically significant?
• One-tailed: H0: μ1 ≥ μ2, HA: μ1 < μ2
• Two-tailed: H0: μ1 = μ2, HA: μ1 ≠ μ2

Standard Normal Distribution
• A normal distribution is defined as N(μ, σ²), i.e. N(mean, (std dev)^2)
• The standard normal distribution is defined as N(0, 1²)
• The formula $z = \frac{x - \mu}{\sigma}$ converts a value from a normal distribution into a value from the standard normal distribution, so that it can be looked up in a standard normal table

Exercise
If the average IQ in a given population is 100, and the standard deviation is 15, what percentage of the population has an IQ of 145 or higher?

Answer
• P(X ≥ 145) = P(Z ≥ (145 - 100) / 15) = P(Z ≥ 3)
• From tables: 99.87% are less than 3, so 0.13% of the population

Trends in Statistical Tests used in Research Papers
• Historically: testing. Currently: estimation.

  Approach               Results in
  Hypothesis tests       Accept/Reject
  Quoting p-values       p-value
  Confidence intervals   Approx. mean

Confidence Intervals

A confidence interval is used to express the uncertainty in a quantity being estimated. There is uncertainty because inferences are based on a random sample of finite size from a population or process of interest. To judge the statistical procedure we can ask what would happen if we were to repeat the same study, over and over, getting different data (and thus different confidence intervals) each time.
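The "repeat the same study over and over" idea can be tried out in a short simulation. The sketch below is only an illustration under stated assumptions: the population values (mean 100, standard deviation 15), the sample size of 30, the 2000 repeats and the function name `coverage_of_95_ci` are all made up for the example, and the population standard deviation is treated as known so that the usual 1.96 multiplier applies.

```python
import math
import random
import statistics


def coverage_of_95_ci(true_mean=100.0, sigma=15.0, n=30, repeats=2000, seed=1):
    """Fraction of repeated studies whose 95% CI contains the true mean.

    All default values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Half-width of a 95% interval when sigma is assumed to be known.
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repeats):
        # One "study": draw n observations and build an interval around their mean.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        centre = statistics.fmean(sample)
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / repeats


if __name__ == "__main__":
    share = coverage_of_95_ci()
    print(f"share of intervals covering the true mean: {share:.3f}")
```

Each repeat produces a different interval, and the printed share should land close to 0.95, which is what the "95%" in a 95% confidence interval refers to.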

If we know the true population mean μ and sample n individuals, we know that, if the data is normally distributed, the mean of these n samples has a 95% chance of falling into the interval $\mu \pm 1.96\,\mathrm{SE}$ (SE being the standard error of the mean),

where the standard error (SE) used in the 95% CI may be calculated as follows: $\mathrm{SE} = \sigma / \sqrt{n}$, with σ the population standard deviation and n the sample size.

Example 1
Does FF-PD-G have more of the popular vote than FG-L?
In a random sample of 721 respondents:
• 382 FF-PD-G
• 339 FG-L
Can we conclude that FF-PD-G has more than 50% of the popular vote?

Example 1 - Solution
• Sample proportion = p = 382/721 = 0.53
• Sample size = n = 721
• Standard error = sqrt(p(1 - p)/n) = 0.02
• 95% confidence interval: 0.53 +/- 1.96 (0.02) = 0.53 +/- 0.04 = [0.49, 0.57]
Thus, we cannot conclude that FF-PD-G had more of the popular vote, since this interval spans 50%. So, we say: "the data are consistent with the hypothesis that there is no difference"
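The arithmetic in Example 1 can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `proportion_ci` is made up for illustration, and it uses the same normal-approximation formula sqrt(p(1 - p)/n) with the 1.96 multiplier, but without rounding the intermediate values.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a sample proportion.

    `proportion_ci` is a hypothetical helper; z = 1.96 corresponds to 95%.
    """
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p - z * se, p + z * se


if __name__ == "__main__":
    # Example 1: 382 of 721 respondents chose FF-PD-G.
    low, high = proportion_ci(382, 721)
    print(f"95% CI: [{low:.3f}, {high:.3f}]")
    print("interval spans 50%:", low <= 0.5 <= high)
```

Without intermediate rounding the interval comes out at roughly 0.493 to 0.566, slightly narrower than the slide's [0.49, 0.57], and it still spans 50%, so the conclusion is unchanged.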

Example 2
Did Obama have more of the popular vote than McCain?
In a random sample of 1000 respondents:
• 532 Obama
• 468 McCain
Can we conclude that Obama had more than 50% of the popular vote?

Example 2 - 95% CI
• Sample proportion = p = 532/1000 = 0.532
• Sample size = n = 1000
• Standard error = sqrt(p(1 - p)/n) = 0.016
• 95% confidence interval: 0.532 +/- 1.96 (0.016) = 0.532 +/- 0.03136 = [0.5006, 0.56336]
Thus, we can conclude that Obama had more of the popular vote, since this interval does not span 50%. So, we say: "the data are consistent with the hypothesis that there is a difference in a 95% CI"

Example 2 - 99% CI
• Sample proportion = p = 532/1000 = 0.532
• Sample size = n = 1000
• Standard error = sqrt(p(1 - p)/n) = 0.016
• 99% confidence interval: 0.532 +/- 2.58 (0.016) = 0.532 +/- 0.041 = [0.491, 0.573]
Thus, we cannot conclude that Obama had more of the popular vote, since this interval does span 50%. So, we say: "the data are consistent with the hypothesis that there is no difference in a 99% CI"

Example 2 - 99.99% CI
• Sample proportion = p = 532/1000 = 0.532
• Sample size = n = 1000
• Standard error = sqrt(p(1 - p)/n) = 0.016
• 99.99% confidence interval: 0.532 +/- 3.89 (0.016) = 0.532 +/- 0.06 = [0.472, 0.592]
Thus, we cannot conclude that Obama had more of the popular vote, since this interval does span 50%. So, we say: "the data are consistent with the hypothesis that there is no difference in a 99.99% CI"
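The three versions of Example 2 differ only in the multiplier applied to the standard error. As a hedged sketch (the helper names `critical_z` and `proportion_ci` are made up for illustration), the multiplier can be derived from the confidence level with the inverse CDF of the standard normal distribution from Python's standard library, instead of being read from a table:

```python
import math
from statistics import NormalDist


def critical_z(confidence):
    """Two-sided critical value of the standard normal for a confidence level."""
    # Split the leftover probability equally between the two tails.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def proportion_ci(successes, n, confidence):
    """Normal-approximation CI for a proportion at the given confidence level."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    z = critical_z(confidence)
    return p - z * se, p + z * se


if __name__ == "__main__":
    # Example 2: 532 of 1000 respondents chose Obama.
    for level in (0.95, 0.99, 0.9999):
        low, high = proportion_ci(532, 1000, level)
        verdict = "excludes" if low > 0.5 else "spans"
        print(f"{level:.2%} CI: z = {critical_z(level):.2f}, "
              f"[{low:.3f}, {high:.3f}] {verdict} 50%")
```

The computed multipliers are about 1.96, 2.58 and 3.89. Only the 95% interval excludes 50%, which shows the trade-off in the examples: asking for more confidence widens the interval and makes it harder to rule out "no difference".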

T-Tests

One Tail/Two Tail
• T-test
• Z-test

T-Tests
• A powerful parametric test for calculating the significance of a small sample mean
• Necessary for small samples because their distributions are not normal
• One first has to calculate the "degrees of freedom"
• The t-test is often called the Student's t-test. It was created by a chief brewer named William S. Gosset who worked for the Guinness Brewery. He discovered this statistic as part of his work in the brewery to compare the different brewing processes for changing raw materials into beer. Guinness did not allow its employees to publish results, but the management decided to allow Gosset to publish it under a pseudonym: Student. Hence we have th
