Basic Statistics (3)

Uploader: 5*** · IP location: Hubei · Uploaded: 2021-11-15 · Format: PPT · Pages: 64

Document summary

Lecture 3: Common Statistical Analyses

Outline
Descriptive statistics; frequency table analysis; analysis of variance; t test; chi-square test; linear regression; correlation analysis

3.1 Descriptive statistics

Descriptive statistics summarize concisely the information contained in a data set, for example by computing numerical characteristics of the data or by producing frequency tables and frequency charts. The resulting statistics and charts describe the features and patterns of the data set and make the problem under study simpler and more intuitive.

Descriptive statistics mainly comprise measures of central tendency (such as the mean, median, mode, and quantiles), measures of dispersion (such as the variance, standard deviation, range, and coefficient of variation), and measures of the shape of the distribution (such as skewness and kurtosis).

3.1.1 Using the summary() function

    oats   # the rest of this code is missing from the source

Kurtosis > 0: the peak is steeper and more pointed than that of the normal distribution.
Skewness > 0: large positive deviations dominate; the distribution is positively (right) skewed, with its long tail on the right.

    > describe(oats$yield)
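The sign conventions for skewness and kurtosis can be checked numerically. The following is a minimal sketch using plain moment-based definitions; the helper names `skewness` and `excess_kurtosis` and the sample `x` are illustrative assumptions, not part of the lecture. Packaged functions such as describe() may apply small-sample corrections, so their values can differ slightly from these.

```r
# Moment-based skewness and excess kurtosis (illustrative helper names).
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3   # 0 for a normal distribution
}

# Hypothetical sample with one large value on the right (not the oats data).
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 12)

skewness(x)          # positive: long tail on the right
excess_kurtosis(x)   # positive: sharper peak and heavier tails than the normal
skewness(1:5)        # symmetric sample: zero
```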

Output of describe(oats$yield):

      var  n mean sd median trimmed mad min max range skew kurtosis  se
    1   1 72  104 27    102     103  27  53 174   121 0.26    -0.46 3.2

3.2 Frequency table analysis

Frequency table analysis divides the range of the data into a number of intervals (classes), determines the midpoint of each class, lets the midpoint represent the data in that class, counts the frequency of each class, and compiles the results into a frequency table.
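As a rough illustration of the steps just described (intervals, class midpoints, frequencies), here is a minimal base-R sketch. The vector `x` and the choice of breaks are made-up assumptions for the example, not the oats data.

```r
# Hypothetical measurements (not the oats data).
x <- c(52, 61, 67, 73, 78, 84, 88, 91, 95, 103, 108, 117)

breaks <- seq(40, 120, by = 20)                  # interval limits: (40,60], (60,80], ...
mids   <- head(breaks, -1) + diff(breaks) / 2    # class midpoints: 50, 70, 90, 110

freq <- table(cut(x, breaks = breaks))           # frequency of each interval
pct  <- round(prop.table(freq) * 100, 2)         # relative frequency in percent

data.frame(interval = names(freq), midpoint = mids,
           freq = as.vector(freq), percent = as.vector(pct))
```

cut() uses right-closed intervals by default, which is why a value of exactly 60 would fall in (40,60] rather than (60,80].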

Frequency table example

    > summary(oats$yield)
    # Compute the frequencies
    A   # the rest of this assignment is missing from the source
    > round(prop.table(A) * 100, 2)   # frequencies as percentages
      (40,60]   (60,80]  (80,100] (100,120] (120,140] (140,160] (160,180]
          2.8      16.7      29.2      23.6      18.1       6.9       2.8

[Figure: "Frequency chart of yield"; x-axis: yield (40 to 180), y-axis: Frequency (0 to 20)]

3.3 Analysis of variance (ANOVA)

Analysis of variance is a method for partitioning the sources of variation among several comparable groups of experimental data. Its main use is to determine whether changes in external factors or experimental conditions have a significant effect on the experimental results.

Types: one-way ANOVA, two-way ANOVA, and multivariate analysis of variance (MANOVA).

The underlying model of ANOVA is a linear model, and the random variables are assumed to be independent, normally distributed, and of equal variance.

ANOVA relies on the additivity of sums of squares and uses the F test to judge whether an experimental factor has a significant effect on the results.

3.3.1 One-way ANOVA

    # Build the data set df
    yield <- scan()
    24 30 28 26
    27 24 21 26
    31 28 25 30
    32 33 33 28
    21 22 16 21

    Treat <- rep(paste("A", 1:5, sep = ""), rep(4, 5))
    df <- data.frame(Treat, yield)
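The additivity of sums of squares and the F test can be traced by hand on these 20 yields. This is a sketch of the arithmetic that aov() automates, not the lecture's own code; the variable names (`ss_between`, `ss_within`, and so on) are illustrative.

```r
yield <- c(24, 30, 28, 26,   # A1
           27, 24, 21, 26,   # A2
           31, 28, 25, 30,   # A3
           32, 33, 33, 28,   # A4
           21, 22, 16, 21)   # A5
Treat <- factor(rep(paste0("A", 1:5), each = 4))

grand <- mean(yield)                      # grand mean
means <- tapply(yield, Treat, mean)       # treatment means
n_i   <- tapply(yield, Treat, length)     # replicates per treatment

ss_between <- sum(n_i * (means - grand)^2)          # treatment SS: 301.2
ss_within  <- sum((yield - ave(yield, Treat))^2)    # error SS: 101
ss_total   <- sum((yield - grand)^2)                # equals the sum of the two

df_between <- nlevels(Treat) - 1                    # 4
df_within  <- length(yield) - nlevels(Treat)        # 15

F_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(F_value, df_between, df_within, lower.tail = FALSE)
c(F = F_value, p = p_value)
```

The error mean square, 101 / 15, is about 6.7, which is the MSerror value that duncan.test() reports for this data set.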

    # ANOVA
    fit <- aov(yield ~ Treat, data = df)   # call as reported in the TukeyHSD output
    > TukeyHSD(fit)
      Tukey multiple comparisons of means
        95% family-wise confidence level

    Fit: aov(formula = yield ~ Treat, data = df)

    $Treat
           diff   lwr  upr p adj
    A2-A1  -2.5  -8.2  3.2  0.66
    A3-A1   1.5  -4.2  7.2  0.92
    A4-A1   4.5  -1.2 10.2  0.15
    A5-A1  -7.0 -12.7 -1.3  0.01
    A3-A2   4.0  -1.7  9.7  0.24
    A4-A2   7.0   1.3 12.7  0.01
    A5-A2  -4.5 -10.2  1.2  0.15
    A4-A3   3.0  -2.7  8.7  0.50
    A5-A3  -8.5 -14.2 -2.8  0.00
    A5-A4 -11.5 -17.2 -5.8  0.00

[Figure: "95% family-wise confidence level"; one interval per pair from A2-A1 to A5-A4; x-axis: Differences in mean levels of Treat (-15 to 10)]
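Every interval in the Tukey table has the same half-width because the design is balanced. The sketch below recomputes that half-width from the studentized range distribution for the A2-A1 comparison; it re-enters the yields from 3.3.1 so that it runs on its own, and the variable names are illustrative.

```r
yield <- c(24, 30, 28, 26, 27, 24, 21, 26, 31, 28, 25, 30,
           32, 33, 33, 28, 21, 22, 16, 21)
Treat <- factor(rep(paste0("A", 1:5), each = 4))
fit   <- aov(yield ~ Treat)

k   <- nlevels(Treat)          # number of treatment means
n   <- 4                       # replicates per treatment (balanced design)
dfe <- df.residual(fit)        # error degrees of freedom, 15
mse <- deviance(fit) / dfe     # error mean square, about 6.7

q_crit     <- qtukey(0.95, nmeans = k, df = dfe)   # studentized range quantile
half_width <- q_crit / sqrt(2) * sqrt(mse * (1 / n + 1 / n))

means <- tapply(yield, Treat, mean)
d21   <- unname(means["A2"] - means["A1"])         # -2.5
c(diff = d21, lwr = d21 - half_width, upr = d21 + half_width)
```

The half-width comes out near 5.7, so the A2-A1 interval is roughly -8.2 to 3.2, in line with the first row of the table.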

    # duncan.test() is provided by the agricolae package
    > print(duncan.test(fit, "Treat", alpha = 0.05))
    $statistics
      Mean  CV MSerror
        26 9.9     6.7

    $means
       yield std.err r Min. Max.
    A1    27     1.3 4   24   30
    A2    24     1.3 4   21   27
    A3    28     1.3 4   25   31
    A4    32     1.3 4   28   33
    A5    20     1.3 4   16   22

    $groups
      trt means  M
    1  A4    32  a
    2  A3    28 ab
    3  A1    27  b
    4  A2    24  b
    5  A5    20  c

Tests of normality and homogeneity of variance

    # Normality test
    library(car)
    fit.2   # the rest of this line is missing from the source
    > shapiro.test(resid(lm(yield ~ Treat, data = df)))

            Shapiro-Wilk normality test

    data:  resid(lm(yield ~ Treat, data = df))
    W = 0.87, p-value = 0.0126

    > bartlett.test(yield ~ Treat, data = df)

            Bartlett test of homogeneity of variances

    data:  yield by Treat
    Bartlett's K-squared = 0.051, df = 4, p-value = 0.9997

3.3.2 Two-way ANOVA

    df <- read.csv(file = "d4.3.2.csv", header = TRUE)   # read the data
    library(reshape)
    df.2 <- melt(df, id = c("A"))                        # reshape the data to long format
    colnames(df.2)[2:3] <- c("B", "yield")               # rename the variables
    # ANOVA
    fit   # the rest of this line is missing from the source

    > (duncan.test(fit, "A", alpha = 0.05))
    $means
       yield std.err r Min. Max.
    A1    74     2.3 3   71   77
    A2    91     2.3 3   90   92
    A3    70     2.3 3   59   80
    A4    79     2.3 3   75   82
    A5    64     2.3 3   60   67
    A6    84     2.3 3   82   86

    $groups
      trt means  M
    1  A2    91  a
    2  A6    84 ab
    3  A4    79 bc
    4  A1    74 cd
    5  A3    70 de
    6  A5    64  e

    > (duncan.test(fit, "B", alpha = 0.05))
    $means
       yield std.err r Min. Max.
    B1    74     1.6 6   59   90
    B2    76     1.6 6   60   90
    B3    80     1.6 6   67   92

    $groups
      trt means  M
    1  B3    80  a
    2  B2    76 ab
    3  B1    74  b

    df <- read.csv(file = "d4.3.3.csv", header = TRUE)
    df.2 <- melt(df, id = c("A"))
    colnames(df.2)[2:3] <- c("B", "y")
    fit   # the rest of this line is missing from the source

    > duncan.test(fit, "A", alpha = 0.05)$groups
      trt means M
    1  A3    33 a
    2  A2    30 b
    3  A1    28 c

    > duncan.test(fit, "B", alpha = 0.05)$groups
      trt means M
    1  b4    32 a
    2  b3    32 a
    3  b1    30 b
    4  b2    29 c
    5  b5    28 d

    > with(df.2, duncan.test(y, A:B, DFerror = 45, MSerror = 1.22)$groups)
         trt means   M
    1  A3:b3    35   a
    2  A3:b4    34  ab
    3  A3:b2    34  ab
    4  A3:b1    33   b
    5  A2:b4    33   b
    6  A2:b3    31   c
    7  A1:b3    30  cd
    8  A1:b4    30  cd
    9  A3:b5    30  cd
    10 A2:b1    29  de
    11 A2:b5    28  ef
    12 A1:b1    27  fg
    13 A2:b2    26 fgh
    14 A1:b2    26  gh
    15 A1:b5    25   h

Interaction plot

    library(HH)
    interaction2wt(y ~ A * B, data = df.2)

[Figure: "y: main effects and 2-way interactions"; panels y ~ A | A, y ~ B | A, y ~ A | B, y ~ B | B; y-axis from 24 to 36]
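For readers without the CSV files, the difference between a main-effects model and a model with interaction can be sketched on simulated data. Everything about the data frame `d` below is an assumption: a 3 x 5 factorial with 4 replicates per cell was chosen only because it yields the 45 error degrees of freedom passed to duncan.test() above; the actual layout of d4.3.3.csv is not known from the source.

```r
set.seed(1)
# Hypothetical 3 x 5 factorial with 4 replicates per cell (simulated values).
d <- expand.grid(rep = 1:4,
                 A = paste0("A", 1:3),
                 B = paste0("b", 1:5))
d$y <- 28 + 2 * as.integer(d$A) + rnorm(nrow(d), sd = 1)

fit_add <- aov(y ~ A + B, data = d)   # main effects only
fit_int <- aov(y ~ A * B, data = d)   # main effects plus the A:B interaction

summary(fit_int)
df.residual(fit_int)                  # 60 observations - 15 cells = 45

# Base-R interaction plot: roughly parallel lines suggest little interaction.
with(d, interaction.plot(B, A, y))
```

In a formula, `A * B` expands to `A + B + A:B`, so the second model spends (3 - 1) x (5 - 1) = 8 additional degrees of freedom on the interaction.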

3.4 Analysis of covariance (ANCOVA)

Analysis of covariance is a statistical technique for adjusting for the effect of covariates on the dependent variable so that treatment effects can be analyzed more effectively. It is also a method of statistical control that combines analysis of variance with regression analysis.

When the researcher knows that certain covariates affect the dependent variable, but can neither control them nor is interested in them, they can be measured before the treatment is applied and then handled by analysis of covariance at the analysis stage. Separating the covariate's influence on the dependent variable from that of the independent variable further improves experimental precision and the sensitivity of the statistical test. For example, in studying the relationship between tree growth and fertilizer, the fertilization regime can be controlled, but the initial seedling height (the covariate) is difficult to control. Analysis of covariance removes the effect of initial seedling height, so that growth is compared by analysis of variance on a common basis.
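The idea of comparing treatments "on a common basis" can be made concrete with adjusted means: the predicted response of each treatment at the same covariate value. The sketch below uses simulated data and base R only; the data frame `d`, the effect sizes, and the names `effect` and `adj_means` are assumptions for illustration.

```r
set.seed(42)
# Simulated stand-in: 3 treatments, covariate x = initial seedling height.
d <- data.frame(A = factor(rep(c("A1", "A2", "A3"), each = 8)),
                x = runif(24, 3, 8))
effect <- c(A1 = 0, A2 = 0, A3 = 0.6)            # assumed treatment effects
d$y <- 1 + 0.2 * d$x + unname(effect[as.character(d$A)]) + rnorm(24, sd = 0.2)

fit <- lm(y ~ x + A, data = d)   # covariate first, then the treatment factor
anova(fit)                       # sequential SS: A is tested after adjusting for x

# Adjusted means: predicted y for each treatment at the overall mean of x.
new <- data.frame(A = factor(levels(d$A), levels = levels(d$A)),
                  x = mean(d$x))
adj_means <- setNames(predict(fit, newdata = new), levels(d$A))
adj_means
```

Because all three predictions use the same x, the difference between two adjusted means equals the corresponding treatment coefficient of the model.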

    library(lsmeans)
    df <- read.csv(file = "d4.4.1.csv")
    fit <- lm(y ~ x + A, data = df)
    # fit2   (the rest of this commented-out line is missing from the source)

    > summary(fit)
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   0.8516     0.1853    4.59  0.00177 **
    x             0.2226     0.0344    6.47  0.00019 ***
    AA2           0.0110     0.1201    0.09  0.92950
    AA3           0.6468     0.1582    4.09  0.00349 **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.19 on 8 degrees of freedom
    Multiple R-squared:  0.875, Adjusted R-squared:  0.828
    F-statistic: 18.6 on 3 and 8 DF,  p-value: 0.000578

    > anova(fit)
    Analysis of Variance Table

    Response: y
              Df Sum Sq Mean Sq F value Pr(>F)
    x          1  1.308   1.308   36.75 0.0003 ***
    A          2  0.677   0.338    9.51 0.0077 **
    Residuals  8  0.285   0.036
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    > lsmeans(fit, pairwise ~ A)
    $A lsmeans
     A  lsmean    SE df lower.CL upper.CL
     A1    1.9 0.084  8      1.8      2.1
     A2    2.0 0.085  8      1.8      2.2
     A3    2.6 0.134  8      2.3      2.9

    $A pairwise differences
            estimate   SE df t.ratio p.value
    A1 - A2   -0.011 0.12  8  -0.091  0.9954
    A1 - A3   -0.647 0.16  8  -4.090  0.0087
    A2 - A3   -0.636 0.16  8  -3.983  0.0100
        p values are adjusted using the tukey method for 3 means

Two-factor analysis of covariance

[Example 4.4.2] To study how the first-year growth of a poplar relates to nitrogen (N) fertilizer, potassium (K) fertilizer, and initial seedling height, an orthogonal experimental design was used, with cultivation trials on 18 plots. The experimental factors, their levels, and the measurements are listed in Table 4-13. Analyze the effects of N fertilizer, K fertilizer, and initial seedling height on growth.

    df <- read.csv(file = "d4.4.2.csv", header = TRUE)
    df[, 3] <- as.factor(df[, 3])
    fit <- lm(mass ~ height + N + K, data = df)
    summary(fit)
    anova(fit)
    lsmeans(fit, pairwise ~ N)
    lsmeans(fit, pairwise ~ K)

    $N pairwise differences
               estimate    SE df t.ratio p.value
    high - low    0.097 0.035 13     2.8   0.016

    $K pairwise differences
              estimate    SE df t.ratio p.value
    0 - 12.5    -0.071 0.043 13    -1.7 0.25850
    0 - 25      -0.308 0.043 13    -7.2 0.00002
    12.5 - 25   -0.237 0.043 13    -5.5 0.00028

3.5 t test

In practice one often needs to judge whether two sample means differ, in order to learn whether the means of the two populations from which the samples were drawn are the same. The t test can be used for this purpose.

Tests of the difference between two sample means fall into two kinds: unpaired designs and paired designs.

    height   # the rest of this line is missing from the source
    > shapiro.test(height)

            Shapiro-Wilk normality test

    data:  height
    W = 0.94, p-value = 0.06344

[Figure: "Normal Q-Q Plot"; x-axis: Theoretical Quantiles (-2 to 2), y-axis: Sample Quantiles (7.5 to 9.0)]

    > t.test(height, mu = 8, alternative = "two.sided")

            One Sample t-test

    data:  height
    t = 4.5, df = 31, p-value = 8.492e-05
    alternative hypothesis: true mean is not equal to 8
    95 percent confidence interval:
     8.2 8.5
    sample estimates:
    mean of x
          8.3

3.5.2 Two-sample t test

    weight <- scan()
    16.68 20.67 18.42 18
    17.44 15.95
    18.68 23.22 21.42 19
    18.92 NA

    Variety <- rep(c("LY1", "DXY"), rep(6, 2))
    df <- data.frame(Variety, weight)
    a <- subset(df$weight, Variety == "LY1")
    b   # the rest of this line is missing from the source

    > var.test(a, b)

            F test to compare two variances

    data:  a and b
    F = 0.67, num df = 5, denom df = 4, p-value = 0.6653
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
     0.072 4.971
    sample estimates:
    ratio of variances
                  0.67

    > t.test(a, b, paired = FALSE)

            Welch Two Sample t-test

    data:  a and b
    t = -2.1, df = 7.8, p-value = 0.06591
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -5.0  0.2
    sample estimates:
    mean of x mean of y
           18        20

3.5.3 Paired two-sample t test

    height <- scan()
    7.12 7.26 4.78 7.69 4.25 4.96 6.28 4.82
    6.52 6.07 4.28 7.30 4.17 5.66 5.73 4.52

    Variety <- rep(c("A", "B"), rep(8, 2))
    df <- data.frame(Variety, height)
    a <- subset(df$height, Variety == "A")   # tree heights of variety A
    b   # the rest of this line is missing from the source

    > t.test(a, b, paired = TRUE)

            Paired t-test

    data:  a and b
    t = 1.9, df = 7, p-value = 0.09624
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -0.084  0.812
    sample estimates:
    mean of the differences
                       0.36
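A paired t test is a one-sample t test on the within-pair differences. The sketch below reproduces the figures above by hand from the 16 heights of 3.5.3, taking the first eight values as variety A and the last eight as variety B, as the rep() call implies; the variable names are illustrative.

```r
a <- c(7.12, 7.26, 4.78, 7.69, 4.25, 4.96, 6.28, 4.82)   # variety A
b <- c(6.52, 6.07, 4.28, 7.30, 4.17, 5.66, 5.73, 4.52)   # variety B

d  <- a - b                    # within-pair differences
n  <- length(d)
se <- sd(d) / sqrt(n)          # standard error of the mean difference

t_stat <- mean(d) / se                                     # about 1.9
p_val  <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)
ci     <- mean(d) + c(-1, 1) * qt(0.975, df = n - 1) * se  # about -0.084 to 0.812

c(mean_diff = mean(d), t = t_stat, p = p_val)
ci
```

Because the interval contains 0, the difference between the two varieties is not significant at the 5% level, which agrees with the p-value of about 0.096.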

Biomass (kg/m2) of 18 grassland species in grazed and ungrazed quadrats (N = 18)

    Species                  Grazed  Ungrazed
    Asclepias syriaca         0.034     0.247
    Aster laevis              0.244     0.096
    Aster lateriflorus        0.041     0.146
    Aster novae-angliae       0.310     0.365
    Aster simplex             0.062     0.088
    Dactylis glomerata        0.001     0.055
    Fragaria virginiana       0.441     0.385
    Hieracium pratense        0.592     0.626
    Phleum pratense           0.387     0.911
    Picris hieracioides       1.369     1.510
    Plantago lanceolata       0.260     0.208
    Poa compressa             0.610     0.773
    Poa pratensis             0.054     0.116
    Solidago altissima        0.843     1.967
    Solidago graminifolia     0.201     0.097
    Solidago juncea           0.278     0.148
    Solidago rugosa           0.156     0.197
    Taraxacum officinale      0.100     0.151

Does grazing have a significant effect on the biomass of the grassland species studied?

    t <- read.csv("t_test.csv", header = TRUE)
    head(t)
    > t.test(t$Graze, t$Control, paired = TRUE)

            Paired t-test

    data:  t$Graze and t$Control
    t = -1.7, df = 17, p-value = 0.1097
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -0.263  0.029
    sample estimates:
    mean of the differences
                      -0.12

3.5 Chi-square test (χ² test)

The chi-square test is a statistical test in which probabilities and critical values are computed with reference to the chi-square distribution. It is a widely used method of hypothesis testing.

Principle:
(1) State the null hypothesis, namely that the differences between the observed and theoretical values are due to random error;
(2) Quantify the actual discrepancy in the data, that is, compute the χ² value;
(3) If the χ² value exceeds the theoretical (critical) value at a chosen probability level (the significance level), reject the null hypothesis: the difference between the observed and theoretical values is significant at that level.
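The three steps can be followed by hand for a goodness-of-fit test. This sketch uses the six observed counts of the first example in this section with equal theoretical probabilities; the variable names (`expected`, `x2`, `crit`) are illustrative, and the 5% level is an assumed choice.

```r
freq  <- c(22, 21, 22, 27, 22, 36)   # observed counts in six categories
probs <- rep(1 / 6, 6)               # step 1: null hypothesis, all categories equally likely

expected <- sum(freq) * probs                   # theoretical counts: 25 each
x2       <- sum((freq - expected)^2 / expected) # step 2: chi-square value, 6.72

dfree   <- length(freq) - 1                     # degrees of freedom, 5
crit    <- qchisq(0.95, df = dfree)             # step 3: critical value at the 5% level
p_value <- pchisq(x2, df = dfree, lower.tail = FALSE)

c(x2 = x2, critical = crit, p = p_value)
x2 > crit   # FALSE: the null hypothesis is not rejected
```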

Chi-square test

    freq <- c(22, 21, 22, 27, 22, 36)
    probs <- c(1, 1, 1, 1, 1, 1) / 6
    > chisq.test(freq, p = probs)

            Chi-squared test for given probabilities

    data:  freq
    X-squared = 6.7, df = 5, p-value = 0.2423

Chi-square test

    x <- c(100, 110, 80, 55, 14)
    probs <- c(29, 21, 17, 17, 16) / 100
    > chisq.test(x, p = probs)

            Chi-squared test for given probabilities

    data:  x
    X-squared = 55, df = 4, p-value = 2.685e-11

Chi-square test (contingency table)

    yesbelt <- c(12813, 647, 359, 42)
    nobelt <- c(65963, 4000, 2642, 303)
    > chisq.test(data.frame(yesbelt, nobelt))
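For a contingency table, the theoretical counts are not supplied by the user but derived from the margins under the hypothesis of independence. A minimal sketch of that computation for the 4 x 2 table above follows; the names `tab`, `expected`, and `x2` are illustrative.

```r
yesbelt <- c(12813, 647, 359, 42)
nobelt  <- c(65963, 4000, 2642, 303)
tab <- cbind(yesbelt, nobelt)                    # 4 x 2 table of counts

# Expected count of each cell: row total * column total / grand total.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

x2      <- sum((tab - expected)^2 / expected)    # Pearson chi-square statistic
dfree   <- (nrow(tab) - 1) * (ncol(tab) - 1)     # (4 - 1) * (2 - 1) = 3
p_value <- pchisq(x2, df = dfree, lower.tail = FALSE)

c(x2 = x2, df = dfree, p = p_value)
```

The statistic is close to 59 on 3 degrees of freedom, so independence of the two classifications is rejected.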

Result of the contingency-table test:

            Pearson's Chi-squared test

    data:  data.frame(yesbelt, nobelt)
    X-squared = 59, df = 3, p-value = 8.61e-13

Exercise 4

Using the data in stu.data.csv:
Question 1: Carry out a frequency analysis of body weight.
Question 2: Is height normally distributed?
Question 3: Does sex have an effect on body weight?
Question 4: Does the population mean weight differ significantly from 60 kg? Do the mean weights of male and female students differ significantly?
Question 5: Is the ratio of male to female students consistent with 1.2 : 1.0?

Exercise 4: answers

    df <- read.csv(file = "stu.data.csv", header = TRUE)
    # Question 1
    A <- table(cut(df$weight, breaks = 40 + 15 * (0:7)))
    round(prop.table(A) * 100, 2)   # frequencies as percentages
    hist(df$weight, breaks = 7, xlim = c(40, 140), xlab = "weight",
         main = "Frequency chart of weight")
    # Question 2
    shapiro.test(df$height)
    # Question 3
    fit <- aov(weight ~ Sex, data = df)
    summary(fit)
    library(agricolae)
    duncan.test(fit, "Sex", alpha = 0.05)$groups
    # Question 4
    t.test(df$weight, mu = 60, alternative = "two.sided")
    wt.m <- subset(df$weight, df$Sex2 == 1)
    wt.f <- subset(df$weight, df$Sex2 == 2)
    var.test(wt.m, wt.f)                 # test for equal variances
    t.test(wt.m, wt.f, paired = FALSE)
    # Question 5
    summary(df$Sex)
    ct <- c(87, 33)
    pt <- c(1.2 / 2.2, 1.0 / 2.2)
    chisq.test(ct, p = pt)

3.6 Linear regression

For example, yield is related to the amount of fertilizer applied, the timing of pest and disease outbreaks is related to air temperature, and wheat yield per unit area is related to the number of ears per unit area and to the thousand-kernel weight. It is therefore also necessary to study the relationships between two or more variables.

When changes in one variable are influenced by one or several other variables, the relationship is called a causal relationship.

Regression analysis is used to study the relationships among causally related variables. The variable representing the cause is the independent variable, and the variable representing the result is the dependent variable. Regression analysis comprises simple (one-variable) and multiple regression.

3.6.1 Simple linear regression

    df <- read.csv(file = "d4.7.1.csv", header = TRUE)   # read the data
    fit <- lm(weight ~ N, data = df)

    > summary(fit)
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -87.5167     5.9369   -14.7  1.7e-09 ***
    N             3.4500     0.0911    37.9  1.1e-14 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.5 on 13 degrees of freedom
    Multiple R-squared:  0.991, Adjusted R-squared:  0.99
    F-statistic: 1.43e+03 on 1 and 13 DF,  p-value: 1.09e-14

    > df$weight
     [1] 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164

    > fitted(fit)
      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
    113 116 119 123 126 130 133 137 140 144 147 151 154 157 161

    > residuals(fit)
         1      2      3      4      5      6      7      8      9
     2.417  0.967  0.517  0.067 -0.383 -0.833 -1.283 -1.733 -1.183
        10     11     12     13     14     15
    -1.633 -1.083 -0.533  0.017  1.567  3.117

[Figure: scatter plot; x-axis: df$N (58 to 72), y-axis: df$weight (120 to 160)]

Computing the adjusted R²

Here n is the number of objects (sampling sites) and m is the number of explanatory variables (or, more precisely, the degrees of freedom of the model). The formula is valid only as long as the model degrees of freedom (m) are not too large relative to the number of observations (n), that is, as long as n - m - 1 > 0.
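The formula on the slide is not present in the extracted text. The standard definition, which matches the n - m - 1 > 0 condition and is presumably what the slide showed, is adjusted R² = 1 - (1 - R²)(n - 1) / (n - m - 1). The sketch below applies it to the values printed in 3.6.1; the function name `adj_r2` is illustrative.

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors m.
adj_r2 <- function(r2, n, m) {
  stopifnot(n - m - 1 > 0)                 # validity condition
  1 - (1 - r2) * (n - 1) / (n - m - 1)
}

# Values from the simple regression output: R-squared 0.991, 15 observations,
# 1 explanatory variable.
adj_r2(0.991, n = 15, m = 1)   # about 0.990, as reported by summary(fit)
```

The penalty factor (n - 1) / (n - m - 1) grows with m, so adding a predictor raises the adjusted value only if it improves R² by more than the lost degree of freedom costs.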

3.6.2 Polynomial regression

    fit   # the rest of this line is missing from the source

    > summary(fit2)
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) 261.87818   25.19677   10.39  2.4e-07 ***
    N            -7.34832    0.77769   -9.45  6.6e-07 ***
    I(N^2)        0.08306    0.00598   13.89  9.3e-09 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.38 on 12 degrees of freedom
    Multiple R-squared:  0.999, Adjusted R-squared:  0.999
    F-statistic: 1.14e+04 on 2 and 12 DF,  p-value: < 2e-16
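The file d4.7.1.csv is not available, but the printed weights (115 to 164) and the N range (58 to 72) coincide with R's built-in `women` data set. Treating `women` as a stand-in is an assumption, and the column name `N` is taken from the output above. Under that assumption, a quadratic fit of the kind summarized above can be sketched as follows.

```r
# Stand-in data: built-in `women` (height renamed to N to match the output).
d <- data.frame(N = women$height, weight = women$weight)

fit  <- lm(weight ~ N, data = d)            # straight line
fit2 <- lm(weight ~ N + I(N^2), data = d)   # adds a quadratic term

round(coef(fit), 4)     # intercept and slope of the linear model
round(coef(fit2), 5)    # intercept, linear and quadratic coefficients
```

Inside a formula, `^` denotes factor crossing rather than arithmetic, so the square has to be wrapped in I() for it to be evaluated as a number.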

[Figure: scatter plot; x-axis: df$N (58 to 72), y-axis: df$weight (120 to 160)]

3.6.3 Multiple linear regression

    df <- read.csv(file = "d4.7.3.csv", header = TRUE)
    lmfit <- lm(y ~ x1 + x2 + x3 + x4, data = df)
    step(lm(y ~ x1 + x2 + x3 + x4, data = df))
    lmfit2 <- lm(y ~ x1 + x2 + x3, data = df)

    > summary(lmfit)
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -625.358    114.378   -5.47  6.5e-05 ***
    x1            15.196      2.127    7.15  3.4e-06 ***
    x2             7.378      1.889    3.91   0.0014 **
    x3             9.503      1.342    7.08  3.7e-06 ***
    x4            -0.847      1.493   -0.57   0.5790
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 36 on 15 degrees of freedom
    Multiple R-squared:  0.894, Adjusted R-squared:  0.866
    F-statistic: 31.8 on 4 an
