




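Mathematical Statistics and Data Analysis
John A. Rice, University of California, Berkeley

Arrangement of the Course
Chapter 1 Summarizing Data
Chapter 2 Comparing Two Samples
Chapter 3 The Analysis of Variance
Chapter 4 Linear Least Squares

What should we learn? Mathematics (statistics), English, and computing with SAS (the Statistical Analysis System).

Chapter 1 Summarizing Data
Methods Based on the Cumulative Distribution Function
Histograms, Density Curves and Stem-and-Leaf Plots
Measures of Location
Measures of Dispersion

The Empirical Cumulative Distribution Function (ecdf)
Suppose that $x_1, \dots, x_n$ is a batch of numbers. The ecdf is defined as
$F_n(x) = \frac{1}{n}\,\#(x_i \le x)$.
Denote the ordered batch of numbers by $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$; then the ecdf can be expressed as
$F_n(x) = 0$ for $x < x_{(1)}$, $F_n(x) = k/n$ for $x_{(k)} \le x < x_{(k+1)}$, and $F_n(x) = 1$ for $x \ge x_{(n)}$.

Properties of the Empirical Cumulative Distribution Function
Suppose $X_1, \dots, X_n$ are independent with cdf $F$.
Theorem 1. $nF_n(x)$ is binomial with parameters $n$ and $F(x)$, so $E[F_n(x)] = F(x)$ and $\mathrm{Var}[F_n(x)] = F(x)[1 - F(x)]/n$.
Theorem 2 (Glivenko-Cantelli). $P\left(\lim_{n\to\infty} \sup_x |F_n(x) - F(x)| = 0\right) = 1$.
That is, $F_n$ tends to $F$ uniformly in $x$ with probability one.

Example
Plot the ecdf of this batch of numbers: 1, 14, 10, 9, 11, 9.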
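A minimal Python sketch of the ecdf for the example batch; the function name `ecdf` is illustrative, not from the course, and exact fractions are used only so the step heights are easy to read. It counts the observations not exceeding $x$ with a binary search over the sorted batch:

```python
from bisect import bisect_right
from fractions import Fraction


def ecdf(data):
    """Return F_n, where F_n(x) = #(x_i <= x) / n."""
    xs = sorted(data)
    n = len(xs)

    def F_n(x):
        # bisect_right gives the number of sorted values that are <= x
        return Fraction(bisect_right(xs, x), n)

    return F_n


batch = [1, 14, 10, 9, 11, 9]
F = ecdf(batch)
for x in sorted(set(batch)):
    print(x, F(x))
```

The printed steps are 1/6 at 1, 1/2 at 9 (a jump of 2/6 because 9 occurs twice), 2/3 at 10, 5/6 at 11 and 1 at 14.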
SAS data set: beeswax.sas. Solutions > Analysis > Interactive Data Analysis (find beeswax.sas in Work) > Analyze > Distribution > Output > Cumulative Distribution Function > Empirical.

The Survival Function
If $T$ denotes the time until failure or death, with cdf $F$, the survival function is defined as
$S(t) = P(T > t) = 1 - F(t)$,
which is simply the probability that the lifetime will be longer than $t$. The empirical survival function is given by
$S_n(t) = 1 - F_n(t)$,
where $F_n$ is the ecdf of the random variable $T$.

The Hazard Function
The hazard function is defined as
$h(t) = \frac{f(t)}{1 - F(t)}$,
which is the instantaneous rate of mortality of an individual alive at time $t$.
The log of the empirical survival function is defined as $\log S_n(t) = \log[1 - F_n(t)]$; it exists only where $S_n(t) > 0$, that is, for $t$ below the largest observation.
Example
Calculate the hazard function for the exponential distribution.
Let $f$ denote the density function and $h$ the hazard function of a nonnegative random variable. Show that
$f(t) = h(t)\exp\left(-\int_0^t h(s)\,ds\right)$.

Quantile-Quantile Plots
The $p$th quantile of the distribution $F$ is the value $x_p$ such that $F(x_p) = p$, or $x_p = F^{-1}(p)$.
The Empirical Quantiles of Data
For the given sample $x_1, \dots, x_n$ with no ties, the ecdf satisfies $F_n(x_{(k)}) = k/n$. Let $p_k = k/(n+1)$; thus the $p_k$th quantile of the data is assigned to $x_{(k)}$.

Assessing Goodness of Fit by Using a Q-Q Plot
Is the sample from the distribution $F$?
The empirical $p_k$th quantile is $x_{(k)}$.
The theoretical $p_k$th quantile of $F$ is $y_k = F^{-1}(p_k)$, which satisfies $F(y_k) = p_k$.
The points $(y_k, x_{(k)})$ in the plane should lie approximately on a straight line if the sample comes from $F$.

Comparing Two Samples by Using a Q-Q Plot
Are the samples $x_1, \dots, x_n$ and $y_1, \dots, y_n$ from the same distribution?
The empirical $p_k$th quantile of the $x$ sample is $x_{(k)}$, and that of the $y$ sample is $y_{(k)}$.
The points $(x_{(k)}, y_{(k)})$ should lie approximately on a straight line (the line $y = x$ if the distributions are identical) if the two samples come from the same distribution.

Example
SAS data set: beeswax.sas. Solutions > Analysis > Interactive Data Analysis (find beeswax.sas in Work) > Analyze > Distribution > Output > Normal Q-Q Plot.
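The coordinates of both kinds of Q-Q plot can be computed as follows. This is a sketch that assumes the plotting position $p_k = k/(n+1)$ and a standard normal reference distribution (via Python's `statistics.NormalDist`); the helper names are hypothetical, and the drawing itself is left to any graphics tool.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Points (y_k, x_(k)) with y_k = Phi^{-1}(p_k) and p_k = k/(n+1)."""
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return [(std_normal.inv_cdf(k / (n + 1)), x) for k, x in enumerate(xs, start=1)]


def two_sample_qq_points(x, y):
    """Points (x_(k), y_(k)) for two samples of the same size."""
    if len(x) != len(y):
        raise ValueError("this simple version needs equal sample sizes")
    return list(zip(sorted(x), sorted(y)))


for y_k, x_k in normal_qq_points([1, 14, 10, 9, 11, 9]):
    print(f"{y_k:7.3f} {x_k}")
```

If the normal Q-Q points fall close to a straight line, its intercept and slope roughly estimate the mean and standard deviation, since $x_{(k)} \approx \mu + \sigma\,\Phi^{-1}(p_k)$ for normal data.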
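Histograms
Example: beeswax.sas.

Density Curves
Kernel Probability Density Estimation
Let $\phi(x)$ be the standard normal density; then the rescaled version of $\phi$,
$\phi_h(x) = \frac{1}{h}\,\phi\!\left(\frac{x}{h}\right)$,
is the normal density with mean 0 and standard deviation $h$.
Let $X_1, \dots, X_n$ be a sample from a probability density $f$; then $\phi_h(x - X_i)$ is the normal density with mean $X_i$ and standard deviation $h$.
The kernel probability density estimate of $f$ is then given by
$f_h(x) = \frac{1}{n}\sum_{i=1}^{n} \phi_h(x - X_i)$,
where $h$ is a chosen bandwidth.

Example
Beeswax: Solutions > Analysis > Interactive Data Analysis (find beeswax.sas in Work) > Analyze > Distribution > Output > Density Estimate > Normal (kernel density).

Stem-and-Leaf Plots
Example: beeswax.sas.

Measures of Location
The Arithmetic Mean
For a batch of numbers $x_1, \dots, x_n$, the most commonly used measure of location is the sample mean
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
Disadvantage: the mean is sensitive to outliers in the data set.

The Median
For the ordered observations $x_{(1)} \le \cdots \le x_{(n)}$, the median is defined as
$\tilde{x} = x_{((n+1)/2)}$ if $n$ is odd, and $\tilde{x} = \frac{1}{2}\left[x_{(n/2)} + x_{(n/2+1)}\right]$ if $n$ is even.
Advantage: the median is robust, that is, insensitive to outliers.

The Trimmed Mean
The $100\alpha\%$ trimmed mean is the average of the data that remain after discarding the lowest $100\alpha\%$ and the highest $100\alpha\%$ of the ordered data. It can be expressed as
$\bar{x}_\alpha = \frac{x_{([n\alpha]+1)} + \cdots + x_{(n-[n\alpha])}}{n - 2[n\alpha]}$,
where $[n\alpha]$ denotes the greatest integer less than or equal to $n\alpha$.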
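The kernel density estimate and the trimmed mean translate directly into code. The following is a sketch in plain Python (the names `kde` and `trimmed_mean` are illustrative), using a normal kernel with bandwidth `h` and trimming $[n\alpha]$ values from each end of the ordered data:

```python
import math
import statistics


def kde(sample, h):
    """Kernel density estimate f_h(x) = (1/n) * sum_i phi_h(x - X_i), normal kernel."""
    n = len(sample)

    def phi_h(u):
        # phi_h(u) = phi(u / h) / h, the N(0, h^2) density
        return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2 * math.pi))

    def f_h(x):
        return sum(phi_h(x - xi) for xi in sample) / n

    return f_h


def trimmed_mean(data, alpha):
    """100*alpha% trimmed mean: drop the [n*alpha] smallest and [n*alpha] largest values."""
    xs = sorted(data)
    n = len(xs)
    g = math.floor(n * alpha + 1e-9)  # [n*alpha]; the tiny offset guards against float round-off
    kept = xs[g:n - g]
    return sum(kept) / len(kept)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(statistics.mean(data), statistics.median(data), trimmed_mean(data, 0.1))
f = kde(data[:9], h=1.0)
print(f(5.0))
```

On this small data set the single value 100 pulls the mean up to 14.5, while the median and the 10% trimmed mean both equal 5.5. The bandwidth is a free choice here; larger `h` gives a smoother curve.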
In general, $\alpha$ is chosen between 0.1 and 0.2.

M-estimates
The sample mean minimizes $\sum_{i=1}^{n} (x_i - \mu)^2$.
The median minimizes $\sum_{i=1}^{n} |x_i - \mu|$.
M-estimates are defined to minimize
$\sum_{i=1}^{n} \Psi\!\left(\frac{x_i - \mu}{\sigma}\right)$,
where $\Psi(x) = x^2$ for $|x| < k$ and $\Psi(x) = 2k|x| - k^2$ for $|x| \ge k$, with $k$ a chosen constant.

Comparison of Location Estimates
Although all of these estimates estimate the center of symmetry, no single estimate is best for all symmetric distributions.
In general, the 10% or 20% trimmed mean is quite an effective estimate overall, since its variance is relatively small.

Measures of Dispersion
The sample standard deviation $s$ is the square root of the sample variance,
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.
Disadvantage: the sample standard deviation is sensitive to outlying observations.

Interquartile Range (IQR)
The sample quartiles are the sample 0.25 and 0.75 quantiles: $Q_1$ is called the lower quartile and $Q_3$ the upper quartile.
The interquartile range is the difference between the two sample quartiles, that is, $\mathrm{IQR} = Q_3 - Q_1$, which also measures dispersion and is robust to outliers.
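M-estimates generally have no closed form. One common way to compute a location M-estimate is iteratively reweighted averaging; the sketch below assumes the Huber-type $\Psi$ ($x^2$ for $|x| < k$, $2k|x| - k^2$ otherwise), a scale fixed at MAD/0.675, and $k = 1.5$. These are choices made here for illustration, not taken from the course, and the function name is hypothetical.

```python
import statistics


def m_estimate(data, k=1.5, tol=1e-10, max_iter=200):
    """Location M-estimate minimizing sum Psi((x_i - mu) / sigma), where
    Psi(x) = x**2 for |x| < k and 2k|x| - k**2 otherwise.
    Assumptions: sigma is fixed at MAD / 0.675 and k = 1.5 by default."""
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    sigma = mad / 0.675 if mad > 0 else 1.0
    mu = med  # start from the median
    for _ in range(max_iter):
        # Setting the derivative to zero gives a weighted mean with
        # weight 1 for |r| <= k and k/|r| for |r| > k, where r = (x - mu)/sigma.
        weights = []
        for x in data:
            r = abs(x - mu) / sigma
            weights.append(1.0 if r <= k else k / r)
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


print(m_estimate([1, 2, 3, 4, 100]))
```

Observations far from $\mu$ get weights below 1, so for 1, 2, 3, 4, 100 the estimate stays near 3.06 while the mean is 22; letting $k$ grow makes every weight 1 and recovers the sample mean.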
Calculation of IQR
Find the IQR of 45, 60, 21, 19, 4, 31.
The ordered observations are 4, 19, 21, 31, 45, 60 (n = 6); $Q_1$ and $Q_3$ are read off from these ordered values, and $\mathrm{IQR} = Q_3 - Q_1$.

Median Absolute Deviation from the Median (MAD)
If the data are $x_1, \dots, x_n$ with median $\tilde{x}$, the MAD is defined to be the median of the numbers $|x_i - \tilde{x}|$. It also measures dispersion and is robust to outliers.

Find the MAD of 45, 60, 21, 19, 4, 31.
The ordered values are 4, 19, 21, 31, 45, 60.
The median is then $\tilde{x} = (21 + 31)/2 = 26$.
The values $|x_i - 26|$ (taken in the order above) are 22, 7, 5, 5, 19, 34.
Their ordered values are 5, 5, 7, 19, 22, 34.
The MAD is then the median of these values, i.e. $(7 + 19)/2 = 13$.
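Because textbooks and software use different quartile conventions, the numerical IQR depends on the rule chosen. The sketch below assumes Python's `statistics.quantiles` with `method="exclusive"`, which treats $x_{(k)}$ as the $k/(n+1)$ quantile and interpolates linearly; the MAD needs no such choice.

```python
import statistics

data = [45, 60, 21, 19, 4, 31]

# Quartiles with the "exclusive" convention (an assumption; other rules differ)
q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

# MAD: the median of the absolute deviations from the median
med = statistics.median(data)
mad = statistics.median([abs(x - med) for x in data])

print(q1, q3, iqr)
print(med, mad)
```

With this convention $Q_1 = 15.25$, $Q_3 = 48.75$ and the IQR is 33.5; the MAD is 13, the median of 5, 5, 7, 19, 22, 34.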
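Example
Beeswax: Solutions > Analysis > Interactive Data Analysis (find beeswax.sas in Work) > Analyze > Distribution > Output > Moments, Quantiles.

Estimation of $\sigma$ for a Normal Distribution
If $s$ is the standard deviation of a sample from a normal distribution, then $s^2$ is an unbiased estimate of $\sigma^2$, so $s$ can be used to estimate $\sigma$.
For a normal distribution the IQR is about $1.35\sigma$ and the MAD about $0.675\sigma$, so
the second possible estimate of $\sigma$ is $\hat{\sigma} = \mathrm{IQR}/1.35$, and
the third is $\hat{\sigma} = \mathrm{MAD}/0.675$.

Boxplots
The box runs from the lower to the upper quartile, with a line at the median.
The upper whisker extends to the most extreme value within a distance of 1.5 IQR of the upper quartile.
The lower whisker extends to the most extreme value within a distance of 1.5 IQR of the lower quartile.

What Is in a Boxplot
A measure of location (the median) and of dispersion (the interquartile range)
The presence of possible outliers
An indication of the symmetry or skewness of the distribution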
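The boxplot ingredients and the three normal-theory estimates of $\sigma$ can be sketched as follows. The constants 1.35 and 0.675 are the normal-distribution ratios IQR/$\sigma$ and MAD/$\sigma$; the quartile convention (`method="exclusive"`) and the helper names are assumptions made for illustration.

```python
import statistics


def boxplot_summary(data):
    """Median, quartiles, whisker ends (most extreme values within 1.5 IQR
    of the quartiles) and the points beyond the whiskers."""
    xs = sorted(data)
    q1, _, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    iqr = q3 - q1
    low_limit, high_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if low_limit <= x <= high_limit]
    outliers = [x for x in xs if x < low_limit or x > high_limit]
    return {"median": statistics.median(xs), "q1": q1, "q3": q3,
            "whiskers": (min(inside), max(inside)), "outliers": outliers}


def sigma_estimates(data):
    """Three estimates of sigma for normal data: s, IQR/1.35 and MAD/0.675."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    return statistics.stdev(data), (q3 - q1) / 1.35, mad / 0.675


print(boxplot_summary([4, 19, 21, 31, 45, 60, 200]))
print(sigma_estimates([45, 60, 21, 19, 4, 31]))
```

For the data 4, 19, 21, 31, 45, 60, 200 the quartiles are 19 and 60, so the upper limit is 60 + 1.5(41) = 121.5 and 200 is flagged as a possible outlier.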
Homework 1
With the first three lines of data from Problem 6 on page 375:
(1) Make the empirical cumulative distribution function, the stem-and-leaf plot, and the boxplots.
(2) Find measures of location, such as the sample mean, the median and the 10% trimmed mean, and measures of dispersion, such as the IQR and the MAD.
(3) Do you think the data come from a normal distribution with parameters equal to the sample mean and sample variance?
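Distributions Derived from the Normal Distribution
The $\chi^2$ Distribution
If $Z_1, \dots, Z_n$ are independent standard normal variables, the distribution of $U = Z_1^2 + \cdots + Z_n^2$ is called the chi-square distribution with $n$ degrees of freedom and is denoted by $\chi^2_n$.
If $U \sim \chi^2_n$, then $E(U) = n$ and $\mathrm{Var}(U) = 2n$.
If $U_1 \sim \chi^2_m$ and $U_2 \sim \chi^2_n$ are independent, then $U_1 + U_2 \sim \chi^2_{m+n}$.

The $t$ Distribution
If $Z \sim N(0, 1)$ and $U \sim \chi^2_n$, and $Z$ and $U$ are independent, then the distribution of $Z/\sqrt{U/n}$ is called the $t$ distribution with $n$ degrees of freedom, denoted $t_n$.
The density of the $t$ distribution is symmetric about zero.
The $t$ distribution tends to the standard normal distribution as $n$ tends to infinity (in practice the two are close for $n > 29$).

The $F$ Distribution
Let $U$ and $V$ be independent chi-square random variables with $m$ and $n$ degrees of freedom, respectively. The distribution of
$W = \frac{U/m}{V/n}$
is called the $F$ distribution with $m$ and $n$ degrees of freedom and is denoted by $F_{m,n}$.

The Distribution of the Sample Mean and Sample Variance
Let $X_1, \dots, X_n$ be a sample from $N(\mu, \sigma^2)$. Then $\bar{X}$ and $S^2$ are independently distributed, with $\bar{X} \sim N(\mu, \sigma^2/n)$ and $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$.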
Principles of the Hypothesis Test
Determine the null hypothesis $H_0$ and the alternative hypothesis $H_1$ from the practical problem.
Determine a test statistic whose distribution is known under $H_0$.
Determine the rejection region of the test by the small-probability principle: an event of small probability is not expected to occur in a single trial, so if it does occur, $H_0$ is rejected.
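An Example of a Hypothesis Test
Let $X_1, \dots, X_n$ be a sample from $N(\mu, \sigma^2)$ with $\sigma$ known, and test
$H_0\colon \mu = \mu_0$ against $H_1\colon \mu \ne \mu_0$.
Under $H_0$, $\bar{X} \sim N(\mu_0, \sigma^2/n)$, or $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$.
The rejection region for the test is $|Z| > z(\alpha/2)$, since $P(|Z| > z(\alpha/2)) = \alpha$ under $H_0$, where $z(\alpha/2)$ is the upper $\alpha/2$ quantile of the standard normal distribution.

Chapter 2 Comparing Two Samples
Comparing Two Independent Samples
Methods Based on the Normal Distribution
The Mann-Whitney Test: A Nonparametric Method
Comparing Paired Samples
Methods Based on the Normal Distribution
The Signed Rank Test: A Nonparametric Method

Comparing Two Independent Samples
A treatment group $X_1, \dots, X_n$ is drawn from $N(\mu_X, \sigma^2)$.
A control group $Y_1, \dots, Y_m$ is drawn from $N(\mu_Y, \sigma^2)$.
To decide whether the treatment has any effect, test $H_0\colon \mu_X = \mu_Y$ against $H_1\colon \mu_X \ne \mu_Y$.
If $H_0$ is rejected, there is a treatment effect.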
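Otherwise, no treatment effect can be identified.

The Case of $\sigma$ Known
Under $H_0$, $\bar{X} - \bar{Y} \sim N\left(0, \sigma^2\left(\frac{1}{n} + \frac{1}{m}\right)\right)$, so the statistic is
$Z = \frac{\bar{X} - \bar{Y}}{\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}} \sim N(0, 1)$.
The rejection region is $|Z| > z(\alpha/2)$.

The Case of $\sigma$ Unknown
Under $H_0$, $\frac{\bar{X} - \bar{Y}}{\sigma\sqrt{1/n + 1/m}} \sim N(0, 1)$ and $\frac{(n+m-2)s_p^2}{\sigma^2} \sim \chi^2_{n+m-2}$, and the two are independent.
The statistic is
$t = \frac{\bar{X} - \bar{Y}}{s_p\sqrt{\frac{1}{n} + \frac{1}{m}}} \sim t_{n+m-2}$,
where $s_p^2 = \frac{(n-1)s_X^2 + (m-1)s_Y^2}{n+m-2}$ is the pooled sample variance and $s_p$ is the pooled sample standard deviation.
The rejection region for the two-sided alternative is $|t| > t_{n+m-2}(\alpha/2)$.

Example
Analyst: Open by SAS name (Work, ice.sas) > Statistics > Hypothesis Tests > Two-Sample t-Test for Means; under Plots choose the t distribution plot and under Tests the confidence interval.

One-Sided Alternatives
The rejection region for $H_1\colon \mu_X > \mu_Y$ is $t > t_{n+m-2}(\alpha)$.
The rejection region for $H_1\colon \mu_X < \mu_Y$ is $t < -t_{n+m-2}(\alpha)$.

Confidence Interval for $\mu_X - \mu_Y$
If $H_0$ was rejected, what can be said about the difference between $\mu_X$ and $\mu_Y$?
If there are two statistics $L$ and $U$ such that $P(L \le \mu_X - \mu_Y \le U) = 1 - \alpha$, then the interval $[L, U]$ is called a $100(1-\alpha)\%$ confidence interval for $\mu_X - \mu_Y$; $U$ is called the upper limit and $L$ the lower limit.

The Case of $\sigma$ Known
The estimate of $\mu_X - \mu_Y$ is $\bar{X} - \bar{Y}$.
The distribution of $\bar{X} - \bar{Y}$ is $N\left(\mu_X - \mu_Y, \sigma^2\left(\frac{1}{n} + \frac{1}{m}\right)\right)$.
The confidence interval is given by $(\bar{X} - \bar{Y}) \pm z(\alpha/2)\,\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}$.

The Case of $\sigma$ Unknown
The estimate of $\sigma$ is $s_p$, so
$\frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{s_p\sqrt{\frac{1}{n} + \frac{1}{m}}} \sim t_{n+m-2}$.
The confidence interval is given by $(\bar{X} - \bar{Y}) \pm t_{n+m-2}(\alpha/2)\, s_p\sqrt{\frac{1}{n} + \frac{1}{m}}$.

A Nonparametric Method: The Mann-Whitney Test
A treatment group $X_1, \dots, X_n$ is drawn from a distribution $F$.
A control group $Y_1, \dots, Y_m$ is drawn from a distribution $G$.
$H_0\colon F = G$ (the treatment has no effect) against $H_1\colon F \ne G$.

The Rank of an Observation
For the ordered values $x_{(1)} < x_{(2)} < \cdots < x_{(N)}$, if $x_i = x_{(k)}$, the rank of $x_i$ is $k$.
The sum of the ranks is $1 + 2 + \cdots + N = N(N+1)/2$.
If there are ties, tied observations are assigned average ranks.

The Exact Mann-Whitney Test
The Idea of the Test
Group all $N = n + m$ observations together.
Rank them in order of increasing size.
Calculate the sum $R$ of the ranks of the observations that come from the control group.
If this sum is too small or too large, we reject $H_0$.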
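The ranking step, with average ranks for ties, can be written as a short function. This is an illustrative sketch (the function names are hypothetical) that returns each observation's rank in its original position, plus the rank sum $R$ of the control group in the combined sample:

```python
def midranks(values):
    """Ranks 1..N in increasing order; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for idx in order[start:end + 1]:
            ranks[idx] = average
        start = end + 1
    return ranks


def control_rank_sum(treatment, control):
    """R: the sum of the control group's ranks in the combined sample."""
    ranks = midranks(list(treatment) + list(control))
    return sum(ranks[len(treatment):])


print(midranks([10, 20, 20, 5]))
print(control_rank_sum([1, 3], [2, 4]))
```

Averaging tied ranks keeps the total equal to $N(N+1)/2$, so the rank sum keeps the same overall scale with or without ties.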
Example: a treatment group and a control group with two observations each (N = 4).
The possible values of R for the control group:

Ranks of the control group   1,2   1,3   1,4   2,3   2,4   3,4
R                             3     4     5     5     6     7

Under $H_0$ each of the $\binom{4}{2} = 6$ rank assignments has probability 1/6, so the distribution of R is

r          3     4     5     6     7
P(R = r)   1/6   1/6   1/3   1/6   1/6
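The table can be checked by brute force: with no ties, under $H_0$ every set of $m$ ranks out of $1, \dots, N$ is equally likely to belong to the control group. The enumeration below is a sketch (the function name is hypothetical) that is practical only for small samples, since the number of subsets grows quickly:

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def rank_sum_null_distribution(n_treatment, m_control):
    """Exact null distribution of R (sum of control ranks), assuming no ties:
    under H0 every choice of m_control ranks out of 1..N is equally likely."""
    N = n_treatment + m_control
    sums = [sum(c) for c in combinations(range(1, N + 1), m_control)]
    counts = Counter(sums)
    return {r: Fraction(c, len(sums)) for r, c in sorted(counts.items())}


dist = rank_sum_null_distribution(2, 2)
for r, p in dist.items():
    print(r, p)
```

For two observations per group this reproduces P(R = 5) = 1/3 and probability 1/6 for each of the other values.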
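The Approximate Mann-Whitney Test
The Idea of the Test
Under $H_0$, $\pi = P(X < Y) = 1/2$. A natural estimate of $\pi$ is
$\hat{\pi} = \frac{U_Y}{mn}$, where $U_Y = \sum_{i=1}^{n}\sum_{j=1}^{m} Z_{ij}$, with $Z_{ij} = 1$ if $X_i < Y_j$ and $Z_{ij} = 0$ otherwise.
If $|\hat{\pi} - 1/2|$ is too large, we reject $H_0$.

The Properties of $U_Y$
Theorem A. Under the null hypothesis,
$E(U_Y) = \frac{mn}{2}$, $\mathrm{Var}(U_Y) = \frac{mn(m+n+1)}{12}$.
Theorem B. For $m$ and $n$ both greater than 10, the null distribution of $U_Y$ can be approximated by a normal distribution by the central limit theorem, that is,
$\frac{U_Y - mn/2}{\sqrt{mn(m+n+1)/12}} \approx N(0, 1)$.

The Test Statistic
Corollary A. If the rank of $Y_j$ in the combined sample is denoted by $R_j$, then
$U_Y = \sum_{j=1}^{m} R_j - \frac{m(m+1)}{2}$.
Corollary B. Let $R_Y = \sum_{j=1}^{m} R_j$ denote the sum of the $m$ ranks of the $Y$s. Under $H_0\colon F = G$,
$E(R_Y) = \frac{m(m+n+1)}{2}$, $\mathrm{Var}(R_Y) = \frac{mn(m+n+1)}{12}$.
Thus, the test statistic is
$Z = \frac{R_Y - m(m+n+1)/2}{\sqrt{mn(m+n+1)/12}}$,
which is approximately $N(0, 1)$ under $H_0$; reject $H_0$ if $|Z| > z(\alpha/2)$.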
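Comparing Paired Samples
Methods Based on the Normal Distribution
For the same object, the measurement before the treatment is denoted by $X$, the measurement after the treatment by $Y$, and $D = X - Y$.
For the $n$ objects, we have the paired sample $(X_1, Y_1), \dots, (X_n, Y_n)$.

The Test Statistic
Consider the new series of independent observations $D_i = X_i - Y_i$, $i = 1, \dots, n$, where $D_i \sim N(\mu_D, \sigma_D^2)$ and $\mu_D = \mu_X - \mu_Y$.
The treatment has no effect: $H_0\colon \mu_D = 0$ against $H_1\colon \mu_D \ne 0$.
The statistic is
$t = \frac{\bar{D}}{s_D/\sqrt{n}}$, which follows $t_{n-1}$ under $H_0$.
The rejection region is $|t| > t_{n-1}(\alpha/2)$.
The confidence interval for $\mu_D$ is $\bar{D} \pm t_{n-1}(\alpha/2)\, s_D/\sqrt{n}$.

An Example
Analyst: Open by SAS name (Work, platelet.sas) > Statistics > Hypothesis Tests > Two-Sample Paired t-Test; under Plots choose the t distribution plot and under Tests the confidence interval.

A Nonparametric Method: The Signed Rank Test
The Idea of the Test
Under $H_0$, $D_i$ is equally likely to be positive or negative, and the distribution of $D_i$ is symmetric about zero.
Rank the absolute values $|D_1|, \dots, |D_n|$, and let $R_i$ be the rank of $|D_i|$. If $W_+$ is the sum of the ranks whose differences have positive signs, it can be expressed as
$W_+ = \sum_{i=1}^{n} R_i I_i$, where $I_i = 1$ if $D_i > 0$ and $I_i = 0$ otherwise.
If $W_+$ is too large or too small, we reject $H_0$.

The Approximate Distribution of $W_+$
Theorem A. Under the null hypothesis that $D_1, \dots, D_n$ are independent and symmetrically distributed about zero,
$E(W_+) = \frac{n(n+1)}{4}$, $\mathrm{Var}(W_+) = \frac{n(n+1)(2n+1)}{24}$.
Corollary A. If the sample size $n$ is greater than 20, then approximately
$\frac{W_+ - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} \sim N(0, 1)$.

Example
Page 415, An Example: Measuring Mercury Levels in Fish.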
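Both paired-sample statistics are short computations on the differences. The sketch below assumes $D_i = X_i - Y_i$, no zero differences, and no ties among the $|D_i|$ (zeros and ties need the usual adjustments, not shown); the function names are hypothetical, and critical values still come from the $t_{n-1}$ table or from $N(0, 1)$.

```python
import math
import statistics


def paired_t(x, y):
    """t = Dbar / (s_D / sqrt(n)) with D_i = X_i - Y_i; refer to t with n - 1 df."""
    d = [xi - yi for xi, yi in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def signed_rank(x, y):
    """W_+ and its normal approximation; assumes no zero and no tied |D_i|."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    rank = {i: r for r, i in enumerate(order, start=1)}  # R_i = rank of |D_i|
    w_plus = sum(rank[i] for i in range(n) if d[i] > 0)  # sum of R_i * I_i
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    return w_plus, (w_plus - mean) / math.sqrt(var)


print(paired_t([5, 6, 7, 8], [4, 4, 4, 4]))
print(signed_rank([0, 2, 0, 4], [1, 0, 3, 0]))
```

In the second call the differences are -1, 2, -3, 4, so the positive ones carry ranks 2 and 4 and $W_+ = 6$.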
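Homework 2 (First Part)
Page 427, Problem 19 a, b, c, d.
Page 430, Problem 34.

Chapter 3 The Analysis of Variance
The One-Way Layout
Methods Based on the Normal Theory: the F Test
A Nonparametric Method: the Kruskal-Wallis Test
The Two-Way Layout
Randomized Block Designs
A Nonparametric Method: Friedman's Test
An Example for the Two-Way Layout: page 443

The One-Way Layout
Methods Based on the Normal Theory
The first model for comparing $I$ treatments with $J$ observations each is
$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, \dots, I$, $j = 1, \dots, J$,
where $\sum_i \alpha_i = 0$ and the $\varepsilon_{ij}$ are independent $N(0, \sigma^2)$.
The null hypothesis (no treatment effect) is $H_0\colon \alpha_1 = \alpha_2 = \cdots = \alpha_I = 0$.
The second model for comparing treatments: the $Y_{ij}$ are independently distributed as $N(\mu_i, \sigma^2)$.
The null hypothesis (there is no treatment effect) is $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_I$.

Sum of Squares
Decomposition 1:
$Y_{ij} - \bar{Y}_{..} = (Y_{ij} - \bar{Y}_{i.}) + (\bar{Y}_{i.} - \bar{Y}_{..})$
Decomposition 2:
$SS_{TOT} = SS_W + SS_B$, or
$\sum_{i=1}^{I}\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{..})^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2 + J\sum_{i=1}^{I} (\bar{Y}_{i.} - \bar{Y}_{..})^2$.
$SS_{TOT}$ is the total sum of squares.
$SS_B$ is the sum of squares between groups, a measure of the variation of the means among the treatments.
$SS_W$ is the sum of squares within groups, a measure of the variation of the data within the treatment groups.

The Test Statistic
Theorem A. $E(SS_W) = I(J-1)\sigma^2$ and $E(SS_B) = J\sum_{i=1}^{I}\alpha_i^2 + (I-1)\sigma^2$.
Theorem B. If the errors are normally distributed, $SS_W/\sigma^2 \sim \chi^2_{I(J-1)}$; under $H_0$, $SS_B/\sigma^2 \sim \chi^2_{I-1}$ and is independent of $SS_W$.
The test statistic is then given by
$F = \frac{SS_B/(I-1)}{SS_W/[I(J-1)]}$, which follows $F_{I-1,\, I(J-1)}$ under $H_0$.

The Analysis of Variance Table
Source   df       SS       MS              F           Pr > F
Model    I-1      SS_B     SS_B/(I-1)      MS_B/MS_W
Error    I(J-1)   SS_W     SS_W/[I(J-1)]
Total    IJ-1     SS_TOT

The rejection region is given by $F > F_{I-1,\, I(J-1)}(\alpha)$.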
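The one-way F statistic can be computed directly from the sums of squares. This sketch (the function name is illustrative) also handles unequal group sizes, where the error degrees of freedom are $N - I$; with $J$ observations per group this equals $I(J-1)$. The p-value would still come from an $F$ table or a statistics package.

```python
def one_way_anova(groups):
    """SS_B, SS_W and F = [SS_B/(I-1)] / [SS_W/(N-I)] for I groups of any sizes."""
    I = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    group_means = [sum(g) / len(g) for g in groups]
    # between groups: size-weighted spread of the group means
    ss_b = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # within groups: spread of each observation around its own group mean
    ss_w = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    f = (ss_b / (I - 1)) / (ss_w / (N - I))
    return ss_b, ss_w, f


print(one_way_anova([[1, 2, 3], [4, 5, 6]]))
```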
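An Example
Page 449, Example A.
The analysis of variance by SAS: Analyst: Open by SAS name (Work, Tablets.sas) > Statistics > ANOVA > One-Way ANOVA.
The analysis of variance by Matlab: anova1(x).

The Case of Treatments with Different Sample Sizes
The model: the $Y_{ij}$, $i = 1, \dots, I$, $j = 1, \dots, J_i$, are independently distributed as $N(\mu_i, \sigma^2)$; let $N = \sum_{i=1}^{I} J_i$.
The null hypothesis: $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_I$.
The Sum of Squares
The total sum of squares: $SS_{TOT} = \sum_{i=1}^{I}\sum_{j=1}^{J_i} (Y_{ij} - \bar{Y}_{..})^2$, where $\bar{Y}_{..}$ is the mean of all $N$ observations.
The sum of squares between groups: $SS_B = \sum_{i=1}^{I} J_i(\bar{Y}_{i.} - \bar{Y}_{..})^2$.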
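The sum of squares within groups: $SS_W = \sum_{i=1}^{I}\sum_{j=1}^{J_i} (Y_{ij} - \bar{Y}_{i.})^2$.
The relation between the sums of squares: $SS_{TOT} = SS_W + SS_B$.
The statistic $SS_W/\sigma^2$ is distributed as chi-square with $N - I$ degrees of freedom.
Under $H_0$, $SS_B/\sigma^2 \sim \chi^2_{I-1}$ and is independent of $SS_W$.
The F-statistic is then given by
$F = \frac{SS_B/(I-1)}{SS_W/(N-I)}$,
with $I - 1$ and $N - I$ degrees of freedom.

A Nonparametric Method: The Kruskal-Wallis Test
The model for comparing treatments: the $Y_{ij}$, $i = 1, \dots, I$, $j = 1, \dots, J_i$, are independent, and the observations in group $i$ have distribution $F_i$.
The null hypothesis: $H_0\colon F_1 = F_2 = \cdots = F_I$.
The Idea of the Test
$R_{ij}$: the rank of $Y_{ij}$ in the combined sample of all $N$ observations.
$\bar{R}_{i.}$: the average rank of the $i$th group.
$\bar{R}_{..} = (N+1)/2$: the overall average of the ranks.
$SS_B = \sum_{i=1}^{I} J_i(\bar{R}_{i.} - \bar{R}_{..})^2$: a measure of the dispersion of the $\bar{R}_{i.}$.
Under $H_0$, each $\bar{R}_{i.}$ should be close to $\bar{R}_{..}$; if $SS_B$ is too large, we reject the null hypothesis.
The Test Statistic
$K = \frac{12}{N(N+1)} SS_B = \frac{12}{N(N+1)}\sum_{i=1}^{I} J_i\bar{R}_{i.}^2 - 3(N+1)$.
When the group sizes are not too small, $K$ is approximately $\chi^2_{I-1}$ under $H_0$.
The rejection region is $K > \chi^2_{I-1}(\alpha)$.

An Example
Page 449, Example A.
The analysis by SAS: Analyst: Open by SAS name (Work, Tablets.sas) > Statistics > ANOVA > Nonparametric One-Way ANOVA.

An Example for the Two-Way Layout: pages 396 and 461.

The Two-Way Layout
Methods Based on the Normal Theory
The mean $\mu_{ij}$ of the cell at level $i$ of factor A and level $j$ of factor B is decomposed as
$\mu_{ij} = \mu + \alpha_i + \beta_j + \delta_{ij}$,
where $\mu = \bar{\mu}_{..}$ is the overall mean level, $\alpha_i = \bar{\mu}_{i.} - \mu$ is the effect of factor A, $\beta_j = \bar{\mu}_{.j} - \mu$ is the effect of factor B, and $\delta_{ij} = \mu_{ij} - \bar{\mu}_{i.} - \bar{\mu}_{.j} + \mu$ is the interaction between A and B.

The Model for Comparing Treatments
The $Y_{ijk}$, $i = 1, \dots, I$, $j = 1, \dots, J$, $k = 1, \dots, K$, are independently distributed as $N(\mu + \alpha_i + \beta_j + \delta_{ij}, \sigma^2)$, that is,
$Y_{ijk} = \mu + \alpha_i + \beta_j + \delta_{ij} + \varepsilon_{ijk}$,
where $\varepsilon_{ijk}$ is the random error and $\sum_i \alpha_i = \sum_j \beta_j = \sum_i \delta_{ij} = \sum_j \delta_{ij} = 0$.
The Null Hypotheses
No effect of factor A: $H_A\colon \alpha_1 = \cdots = \alpha_I = 0$.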
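No effect of factor B: $H_B\colon \beta_1 = \cdots = \beta_J = 0$.
No interaction: $H_{AB}\colon \delta_{ij} = 0$ for all $i, j$.

Sum of Squares
The simple decomposition:
$Y_{ijk} - \bar{Y}_{...} = (\bar{Y}_{i..} - \bar{Y}_{...}) + (\bar{Y}_{.j.} - \bar{Y}_{...}) + (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...}) + (Y_{ijk} - \bar{Y}_{ij.})$.
The decomposition of the sum of squares: $SS_{TOT} = SS_A + SS_B + SS_{AB} + SS_E$, where
the total sum of squares is $SS_{TOT} = \sum_{i}\sum_{j}\sum_{k} (Y_{ijk} - \bar{Y}_{...})^2$;
the sum of squares due to A is $SS_A = JK\sum_{i} (\bar{Y}_{i..} - \bar{Y}_{...})^2$;
the sum of squares due to B is $SS_B = IK\sum_{j} (\bar{Y}_{.j.} - \bar{Y}_{...})^2$;
the sum of squares due to interaction is $SS_{AB} = K\sum_{i}\sum_{j} (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...})^2$;
the sum of squares for error is $SS_E = \sum_{i}\sum_{j}\sum_{k} (Y_{ijk} - \bar{Y}_{ij.})^2$.

The Test Statistics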
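Theorem A. Under the assumption that the errors are independently distributed with mean 0 and variance $\sigma^2$, we have
$E(SS_A) = (I-1)\sigma^2 + JK\sum_i \alpha_i^2$,
$E(SS_B) = (J-1)\sigma^2 + IK\sum_j \beta_j^2$,
$E(SS_{AB}) = (I-1)(J-1)\sigma^2 + K\sum_i\sum_j \delta_{ij}^2$,
$E(SS_E) = IJ(K-1)\sigma^2$.
Theorem B. If, in addition, the errors are normally distributed, then $SS_E/\sigma^2 \sim \chi^2_{IJ(K-1)}$; under $H_A$, $SS_A/\sigma^2 \sim \chi^2_{I-1}$; under $H_B$, $SS_B/\sigma^2 \sim \chi^2_{J-1}$; under $H_{AB}$, $SS_{AB}/\sigma^2 \sim \chi^2_{(I-1)(J-1)}$; and the sums of squares are all independently distributed.
Theorem C. The F-statistic for $H_A$ is $F_A = \frac{SS_A/(I-1)}{SS_E/[IJ(K-1)]}$, with $I - 1$ and $IJ(K-1)$ degrees of freedom;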
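the F-statistic for $H_B$ is $F_B = \frac{SS_B/(J-1)}{SS_E/[IJ(K-1)]}$, with $J - 1$ and $IJ(K-1)$ degrees of freedom;
the F-statistic for $H_{AB}$ is $F_{AB} = \frac{SS_{AB}/[(I-1)(J-1)]}{SS_E/[IJ(K-1)]}$, with $(I-1)(J-1)$ and $IJ(K-1)$ degrees of freedom.

Analysis of Variance Table
Source   df           SS       F
A        I-1          SS_A     F_A
B        J-1          SS_B     F_B
A x B    (I-1)(J-1)   SS_AB    F_AB
Error    IJ(K-1)      SS_E
Total    IJK-1        SS_TOT

An Example
Page 461 or page 396.
The analysis of variance by SAS: Analyst: Open by SAS name (Work, iron.sas) > Statistics > ANOVA > Factorial ANOVA > Model.
The analysis of variance by Matlab: anova2(x, rep).

An Example for Randomized Block Designs
Randomized Block Designs
The Model for Comparing the Effect of Different Fertilizers
The simple composition: $Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$, $i = 1, \dots, I$ (fertilizers), $j = 1, \dots, J$ (blocks),
where $\alpha_i$ is the differential effect of the $i$th treatment, $\beta_j$ is the differential effect of the $j$th block, $\varepsilon_{ij}$ is the random error, and $\sum_i \alpha_i = \sum_j \beta_j = 0$.
The model: the $Y_{ij}$ are independently distributed as $N(\mu + \alpha_i + \beta_j, \sigma^2)$.
The null hypotheses: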
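No effect of fertilizers: $H_A\colon \alpha_1 = \cdots = \alpha_I = 0$.
No effect of blocks: $H_B\colon \beta_1 = \cdots = \beta_J = 0$.

The Sum of Squares
$SS_A = J\sum_{i=1}^{I} (\bar{Y}_{i.} - \bar{Y}_{..})^2$, $SS_B = I\sum_{j=1}^{J} (\bar{Y}_{.j} - \bar{Y}_{..})^2$,
$SS_{AB} = \sum_{i=1}^{I}\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..})^2$,
and $SS_{TOT} = \sum_{i}\sum_{j} (Y_{ij} - \bar{Y}_{..})^2 = SS_A + SS_B + SS_{AB}$.

The Test Statistic
Theorem A. Under the assumption that the errors are independently distributed with mean 0 and variance $\sigma^2$, we have
$E(SS_A) = (I-1)\sigma^2 + J\sum_i \alpha_i^2$, $E(SS_B) = (J-1)\sigma^2 + I\sum_j \beta_j^2$, $E(SS_{AB}) = (I-1)(J-1)\sigma^2$.
Theorem B. If, in addition, the errors are normally distributed, then $SS_{AB}/\sigma^2 \sim \chi^2_{(I-1)(J-1)}$; under $H_A$, $SS_A/\sigma^2 \sim \chi^2_{I-1}$; under $H_B$, $SS_B/\sigma^2 \sim \chi^2_{J-1}$; and the sums of squares are independent.
Theorem C. The F-statistic for $H_A$ is
$F_A = \frac{SS_A/(I-1)}{SS_{AB}/[(I-1)(J-1)]}$, which follows $F_{I-1,\,(I-1)(J-1)}$ under $H_A$.
The F-statistic for $H_B$ is
$F_B = \frac{SS_B/(J-1)}{SS_{AB}/[(I-1)(J-1)]}$, which follows $F_{J-1,\,(I-1)(J-1)}$ under $H_B$.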
Analysis of Variance Table
Source        df           SS       F
Fertilizers   I-1          SS_A     F_A
Blocks        J-1          SS_B     F_B
Error         (I-1)(J-1)   SS_AB
Total         IJ-1         SS_TOT

An Example
Page 467, Example A.
The analysis of variance by SAS: Analyst: Open by SAS name (Work, itching.sas) > Statistics > ANOVA > Factorial ANOVA > Model.
The analysis of variance by Matlab: anova2(x, 1).
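For an $I \times J$ layout with one observation per cell, the block-design sums of squares and F ratios can be sketched as below. Here `y[i][j]` is assumed to hold treatment $i$ in block $j$, the function name is hypothetical, and the residual sum of squares $SS_{AB}$ serves as the error term with $(I-1)(J-1)$ degrees of freedom.

```python
def block_anova(y):
    """y[i][j]: treatment i (I rows), block j (J columns), one observation per cell.
    Returns SS_A, SS_B, SS_AB (residual), F_A and F_B."""
    I, J = len(y), len(y[0])
    grand = sum(map(sum, y)) / (I * J)
    row = [sum(r) / J for r in y]                               # treatment means
    col = [sum(y[i][j] for i in range(I)) / I for j in range(J)]  # block means
    ss_a = J * sum((m - grand) ** 2 for m in row)
    ss_b = I * sum((m - grand) ** 2 for m in col)
    ss_ab = sum((y[i][j] - row[i] - col[j] + grand) ** 2
                for i in range(I) for j in range(J))
    df_e = (I - 1) * (J - 1)
    f_a = (ss_a / (I - 1)) / (ss_ab / df_e)
    f_b = (ss_b / (J - 1)) / (ss_ab / df_e)
    return ss_a, ss_b, ss_ab, f_a, f_b


print(block_anova([[1, 2, 3], [2, 4, 3]]))
```

The three sums of squares add up to the total sum of squares about the grand mean, which is a convenient check on hand calculations.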
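A Nonparametric Method: Friedman's Test
The model for comparing fertilizers: the observations $Y_{ij}$ ($i$ = treatment, $j$ = block) are independent.
The Null Hypothesis
$H_0$: there is no difference among the $I$ fertilizers (treatments).
The Idea of the Test
$R_{ij}$: the rank of $Y_{ij}$ within the $j$th block (ranking the $I$ observations in that block).
$\bar{R}_{i.}$: the average rank of the $i$th treatment over the $J$ blocks.
$\bar{R}_{..} = (I+1)/2$: the overall mean rank.
$SS_A = J\sum_{i=1}^{I} (\bar{R}_{i.} - \bar{R}_{..})^2$: a measure of the dispersion of the $\bar{R}_{i.}$.
If there is no difference among the $I$ fertilizers, the $\bar{R}_{i.}$ are almost the same. So if $SS_A$ is too large, we reject $H_0$.
The Test Statistic
$Q = \frac{12}{I(I+1)} SS_A = \frac{12J}{I(I+1)}\sum_{i=1}^{I} (\bar{R}_{i.} - \bar{R}_{..})^2$, which is approximately $\chi^2_{I-1}$ under $H_0$.

Homework 2 (Part Two)
Page 475, Problem 20 (the first question).
Page 479, Problem 24 (using a parametric analysis).
Page 481, Problem 27 a.

Chapter 4 Linear Least Squares
Simple Linear Regression
Assessing the Fit
The Matrix Approach to Simple Linear Regression
Multivariate Linear Regression

Simple Linear Regression
The standard statistical model here is
$y_i = \beta_0 + \beta_1 x_i + e_i$, $i = 1, \dots, n$,
where the $e_i$ are independent random variables with mean zero and variance $\sigma^2$.
The problems:
How to determine the intercept $\beta_0$ and the slope $\beta_1$.
What is an unbiased estimate of $\sigma^2$?
What are the methods of assessing goodness of fit?

The Method of Least Squares in Straight-Line Fitting
If $\hat{\beta}_0$ and $\hat{\beta}_1$ minimize the sum of squares for error
$S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$,
then they are called the least squares estimates of the intercept and the slope.
It can be shown that
$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$, $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$.

Statistical Properties of the Estimated Slope and Intercept
Theorem A. Under the assumption of the standard statistical model, the least squares estimates are unbiased: $E(\hat{\beta}_0) = \beta_0$ and $E(\hat{\beta}_1) = \beta_1$.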
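The closed-form estimates are easy to code, and a small simulation makes Theorem A concrete: averaged over many simulated data sets, the estimates should settle near the true intercept and slope. The sketch below assumes normal errors, x = 0, 1, ..., 9, and true values $\beta_0 = 1$, $\beta_1 = 2$, $\sigma = 1$, all chosen only for illustration; the function names are hypothetical.

```python
import random


def least_squares(x, y):
    """beta1_hat = sum (x-xbar)(y-ybar) / sum (x-xbar)^2, beta0_hat = ybar - beta1_hat * xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def simulate_unbiasedness(beta0=1.0, beta1=2.0, sigma=1.0, reps=2000, seed=0):
    """Average the estimates over data sets simulated from y = beta0 + beta1 x + e."""
    rng = random.Random(seed)
    x = list(range(10))
    total0 = total1 = 0.0
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        b0, b1 = least_squares(x, y)
        total0 += b0
        total1 += b1
    return total0 / reps, total1 / reps


print(least_squares([0, 1, 2, 3], [1, 3, 5, 7]))
print(simulate_unbiasedness())
```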
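Theorem B. If the $e_i$ are independently distributed as $N(0, \sigma^2)$, we have
$\hat{\beta}_1 \sim N\left(\beta_1, \frac{\sigma^2}{\sum_{i}(x_i - \bar{x})^2}\right)$, $\hat{\beta}_0 \sim N\left(\beta_0, \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i}(x_i - \bar{x})^2}\right]\right)$.
The fitted line can be expressed as $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x = \bar{y} + \hat{\beta}_1(x - \bar{x})$.

The Residual Sum of Squares (RSS): An Estimate of $\sigma^2$
The residual sum of squares is $RSS = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$.
Theorem C.
a. Under the assumption of the standard statistical model, $E[RSS/(n-2)] = \sigma^2$, i.e. $s^2 = RSS/(n-2)$ is an unbiased estimate of $\sigma^2$.
b. If the $e_i$ are independently normally distributed, $RSS/\sigma^2 \sim \chi^2_{n-2}$ and is independent of $\hat{\beta}_0$ and $\hat{\beta}_1$.
Theorem D.
a. If the $e_i$ are independently normally distributed, $\frac{\hat{\beta}_i - \beta_i}{s_{\hat{\beta}_i}} \sim t_{n-2}$, $i = 0, 1$, where
$s_{\hat{\beta}_1} = \frac{s}{\sqrt{\sum_{i}(x_i - \bar{x})^2}}$ and $s_{\hat{\beta}_0} = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i}(x_i - \bar{x})^2}}$.
b. The $100(1-\alpha)\%$ confidence intervals for $\beta_i$ are $\hat{\beta}_i \pm t_{n-2}(\alpha/2)\, s_{\hat{\beta}_i}$, $i = 0, 1$.

An Example
Interactive Data Analysis (SASuser, Class.sas) > Analyze > Fit > Output.

Assessing the Fit
The Coefficient of Determination
The sum-of-squares decomposition:
$\sum_{i}(y_i - \bar{y})^2 = \sum_{i}(\hat{y}_i - \bar{y})^2 + \sum_{i}(y_i - \hat{y}_i)^2$,
that is, total sum of squares = regression sum of squares + residual sum of squares.
The coefficient of determination is defined as
$R^2 = \frac{\sum_{i}(\hat{y}_i - \bar{y})^2}{\sum_{i}(y_i - \bar{y})^2} = 1 - \frac{RSS}{\sum_{i}(y_i - \bar{y})^2}$.
The adjusted R-square is defined as
$R^2_{adj} = 1 - \frac{RSS/(n-2)}{\sum_{i}(y_i - \bar{y})^2/(n-1)}$.

Hypothesis Test for $\beta_1$
Theorem A. If the $e_i$ are independently normally distributed, then $RSS/\sigma^2 \sim \chi^2_{n-2}$. Under $H_0\colon \beta_1 = 0$, $\sum_{i}(\hat{y}_i - \bar{y})^2/\sigma^2 \sim \chi^2_1$ and is …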
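The residual-based quantities can be collected in one helper; this is a sketch, and its name and returned keys are illustrative. The F ratio it returns is the standard statistic RegSS / [RSS/(n-2)], which under normal errors and $H_0\colon \beta_1 = 0$ follows $F_{1,n-2}$; it equals the square of $t = \hat{\beta}_1 / s_{\hat{\beta}_1}$.

```python
import math


def regression_summary(x, y):
    """Least squares fit plus RSS-based quantities for y = beta0 + beta1 x + e."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * xi for xi in x]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    tss = sum((yi - ybar) ** 2 for yi in y)                 # total sum of squares
    reg_ss = sum((fi - ybar) ** 2 for fi in fitted)         # regression sum of squares
    s2 = rss / (n - 2)                                      # unbiased estimate of sigma^2
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1 / n + xbar ** 2 / sxx))
    return {
        "b0": b0, "b1": b1, "rss": rss, "tss": tss, "reg_ss": reg_ss, "s2": s2,
        "se_b0": se_b0, "se_b1": se_b1,
        "r2": reg_ss / tss,
        "adj_r2": 1 - (rss / (n - 2)) / (tss / (n - 1)),
        "t_b1": b1 / se_b1,
        "f": reg_ss / s2,
    }


print(regression_summary([0, 1, 2, 3], [1, 2, 2, 4]))
```

For these four points the fitted line is $\hat{y} = 0.9 + 0.9x$, with RSS = 0.7 out of a total sum of squares of 4.75.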