Principles of Probability Theory and Statistics, Chapter 2 (English)

Ch 2 Methods for Describing Sets of Data

Learning objectives
1. Describe data graphically.
2. Describe data numerically.

Example (aphasia)

    Subject  Type of aphasia    Subject  Type of aphasia
    1        Broca's            12       Broca's
    2        Anomic             13       Anomic
    3        Anomic             14       Broca's
    4        Conduction         15       Anomic
    5        Broca's            16       Anomic
    6        Conduction         17       Anomic
    7        Conduction         18       Conduction
    8        Anomic             19       Broca's
    9        Conduction         20       Anomic
    10       Anomic             21       Conduction
    11       Conduction         22       Anomic

The researchers want to determine whether one type of aphasia occurs more often than any other and, if so, how often.

Describing qualitative data

Qualitative data are nonnumerical in nature, so the value of a qualitative variable can only be classified into categories called classes. Such data can be summarized numerically in two ways:
(1) by counting the class frequency, the number of observations in the data set that fall into each class, or
(2) by calculating the class relative frequency, the proportion of the total number of observations that fall into each class.

Def 2.1 A class is one of the categories into which qualitative data can be classified.
Def 2.2 The class frequency is the number of observations in the data set falling into a particular class.
Def 2.3 The class relative frequency is the class frequency divided by the total number of observations in the data set (denoted n, the size of the data set), i.e.,

    class relative frequency = class frequency / n.

Example (aphasia)

    Class               Frequency              Relative frequency
    (type of aphasia)   (number of subjects)   (proportion)
    Broca's             5                      .227
    Conduction          7                      .318
    Anomic              10                     .455
    Totals              22                     1.000

Bar graph and pie chart

The most widely used graphical methods for summarizing qualitative data are bar graphs and pie charts.
1. A bar graph shows the amount of data that belongs to each class as proportionally sized rectangular areas.
2. A pie chart shows the amount of data that belongs to each class as a proportional part of a circle.

[Figures: bar graph; pie chart]

Pie chart
1. Shows the breakdown of a total quantity into categories.
2. Useful for showing relative differences.
3. Angle size = (360°)(percent), with the percent expressed as a decimal fraction.

Graphical methods for describing quantitative data

Quantitative data sets consist of data that are recorded on a meaningful numerical scale. To describe and summarize such data sets, three graphical methods are introduced here: dot plots, stem-and-leaf displays, and histograms.

Example (EPA gas): EPA mileage ratings on 100 cars

    36.3  41.0  36.9  37.1  44.9  36.8  30.0  37.2  42.1  36.7
    32.7  37.3  41.2  36.6  32.9  36.5  33.2  37.4  37.5  33.6
    40.5  36.5  37.6  33.9  40.2  36.4  37.7  37.7  40.0  34.2
    36.2  37.9  36.0  37.9  35.9  38.2  38.3  35.7  35.6  35.1
    38.5  39.0  35.5  34.8  38.6  39.4  35.3  34.4  38.8  39.7
    36.3  36.8  32.5  36.4  40.5  36.6  36.1  38.2  38.4  39.3
    41.0  31.8  37.3  33.1  37.0  37.6  37.0  38.7  39.0  35.8
    37.0  37.2  40.7  37.4  37.1  37.8  35.9  35.6  36.7  34.5
    37.1  40.3  36.7  37.0  33.9  40.1  38.0  35.2  34.8  39.5
    39.9  36.9  32.9  33.8  39.8  34.0  36.8  35.0  38.1  36.9

Dot plots

A dot plot condenses the data by grouping all identical values together in the plot. The horizontal axis is a scale for the quantitative variable, and the numerical value of each measurement in the data set is located on the horizontal scale by a dot. When data values repeat, the dots are placed above one another. See the figure in the example below.

[Figure: dot plot]

Stem-and-leaf display

A stem-and-leaf display combines a graphic technique with a sorting technique. It is very popular for summarizing numerical data. Divide each observation into a stem value and a leaf value:
- the leading digit(s) become the stem;
- the trailing digit(s) become the leaf.

Example. Construct a stem-and-leaf display of the following set of 20 test scores.

    82 74 88 66 58 74 78 84 96 76
    62 68 72 92 86 76 52 76 82 78

Figure 1: 20 exam scores

    5 | 2 8
    6 | 2 6 8
    7 | 2 4 4 6 6 6 8 8
    8 | 2 2 4 6 8
    9 | 2 6

Stem-and-leaf of MPG, n = 100, leaf unit = 0.10

    30 | 0
    31 | 8
    32 | 5 7 9 9
    33 | 1 2 6 8 9 9
    34 | 0 2 4 5 8 8
    35 | 0 1 2 3 5 6 6 7 8 9 9
    36 | 0 1 2 3 3 4 4 5 5 6 6 7 7 7 8 8 8 9 9 9
    37 | 0 0 0 0 1 1 1 2 2 3 3 4 4 5 6 6 7 7 8 9 9
    38 | 0 1 2 2 3 4 5 6 7 8
    39 | 0 0 3 4 5 7 8 9
    40 | 0 1 2 3 5 5 7
    41 | 0 0 2
    42 | 1
    43 |
    44 | 9

Histograms
1. Divide the data set into class intervals of equal size.
2. Count the class frequency or calculate the class relative frequency.
3. Construct the histogram.

Classifying the quantitative data
1. Determine the range.
2. Select the number of classes (usually between 5 and 15 inclusive).
3. Compute the class interval width (approximately the range divided by the number of classes).
4. Determine the class boundaries (limits).
5. Count the observations and assign them to classes.

[Figure: frequency histogram of the EPA mileage data; horizontal axis: MPG, vertical axis: frequency]

[Figure: the effect of the size of a data set on the outline of a histogram]

Exercise
1. Calculate the number of the 500 measurements falling into each of the measurement classes. Then graph a frequency histogram for these data.

    Measurement class   Relative frequency
    0.5-2.5             …
    2.5-4.5             …
    4.5-6.5             …
    6.5-8.5             …
    8.5-10.5            …
    10.5-12.5           …
    12.5-14.5           …
    14.5-16.5           …

Summation notation

Summation is done quite often in mathematics, and there is a symbol that means summation. That symbol is the capital Greek letter sigma, so the notation is sometimes called sigma notation instead of summation notation. For measurements $x_1, x_2, \ldots, x_n$,

    \[ \sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n . \]

Numerical methods for describing quantitative data

The two most important data characteristics are:
1. Central tendency: the tendency of the data to cluster, or center.
2. Variability: the dispersion or spread of the data.

Central tendency
1. Mean
2. Median
3. Mode

Mean

Def 2.4 The mean (arithmetic mean) of a set of data is the sum of the measurements divided by the number of measurements contained in the data set.

The mean (average) of a set of data (a sample) $x_1, x_2, \ldots, x_n$ is defined by

    \[ \bar{x} = \frac{\text{sum of all } x}{\text{number of } x} = \frac{1}{n} \sum_{i=1}^{n} x_i . \]

Mean example. Raw data: …; $\bar{x}$ = …

Mean example (EPA gas)

The mean gas mileage for the 100 cars is

    \[ \bar{x} = \frac{1}{100} \sum_{i=1}^{100} x_i = \frac{30.0 + 31.8 + 32.5 + \cdots + 42.1 + 44.9}{100} = 36.994 . \]

Median

Def 2.5 The median of a quantitative data set is the middle value of the data ranked in ascending (or descending) order.
1. Find the position of the median in the ordered sequence, $(n+1)/2$.
2. Read off the median: the middle value if n is odd, or the average of the two middle values if n is even.

Median example, odd-sized sample

    Raw data:  24.1  22.6  21.5  23.7  22.6
    Ordered:   21.5  22.6  22.6  23.7  24.1
    Position:  1     2     3     4     5

The median is the value in position (5 + 1)/2 = 3, that is, 22.6.

Median example, even-sized sample

    Raw data:  10.3  4.9  8.9  11.7  6.3   7.7
    Ordered:   4.9   6.3  7.7  8.9   10.3  11.7
    Position:  1     2    3    4     5     6

The position (6 + 1)/2 = 3.5 falls between positions 3 and 4, so the median is (7.7 + 8.9)/2 = 8.3.

Median example (EPA gas)

The median is 37.0. The sample size n = 100 is an even number, so there are two middle values, located at the 50th and 51st positions after ordering. These two middle values are 37.0 and 37.0, so the median is 37.0. This value implies that about half of the 100 mileages in the data set fall below 37.0 and half lie above 37.0.

Comparing the mean and median

A data set is said to be skewed if one tail of the distribution has more extreme observations than the other tail.
1. If the data set is skewed to the right, then the median is less than the mean.
2. If the data set is symmetric, then the median is equal to the mean.
3. If the data set is skewed to the left, then the median is larger than the mean.

Mode

Def 2.6 The mode is the measurement that occurs most frequently in the data set.
1. There may be no mode or several modes.
2. It may be used for quantitative and qualitative data.

Mode example

    Raw data: 10.3  4.9  8.9  11.7  6.3  7.7    (no mode)
    Raw data: 6.3   4.9  8.9  6.3   4.9  4.9    (one mode: 4.9)
    Raw data: 21    28   28   41    43   43     (two modes: 28 and 43)

Mode example. Consider the grades of 18 students:

    B, C, B, A, F, D, B, C, B
    C, B, A, F, C, B, D, C, A

1. Grade:      A  B  C  D  F
2. Frequency:  3  6  5  2  2
3. The mode is the grade B.

Mode example (EPA gas)

The mode of the mileage ratings is 37.0 (it occurs most often).

Exercise
1. Calculate the mean, median, and mode for each of the following samples:
a. 7, -2, 3, 3, 0, 4
b. 2, 3, 5, 3, 2, 3, 4, 3, 5, 1, 2, 3, 4
c. 51, 50, 47, 50, 48, 41, 59, 68, 45, 37

Variation (variability)
1. Range
2. Variance
3. Standard deviation

Range

Def 2.7 The range of a quantitative data set is equal to the largest measurement minus the smallest measurement. It is the simplest measure of dispersion:

    range = max - min.

It is very sensitive to extreme values.

Range example
1. For the data set 11, 12, 13, 13, 13, 14, 15: $\bar{x} = 13$ and range = 15 - 11 = 4.
2. For the data set 9, 10, 11, 13, 15, 16, 17: $\bar{x} = 13$ and range = 17 - 9 = 8.

Variance and standard deviation

The variance and standard deviation measure the spread of the data set around the mean. The variance is the average of the squared distances of the measurements in the data set from the mean of all the measurements.

Note that the variance $\sigma^2$ of a population is defined as

    \[ \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 . \]

Sample variance

Def 2.8 The sample variance $s^2$ for a sample of n measurements is equal to the sum of the squared distances from the mean divided by (n - 1).

Sample variance formula

    \[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
           = \frac{1}{n-1} \left[ \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \Big( \sum_{i=1}^{n} x_i \Big)^2 \right] . \]

Variance example
(1) For the sample 11, 12, 13, 13, 13, 14, 15, $\bar{x} = 13$ and

    \[ s^2 = \frac{1}{7-1} \left[ (11-13)^2 + (12-13)^2 + \cdots + (15-13)^2 \right] = \frac{10}{6} = \frac{5}{3} \approx 1.667 . \]

(2) For the sample 9, 10, 11, 13, 15, 16, 17, $\bar{x} = 13$ and

    \[ s^2 = \frac{1}{7-1} \left[ (9-13)^2 + (10-13)^2 + \cdots + (17-13)^2 \right] = \frac{58}{6} = \frac{29}{3} \approx 9.667 . \]

Sample standard deviation

Def 2.9 The sample standard deviation s is the positive square root of the sample variance:

    \[ s = \sqrt{s^2} = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 }
         = \sqrt{ \frac{1}{n-1} \left[ \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \Big( \sum_{i=1}^{n} x_i \Big)^2 \right] } . \]

Interpreting the standard deviation

To understand how the standard deviation provides a measure of the variability of a data set, consider a specific data set and answer the following questions. How many measurements are within 1 standard deviation of the mean? How many measurements are within 2 standard deviations of the mean?

1. Chebyshev's rule

Let k be any number greater than 1. For any data set (regardless of the shape of the frequency distribution of the data), the proportion of observations that lie within k standard deviations of the mean is at least $1 - 1/k^2$:

    \[ P(|x - \bar{x}| < ks) \ge 1 - \frac{1}{k^2}, \quad \text{for } k > 1 . \]

This rule says that within two standard deviations of the mean (k = 2), i.e., in $(\bar{x} - 2s, \bar{x} + 2s)$, you will always find at least 75% of the data (since $1 - 1/k^2 = 1 - 1/4 = 0.75$), and that within three standard deviations of the mean (k = 3), i.e., in $(\bar{x} - 3s, \bar{x} + 3s)$, you will always find at least 89% of the data (since $1 - 1/k^2 = 1 - 1/9 \approx 0.89$).

Remarks:
1. Chebyshev's rule applies to any data set, regardless of the shape of the frequency distribution of the data.
2. Chebyshev's rule applies to both samples and populations.

2. Empirical rule

The empirical rule is a rule of thumb that applies to data sets with frequency distributions that are mound-shaped (bell-shaped) and symmetric.
- Approximately 68% of the measurements will fall within 1 standard deviation of the mean, i.e., within the interval $(\bar{x} - s, \bar{x} + s)$ for a sample and $(\mu - \sigma, \mu + \sigma)$ for a population.
- Approximately 95% of the measurements will fall within 2 standard deviations of the mean, i.e., within the interval $(\bar{x} - 2s, \bar{x} + 2s)$ for a sample and $(\mu - 2\sigma, \mu + 2\sigma)$ for a population.
- Approximately 99.7% of the measurements will fall within 3 standard deviations of the mean, i.e., within the interval $(\bar{x} - 3s, \bar{x} + 3s)$ for a sample and $(\mu - 3\sigma, \mu + 3\sigma)$ for a population.

Example 2.12 (pp. 64-65). A manufacturer of automobile batteries claims that the average length of life for its grade A battery is 60 months. However, the guarantee on this brand is for just 36 months. Suppose the standard deviation of the life length is known to be 10 months, and the frequency distribution of the life-length data is known to be mound-shaped.
a. Approximately what percentage of the manufacturer's grade A batteries will last more than 50 months, assuming the manufacturer's claim is true?
b. Approximately what percentage of the manufacturer's grade A batteries will last less than 40 months, assuming the manufacturer's claim is true?
c. Suppose your battery lasts 37 months. What could you infer about the manufacturer's claim?

Solution
a. Fifty months is 1 standard deviation below the claimed mean of 60 months. About 68% of the batteries fall within 1 standard deviation of the mean, so about 34% lie between 50 and 60 months and about 50% lie above 60 months. Approximately 84% of the batteries should therefore have life lengths exceeding 50 months.
b. Forty months is 2 standard deviations below the mean. About 95% of the batteries fall within 2 standard deviations of the mean, so approximately 2.5% of the batteries should fail prior to 40 months.
c. If you are so unfortunate that your grade A battery fails at 37 months, you can make one of two inferences: either your battery was one of the approximately 2.5% that fail prior to 40 months, or something about the manufacturer's claim is not true. Because the chance that a battery fails before 40 months is so small, there is good reason to doubt the manufacturer's claim.

Numerical measures of relative standing

Measures of relative standing are numbers that indicate where a particular value lies in relation to the rest of the values in a data set.
1. Percentiles
2. z-score

Percentiles

Def 2.11 Suppose the measurements $x_1, x_2, \ldots, x_n$ have been ranked in ascending order. The pth percentile is a number such that p% of the measurements fall below the pth percentile and (100 - p)% fall above it.
- The lower quartile (first quartile), $Q_1$, is the 25th percentile.
- The upper quartile (third quartile), $Q_3$, is the 75th percentile.
- The median is the 50th percentile (or second quartile).

Procedure for calculating percentiles
1. Arrange the n observations in ascending order, $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$.
2. Calculate the position index $k = (p \cdot n)/100$, where p is the percentile of interest and n is the sample size.
3a. If k is an integer, the pth percentile is $(x_{(k)} + x_{(k+1)})/2$.
3b. If k is not an integer, round k up to the next integer; the observation at that position is the pth percentile.

Calculating percentiles

Example. Consider the following 15 ranked …
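The procedure for calculating percentiles (steps 1 to 3b) can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `percentile` is hypothetical, p is assumed to be an integer strictly between 0 and 100, and the data are the 20 test scores from the stem-and-leaf example.

```python
def percentile(values, p):
    """Return the pth percentile (integer p, 0 < p < 100) by the position-index rule."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    ordered = sorted(values)             # step 1: ascending order
    n = len(ordered)
    k, remainder = divmod(p * n, 100)    # step 2: k = p*n/100, kept exact with integers
    if remainder == 0:
        # step 3a: k is an integer, so average x(k) and x(k+1) (list indices are 0-based)
        return (ordered[k - 1] + ordered[k]) / 2
    # step 3b: k is not an integer, so take the observation at the next position up
    return ordered[k]


# The 20 test scores from the stem-and-leaf example.
scores = [82, 74, 88, 66, 58, 74, 78, 84, 96, 76,
          62, 68, 72, 92, 86, 76, 52, 76, 82, 78]

q1 = percentile(scores, 25)       # k = 5, average of the 5th and 6th ordered values
median = percentile(scores, 50)   # k = 10, average of the 10th and 11th ordered values
q3 = percentile(scores, 75)       # k = 15, average of the 15th and 16th ordered values
print(q1, median, q3)
```

This position-index rule is only one of several percentile conventions, so library routines may return slightly different values for the same data.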
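The class frequencies and class relative frequencies of the aphasia example (Def 2.2 and Def 2.3), together with the pie-chart angle rule, can be checked with a short Python sketch. The variable names (`aphasia`, `frequency`, `relative_frequency`, `pie_angle`) are illustrative assumptions and do not appear in the original slides.

```python
from collections import Counter

# Type of aphasia for subjects 1 to 22, in the order listed in the example.
aphasia = [
    "Broca's", "Anomic", "Anomic", "Conduction", "Broca's", "Conduction",
    "Conduction", "Anomic", "Conduction", "Anomic", "Conduction", "Broca's",
    "Anomic", "Broca's", "Anomic", "Anomic", "Anomic", "Conduction",
    "Broca's", "Anomic", "Conduction", "Anomic",
]

n = len(aphasia)                  # size of the data set
frequency = Counter(aphasia)      # class frequency (Def 2.2)

# Class relative frequency = class frequency / n (Def 2.3).
relative_frequency = {cls: count / n for cls, count in frequency.items()}

# Pie-chart angle in degrees = 360 * relative frequency.
pie_angle = {cls: 360 * rel for cls, rel in relative_frequency.items()}

for cls in ("Broca's", "Conduction", "Anomic"):
    print(f"{cls:<11}{frequency[cls]:>3}{relative_frequency[cls]:>7.3f}{pie_angle[cls]:>7.1f}")
```

The printed counts and proportions can be compared row by row with the frequency table in the aphasia example.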
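The stem-and-leaf construction for the 20 test scores can be sketched in Python as below. The helper name `stem_and_leaf` is hypothetical, and the sketch assumes non-negative integer data whose last digit is the leaf. Stems that have no leaves are simply omitted here, unlike the empty stem 43 shown in the MPG display.

```python
from collections import defaultdict

scores = [82, 74, 88, 66, 58, 74, 78, 84, 96, 76,
          62, 68, 72, 92, 86, 76, 52, 76, 82, 78]


def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted list of leaves (last digit)."""
    rows = defaultdict(list)
    for value in sorted(values):          # sorting first keeps the leaves in order
        stem, leaf = divmod(value, 10)    # e.g. 82 -> stem 8, leaf 2
        rows[stem].append(leaf)
    return dict(sorted(rows.items()))


display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```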
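The three measures of central tendency (Def 2.4 to Def 2.6) can be tried on the small samples from the median and mode examples. The following Python sketch is one possible reading of those definitions; the helper names `mean`, `median` and `modes` are assumptions introduced here, and `modes` returns an empty list when every value occurs exactly once, which is how "no mode" is interpreted in this sketch.

```python
from collections import Counter


def mean(values):
    """Sum of the measurements divided by the number of measurements (Def 2.4)."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """All values with the highest frequency; empty list if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


odd_sample = [24.1, 22.6, 21.5, 23.7, 22.6]
even_sample = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
grades = list("BCBAFDBCB" "CBAFCBDCA")   # the 18 student grades

print(median(odd_sample), median(even_sample), mean(even_sample))
print(modes(even_sample), modes([21, 28, 28, 41, 43, 43]), modes(grades))
```

The standard library module `statistics` offers `mean`, `median` and `multimode` with the same intent; note that `multimode` returns every value when nothing repeats, rather than an empty list.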
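The range, sample variance and sample standard deviation can be computed directly from their definitions. The Python sketch below uses the two seven-value samples from the range and variance examples; the function names are hypothetical, and the second variance function implements the shortcut form of the sample variance formula so the two forms can be compared.

```python
import math

sample_a = [11, 12, 13, 13, 13, 14, 15]
sample_b = [9, 10, 11, 13, 15, 16, 17]


def data_range(xs):
    """Largest measurement minus smallest measurement."""
    return max(xs) - min(xs)


def sample_variance(xs):
    """Sum of squared distances from the mean, divided by (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """Shortcut form: [sum of x^2 - (sum of x)^2 / n] / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


def sample_std(xs):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


for name, xs in (("a", sample_a), ("b", sample_b)):
    print(name, data_range(xs),
          round(sample_variance(xs), 3),
          round(sample_variance_shortcut(xs), 3),
          round(sample_std(xs), 3))
```

Both samples have mean 13, so the sums of squared distances are 10 and 58 respectively, each divided by n - 1 = 6. The wider sample has both the larger range and the larger variance.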
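Chebyshev's rule and the empirical rule can be illustrated numerically. In the hedged Python sketch below, `chebyshev_bound` and `fraction_within` are hypothetical helper names; the 20 test scores from the stem-and-leaf example serve as sample data, and the battery figures (mean 60 months, standard deviation 10 months) are taken from Example 2.12 with the empirical-rule percentages 68% and 95% hard-coded as assumptions of that rule.

```python
import statistics


def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule requires k > 1")
    return 1 - 1 / k ** 2


def fraction_within(values, k):
    """Observed proportion of values within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)          # sample standard deviation (n - 1 divisor)
    inside = [x for x in values if abs(x - mean) < k * s]
    return len(inside) / len(values)


scores = [82, 74, 88, 66, 58, 74, 78, 84, 96, 76,
          62, 68, 72, 92, 86, 76, 52, 76, 82, 78]

for k in (2, 3):
    print(k, round(chebyshev_bound(k), 2), fraction_within(scores, k))

# Empirical rule applied to Example 2.12 (mean 60 months, standard deviation 10 months).
within_1_sd = 0.68
within_2_sd = 0.95
share_above_50 = 0.5 + within_1_sd / 2    # upper half plus half of the central 68%
share_below_40 = (1 - within_2_sd) / 2    # one tail outside 2 standard deviations
print(round(share_above_50, 3), round(share_below_40, 3))
```

The observed proportions for the test scores must be at least as large as the Chebyshev bounds, since the rule holds for any data set. The empirical-rule shares are approximations that assume a mound-shaped, symmetric distribution.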
