Ch. 2 Methods for Describing Sets of Data

Learning objectives
1. Describe data graphically.
2. Describe data numerically.

Example (aphasia)

Subject   Type of aphasia
1         Broca's
2         Anomic
3         Anomic
4         Conduction
5         Broca's
6         Conduction
7         Conduction
8         Anomic
9         Conduction
10        Anomic
11        Conduction
12        Broca's
13        Anomic
14        Broca's
15        Anomic
16        Anomic
17        Anomic
18        Conduction
19        Broca's
20        Anomic
21        Conduction
22        Anomic

The researchers want to determine whether one type of aphasia occurs more often than any other and, if so, how often.

Describing qualitative data
- Qualitative data are nonnumerical in nature, so the value of a qualitative variable can only be classified into categories called classes. We can summarise such data numerically in two ways: (1) by counting the class frequency, the number of observations in the data set that fall into each class, or (2) by calculating the class relative frequency, the proportion of the total number of observations falling into each class.

Def 2.1 A class is one of the categories into which qualitative data can be classified.
Def 2.2 The class frequency is the number of observations in the data set falling into a particular class.
Def 2.3 The class relative frequency is the class frequency divided by the total number of observations in the data set (denoted n, the size of the data set), i.e.,
class relative frequency = class frequency / n.
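As a minimal sketch of Defs 2.2 and 2.3, the class frequencies and class relative frequencies for the aphasia data can be counted in a few lines of Python. The function name `frequency_table` is an illustrative choice, not something from the text.

```python
from collections import Counter

# Type of aphasia for each of the 22 subjects, in subject order.
aphasia = [
    "Broca's", "Anomic", "Anomic", "Conduction", "Broca's", "Conduction",
    "Conduction", "Anomic", "Conduction", "Anomic", "Conduction", "Broca's",
    "Anomic", "Broca's", "Anomic", "Anomic", "Anomic", "Conduction",
    "Broca's", "Anomic", "Conduction", "Anomic",
]


def frequency_table(observations):
    """Return {class: (class frequency, class relative frequency)}."""
    n = len(observations)          # size of the data set
    counts = Counter(observations)  # class frequency for each class
    return {cls: (freq, freq / n) for cls, freq in counts.items()}


table = frequency_table(aphasia)
for cls, (freq, rel) in table.items():
    print(f"{cls:<12}{freq:>4}{rel:>8.3f}")
```

The relative frequencies always sum to 1, which is a quick check that no observation was dropped or counted twice.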
Example (aphasia): summary table, with relative frequency = frequency / 22

Class (type of aphasia)   Frequency (number of subjects)   Relative frequency (proportion)
Broca's                    5                                .227
Conduction                 7                                .318
Anomic                    10                                .455
Totals                    22                               1.000

Bar graph and pie chart
- The most widely used graphical methods for summarizing qualitative data are bar graphs and pie charts.
1. A bar graph shows the amount of data that belongs to each class as proportionally sized rectangular areas.
2. A pie chart shows the amount of data that belongs to each class as a proportional part of a circle.

[Figure: bar graph]

Pie chart
1. Shows the breakdown of a total quantity into categories.
2. Useful for showing relative differences.
3. Angle size = (360°)(percent), with the percent expressed as a proportion.

Graphical methods for describing quantitative data
- Quantitative data sets consist of data that are recorded on a meaningful numerical scale. To describe and summarize such data sets, we introduce three graphical methods: dot plots, stem-and-leaf displays, and histograms.

Example (EPA gas): EPA mileage ratings on 100 cars

36.3 32.7 40.5 36.2 38.5 36.3 41.0 37.0 37.1 39.9
41.0 37.3 36.5 37.9 39.0 36.8 31.8 37.2 40.3 36.9
36.9 41.2 37.6 36.0 35.5 32.5 37.3 40.7 36.7 32.9
37.1 36.6 33.9 37.9 34.8 36.4 33.1 37.4 37.0 33.8
44.9 32.9 40.2 35.9 38.6 40.5 37.0 37.1 33.9 39.8
36.8 36.5 36.4 38.2 39.4 36.6 37.6 37.8 40.1 34.0
30.0 33.2 37.7 38.3 35.3 36.1 37.0 35.9 38.0 36.8
37.2 37.4 37.7 35.7 34.4 38.2 38.7 35.6 35.2 35.0
42.1 37.5 40.0 35.6 38.8 38.4 39.0 36.7 34.8 38.1
36.7 33.6 39.3 35.8 34.5 39.5 36.9
Dot plots
- A dot plot condenses the data by grouping all values that are the same together in the plot. The horizontal axis is a scale for the quantitative variable, and the numerical value of each measurement in the data set is located on that scale by a dot. When data values repeat, the dots are placed one above another. See the figure in the example below.

[Figure: dot plot]

Stem-and-leaf display
- A stem-and-leaf display combines a graphic technique with a sorting technique. It is very popular for summarizing numerical data.
1. Divide each observation into a stem value and a leaf value:
   - the leading digit(s) become the stem;
   - the trailing digit(s) become the leaf.

Example. Construct a stem-and-leaf display of the following set of 20 test scores.

82 74 88 66 58 74 78 84 96 76
62 68 72 92 86 76 52 76 82 78

Figure 1: 20 exam scores

5 | 2 8
6 | 2 6 8
7 | 2 4 4 6 6 6 8 8
8 | 2 2 4 6 8
9 | 2 6

Stem-and-leaf of MPG, n = 100, leaf unit = 0.10

30 | 0
31 | 8
32 | 5 7 9 9
33 | 1 2 6 8 9 9
34 | 0 2 4 5 8 8
35 | 0 1 2 3 5 6 6 7 8 9 9
36 | 0 1 2 3 3 4 4 5 5 6 6 7 7 8 9 9
37 | 0 0 0 0 1 1 1 2 2 3 3 4 4 5 6 6 7 7 8 9 9
38 | 0 1 2 2 3 4 5 6 7 8
39 | 0 0 3 4 5 7 8 9
40 | 0 1 2 3 5 5 7
41 | 0 0 2
42 | 1
43 |
44 | 9
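The stem-and-leaf construction can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the function name `stem_and_leaf` and its `leaf_unit` parameter are hypothetical, values are assumed to be non-negative, and stems with no leaves are simply omitted from the result.

```python
from collections import defaultdict

# The 20 test scores from the example.
scores = [82, 74, 88, 66, 58, 74, 78, 84, 96, 76,
          62, 68, 72, 92, 86, 76, 52, 76, 82, 78]


def stem_and_leaf(values, leaf_unit=1):
    """Group non-negative values by stem, with leaves sorted within each stem.

    Each value is first expressed as a whole number of leaf units; the last
    digit is the leaf and the leading digits are the stem.
    """
    display = defaultdict(list)
    for v in values:
        units = round(v / leaf_unit)     # e.g. 36.3 with leaf_unit=0.1 -> 363
        stem, leaf = divmod(units, 10)   # leading digits, trailing digit
        display[stem].append(leaf)
    return {stem: sorted(display[stem]) for stem in sorted(display)}


display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Calling `stem_and_leaf(values, leaf_unit=0.1)` on one-decimal data such as mileage ratings splits 36.3 into stem 36 and leaf 3.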
Histograms
1. Divide the data set into class intervals of equal size.
2. Count the class frequency or calculate the class relative frequency.
3. Construct the histogram.

Classifying the quantitative data
1. Determine the range.
2. Compute the class intervals (width).
3. Select the number of classes:
   - usually between 5 and 15 inclusive.
4. Determine the class boundaries (limits).
5. Count the observations and assign them to classes.

[Figure: histogram of MPG; vertical axis: frequency]

Histogram
- The effect of the size of a data set on the outline of a histogram.

Exercise
1. Calculate the number of the 500 measurements falling into each of the measurement classes. Then graph a frequency histogram for these data.

Measurement class   Relative frequency
0.5-2.5
2.5-4.5
4.5-6.5
6.5-8.5
8.5-10.5
10.5-12.5
12.5-14.5
14.5-16.5

Summation notation
Summation is done quite often in mathematics, and there is a symbol that means summation. That symbol is the capital Greek letter sigma (Σ), so the notation is sometimes called sigma notation instead of summation notation.

Numerical methods for describing quantitative data
- The two most important data characteristics:
1. Central tendency: the tendency of the data to cluster, or centre.
2. Variability: the dispersion or spread of the data.

Central tendency
1. Mean
2. Median
3. Mode

Def 2.4 The mean (arithmetic mean) of a set of data is the sum of the measurements divided by the number of measurements contained in the data set.
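To connect sigma notation with Def 2.4, here is a small Python sketch in which the summation is written out as an explicit loop. The function names `sigma` and `mean` are illustrative, and the data are the 20 test scores from the stem-and-leaf example.

```python
# The 20 test scores used in the stem-and-leaf example.
scores = [82, 74, 88, 66, 58, 74, 78, 84, 96, 76,
          62, 68, 72, 92, 86, 76, 52, 76, 82, 78]


def sigma(values):
    """Sigma notation as a loop: add up x_1, x_2, ..., x_n."""
    total = 0
    for x in values:
        total += x
    return total


def mean(values):
    """Def 2.4: the sum of the measurements divided by their number."""
    return sigma(values) / len(values)


print(sigma(scores))  # 1520
print(mean(scores))   # 76.0
```

In everyday Python the built-in `sum` does the same job as `sigma`; the loop is spelled out only to mirror the notation.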
Mean
The mean (average) of a set of data (a sample) x_1, x_2, ..., x_n is defined by

$$\bar{x} = \frac{\text{sum of all } x}{\text{number of } x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Mean example (EPA gas)
- The mean gas mileage for the 100 cars is

$$\bar{x} = \frac{1}{100}\sum_{i=1}^{100} x_i = \frac{30.0 + 31.8 + 32.5 + \cdots + 42.1 + 44.9}{100} = 36.994$$

Median
- Def 2.5 The median of a quantitative data set is the middle value of the data ranked in ascending (or descending) order.
1. Position of the median in the ordered sequence: (n + 1)/2.
2. The median is the value at that position; when n is even, it is the average of the two middle values.

Median example: odd-sized sample
Raw data:  24.1  22.6  21.5  23.7  22.6
Ordered:   21.5  22.6  22.6  23.7  24.1
Position:  1     2     3     4     5
Median = 22.6 (position 3)

Median example: even-sized sample
Raw data:  10.3  4.9  8.9  11.7  6.3   7.7
Ordered:   4.9   6.3  7.7  8.9   10.3  11.7
Position:  1     2    3    4     5     6
Median = (7.7 + 8.9)/2 = 8.3

Median example (EPA gas)
- The median is 37.0.
- The sample size n = 100 is even, so there are two middle values, located at the 50th and 51st positions after ordering. Both are 37.0, so the median is 37.0.
- This value implies that about half of the 100 mileages in the data set fall below 37.0 and half lie above 37.0.

Comparing the mean and median
- A data set is said to be skewed if one tail of the distribution has more extreme observations than the other tail.
1. If the data set is skewed to the right, then the median is less than the mean.
2. If the data set is symmetric, then the median is equal to the mean.
3. If the data set is skewed to the left, then the median is larger than the mean.

Mode
- Def 2.6 The mode is the measurement that occurs most frequently in the data set.
1. There may be no mode or several modes.
2. It may be used for quantitative and qualitative data.

Mode example
Raw data: 10.3 4.9 8.9 11.7 6.3 7.7   (no mode)
Raw data: 6.3 4.9 8.9 6.3 4.9 4.9     (one mode: 4.9)
Raw data: 21 28 28 41 43 43           (two modes: 28 and 43)

Mode example
E.g., consider the grades of 18 students:
B, C, B, A, F, D, B, C, B
C, B, A, F, C, B, D, C, A
1. Grade:     A B C D F
2. Frequency: 3 6 5 2 2
3. The mode is the grade B.
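The median and mode definitions above translate into short Python functions. This is a sketch under one stated convention: `modes` returns an empty list when every value occurs exactly once (the "no mode" case) and several values when there is a tie. Both function names are illustrative.

```python
from collections import Counter


def median(values):
    """Def 2.5: middle value of the ordered data.

    For an even-sized sample, the two middle values are averaged.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """Def 2.6: the most frequent value(s), in sorted order.

    Returns [] when every value occurs exactly once (no mode).
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(median([24.1, 22.6, 21.5, 23.7, 22.6]))      # odd-sized sample
print(median([10.3, 4.9, 8.9, 11.7, 6.3, 7.7]))    # even-sized sample
print(modes([21, 28, 28, 41, 43, 43]))             # two modes
print(modes(list("BCBAFDBCBCBAFCBDCA")))           # the 18 grades
```

Because `modes` only counts occurrences, it works for qualitative data such as letter grades as well as for numbers.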
Mode example (EPA gas)
- The mode of the mileage ratings is 37.0 (it occurs most often).

Exercise
1. Calculate the mean, median, and mode for each of the following samples:
a. 7, -2, 3, 3, 0, 4
b. 2, 3, 5, 3, 2, 3, 4, 3, 5, 1, 2, 3, 4
c. 51, 50, 47, 50, 48, 41, 59, 68, 45, 37

Variation (variability)
1. Range
2. Variance
3. Standard deviation

Range
- Def 2.7 The range of a quantitative data set is equal to the largest measurement minus the smallest measurement. It is the simplest measure of dispersion: range = max - min.
- It is very sensitive to extreme values.

Range example
1. For the data set 11, 12, 13, 13, 13, 14, 15: $\bar{x} = 13$ and range = 15 - 11 = 4.
2. For the data set 9, 10, 11, 13, 15, 16, 17: $\bar{x} = 13$ and range = 17 - 9 = 8.

Variance and standard deviation
- The variance and standard deviation measure the spread of the data set around the mean. The variance is the average of the squared distances of the measurements from the mean of all the measurements in the data set.
- Note that the variance, $\sigma^2$, of a population is defined as

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$$

Sample variance
- Def 2.8 The sample variance, $s^2$, for a sample of n measurements is equal to the sum of the squared distances from the mean divided by (n - 1).
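The range and the sample variance can be computed directly from their definitions. The sketch below applies them to the two data sets from the range example; the names `data_range`, `sample_variance`, `narrow`, and `wide` are illustrative.

```python
def data_range(values):
    """Range = largest measurement minus smallest measurement."""
    return max(values) - min(values)


def sample_variance(values):
    """Sum of squared distances from the sample mean, divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


narrow = [11, 12, 13, 13, 13, 14, 15]
wide = [9, 10, 11, 13, 15, 16, 17]

for sample in (narrow, wide):
    print(data_range(sample), round(sample_variance(sample), 3))
```

Both samples have mean 13, but the second has the larger range and the larger variance, reflecting its greater spread around the mean.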
Sample variance formula

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$

Variance example
E.g., (1) for the sample 11, 12, 13, 13, 13, 14, 15, $\bar{x} = 13$:

$$s^2 = \frac{1}{7-1}\left[(11-13)^2 + (12-13)^2 + \cdots + (15-13)^2\right] = \frac{10}{6} = \frac{5}{3} = 1.667$$

(2) for the sample 9, 10, 11, 13, 15, 16, 17, $\bar{x} = 13$:

$$s^2 = \frac{1}{7-1}\left[(9-13)^2 + (10-13)^2 + \cdots + (17-13)^2\right] = \frac{58}{6} = \frac{29}{3} = 9.667$$

Sample standard deviation
- Def 2.9 The sample standard deviation, s, is the positive square root of the sample variance:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sqrt{\frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]}$$

Interpreting standard deviation
- To understand how the standard deviation provides a measure of the variability of a data set, consider a specific data set and answer the following questions: How many measurements are within 1 standard deviation of the mean? How many measurements are within 2 standard deviations of the mean?

1. Chebyshev's rule
- Let k be any positive number greater than 1. For any data set (regardless of the shape of the frequency distribution of the data), the proportion of observations that lie within k standard deviations of the mean is at least $1 - 1/k^2$:

$$P(|x - \bar{x}| < ks) \ge 1 - \frac{1}{k^2}, \quad \text{for } k > 1$$

- This rule says that within two standard deviations of the mean (k = 2), i.e., in $(\bar{x} - 2s, \bar{x} + 2s)$, you will always find at least 75% of the data (since $1 - 1/k^2 = 1 - 1/4 = 0.75$), and that within three standard deviations of the mean (k = 3), i.e., in $(\bar{x} - 3s, \bar{x} + 3s)$, you will always find at least 89% of the data (since $1 - 1/k^2 = 1 - 1/9 \approx 0.89$).
- Remarks:
1. Chebyshev's rule applies to any data set, regardless of the shape of the frequency distribution of the data.
2. Chebyshev's rule applies to samples and populations alike.

2. Empirical rule
- The empirical rule is a rule of thumb that applies to data sets with frequency distributions that are mound-shaped (bell-shaped) and symmetric.
   - Approximately 68% of the measurements will fall within 1 standard deviation of the mean, i.e., within the interval $(\bar{x} - s, \bar{x} + s)$ for a sample and $(\mu - \sigma, \mu + \sigma)$ for a population.
   - Approximately 95% of the measurements will fall within 2 standard deviations of the mean, i.e., within the interval $(\bar{x} - 2s, \bar{x} + 2s)$ for a sample and $(\mu - 2\sigma, \mu + 2\sigma)$ for a population.
   - Approximately 99.7% of the measurements will fall within 3 standard deviations of the mean, i.e., within the interval $(\bar{x} - 3s, \bar{x} + 3s)$ for a sample and $(\mu - 3\sigma, \mu + 3\sigma)$ for a population.
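One way to see these rules at work is to count, for a concrete data set, how many measurements actually fall within k standard deviations of the mean and compare that with the Chebyshev lower bound. The sketch below does this for the 20 test scores from the stem-and-leaf example; the function names are illustrative, and the choice of data set is an assumption made for the illustration.

```python
import math

# The 20 test scores from the stem-and-leaf example.
scores = [82, 74, 88, 66, 58, 74, 78, 84, 96, 76,
          62, 68, 72, 92, 86, 76, 52, 76, 82, 78]


def sample_std(values):
    """Positive square root of the sample variance (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def proportion_within(values, k):
    """Proportion of measurements inside (mean - k*s, mean + k*s)."""
    mean = sum(values) / len(values)
    s = sample_std(values)
    inside = [x for x in values if abs(x - mean) < k * s]
    return len(inside) / len(values)


def chebyshev_bound(k):
    """Lower bound 1 - 1/k^2 from Chebyshev's rule (requires k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule requires k > 1")
    return 1 - 1 / k ** 2


print(1, proportion_within(scores, 1))
for k in (2, 3):
    print(k, proportion_within(scores, k), chebyshev_bound(k))
```

For these scores the observed proportions are 70%, 95%, and 100% for k = 1, 2, 3. They comfortably exceed the Chebyshev bounds of 75% and 89% for k = 2 and 3, and sit close to the empirical-rule figures, as one would expect for roughly mound-shaped data.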
Example
Example 2.12 (pp. 64-65). A manufacturer of automobile batteries claims that the average length of life for its grade A battery is 60 months. However, the guarantee on this brand is for just 36 months. Suppose the standard deviation of the life length is known to be 10 months, and the frequency distribution of the life-length data is known to be mound-shaped.
a. Approximately what percentage of the manufacturer's grade A batteries will last more than 50 months, assuming the manufacturer's claim is true?
b. Approximately what percentage of the manufacturer's grade A batteries will last less than 40 months, assuming the manufacturer's claim is true?
c. Suppose your battery lasts 37 months. What could you infer about the manufacturer's claim?

Solution
a. Approximately 84% of the batteries should have a life length exceeding 50 months. Fifty months is 1 standard deviation below the mean, so by the empirical rule about 34% of the batteries last between 50 and 60 months and about 50% last more than 60 months.
b. Approximately 2.5% of the batteries should fail prior to 40 months. Forty months is 2 standard deviations below the mean; about 95% of the batteries fall within 2 standard deviations of the mean, leaving about 2.5% in each tail.
c. If you are so unfortunate that your grade A battery fails at 37 months, you can make one of two inferences: either your battery was one of the approximately 2.5% that fail prior to 40 months, or something about the manufacturer's claim is not true. Because the chances are so small that a battery fails before 40 months, there is good reason to doubt the claim.

Numerical measures of relative standing
- Measures of relative standing are numbers that indicate where a particular value lies in relation to the rest of the values in a data set.
1. Percentiles
2. z-score

Percentiles
- Def 2.11 Suppose the measurements x_1, x_2, ..., x_n have been ranked in ascending order. The pth percentile is a number such that p% of the measurements fall below the pth percentile and (100 - p)% fall above it.
- The lower quartile (first quartile), Q1, is the 25th percentile.
- The upper quartile (third quartile), Q3, is the 75th percentile.
- The median is the 50th percentile (or second quartile).

Procedure for calculating percentiles
1. Arrange the n observations in ascending order, x(1), x(2), ..., x(n).
2. Calculate the position index k = (pn)/100, where p is the percentile of interest and n is the sample size.
3a. If k is an integer, the pth percentile is (x(k) + x(k+1))/2.
3b. If k is not an integer, the pth percentile is the observation at the next integer position greater than k.

Calculating percentiles
Example. Consider the following 15 ranked
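Independently of this example, the procedure for calculating percentiles given above can be sketched in Python. The data used here are the 20 test scores from the stem-and-leaf example, not the 15 ranked observations this example refers to. The function name `percentile` is illustrative, and p is assumed to be a whole number strictly between 0 and 100 so that the integer test in step 3 is exact.

```python
# The 20 test scores from the stem-and-leaf example.
scores = [82, 74, 88, 66, 58, 74, 78, 84, 96, 76,
          62, 68, 72, 92, 86, 76, 52, 76, 82, 78]


def percentile(values, p):
    """pth percentile by the position-index procedure.

    Assumes p is an integer with 0 < p < 100 and values is non-empty.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)                 # step 1: ascending order
    n = len(ordered)
    k, remainder = divmod(p * n, 100)        # step 2: k = floor(p*n/100)
    if remainder == 0:
        # Step 3a: p*n/100 is an integer k, so average x(k) and x(k+1).
        # With 0-based indexing these are ordered[k - 1] and ordered[k].
        return (ordered[k - 1] + ordered[k]) / 2
    # Step 3b: take the observation at the next integer position, x(k+1)
    # in 1-based terms, which is ordered[k] with 0-based indexing.
    return ordered[k]


for p in (25, 33, 50, 75):
    print(p, percentile(scores, p))
```

With n = 20, the 25th, 50th, and 75th percentiles all hit the integer case (k = 5, 10, 15), while p = 33 gives k = 6.6 and therefore the 7th ordered score.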