Statistical Techniques in Market Research: Training Materials
Copyright 2008 CIIC & COMR

Background

Statistics is a diverse body of theory and application which ranges from simple averages to complex statistical modeling and multivariate analysis. It plays a key role in Marketing Research from project design through analysis and interpretation of data.

Though frequently seen as an esoteric and intimidating discipline, the basic concepts in statistics require no more than high school algebra to master. And, like many other skills, hands-on experience is the best teacher (!).

Our focus in this seminar will be on fundamental concepts and methods which have the widest application in Marketing Research. We will avoid complex theory and aim to provide the background necessary for you to begin applying these concepts on the job.

Objectives:

To provide a practical working knowledge of fundamental statistical concepts and methods in order to:
1) help you to analyze and interpret quantitative Marketing Research data; and
2) broaden client service skills.

Syllabus:

I. Basic Definitions
II. Sampling
III. Types of Data
IV. Summarizing Data
V. Inferential Statistics

I. Basic Definitions

Variable
A quantity which is free to vary, e.g., purchase interest rating.

Univariate Analysis
The investigation of one variable at a time, e.g., mean purchase interest rating in a product test.

Bivariate Analysis
The investigation of the relationship between two variables, e.g., the correlation between an attribute rating and purchase interest.

Multivariate Analysis
The investigation of the interrelationships among several variables, e.g., the joint relationship between 15 attribute ratings and purchase interest.

Population (universe)
All objects (e.g., consumers) in the group of interest.
ex:
- All male beer drinkers in their 30s living in Japan
- All Japanese housewives aged 25-49 who have purchased canned condensed soup in the past 3 months

Sample
Selected subset of the population.
ex:
- 200 male beer drinkers in their 30s
- 500 housewives aged 25-49 who have purchased canned condensed soup in the past 3 months

Census and Sample Survey
A census is the gathering of information about all members of a population (e.g., survey of all ACNielsen employees). A sample survey is the gathering of information about a selected subset of the population (e.g., random sample of 400 ACNielsen employees).

Sampling Error
The deviation of a figure obtained from a sample from the true (i.e., population) value.

Inferential Statistics
Used to generalize or make inferences about a population from a sample.
ex: A market research survey of 500 housewives aged 25-49 finds that they are more frequent buyers of dry soup than of condensed soup; how likely is this to be true for all housewives aged 25-49 in Japan?

Parameters and Statistics
Parameters are numbers used to describe a population. Numbers used to describe a sample are called statistics.

Objects/Cases
In Marketing Research, these usually refer to respondents, sometimes brands.

Raw Data and Aggregate Data
Raw data are case or object level data, e.g., data for each respondent. Aggregate data are data which have been grouped in some way, and often consist of percentages and means.
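The definitions of parameter, statistic, sampling error and census above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population of 10,000 ratings is randomly generated and entirely hypothetical, and the function name `sampling_error` is invented for this example.

```python
import random
import statistics

def sampling_error(population, n, rng):
    """Draw a simple random sample of size n (without replacement) and
    return (sample statistic, population parameter, sampling error)."""
    sample = rng.sample(population, n)
    statistic = statistics.mean(sample)      # describes the sample
    parameter = statistics.mean(population)  # describes the population
    return statistic, parameter, statistic - parameter

rng = random.Random(2008)  # fixed seed so the sketch is repeatable

# Hypothetical population: 5-point purchase interest ratings of 10,000 consumers.
population = [rng.randint(1, 5) for _ in range(10_000)]

# Sample survey: 400 respondents drawn from the population.
stat, param, err = sampling_error(population, 400, rng)
print(f"sample mean = {stat:.3f}, population mean = {param:.3f}, error = {err:+.3f}")

# Census: every member is measured, so the sampling error is zero.
_, _, census_err = sampling_error(population, len(population), rng)
print(f"census sampling error = {census_err}")
```

Rerunning with a different seed gives a different sample mean but the same population mean, which is the sense in which a statistic varies from sample to sample while a parameter is fixed.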
Independent and Dependent Variables
An independent variable may sometimes be viewed as a cause and a dependent variable as an effect. More generally, independent variables (e.g., age, income) are used to better understand or predict dependent variables (e.g., purchase likelihood).
ex: Purchase interest in a new product concept differs by respondent age; in this example, age is the independent variable and purchase interest the dependent variable, since age could affect purchase interest but not the other way around.

Statistical Notation
Some of the notational conventions used in statistics are:
$N$: number of objects/cases (e.g., respondents) in the population
$n$: number of objects in the sample (i.e., sample size)
$\pi_i$: the Greek letter pi; percentage or proportion corresponding to group $i$
$\mu$: the Greek letter mu; the population arithmetic mean
$\bar{X}$: the sample arithmetic mean (X bar)
$\sigma$: the Greek letter sigma; population standard deviation
$s$: sample standard deviation
$\Sigma$: the Greek capital letter sigma; the sum of a series of numbers
*: computer symbol for multiplication: a*b = a x b

III. Types of Data

There are two main types of data, each of which can be further subdivided into two categories. Different types of statistical procedures are appropriate for each type of data.

Non-metric
- nominal
- ordinal

Metric
- interval
- ratio

A basic understanding of the concepts which follow is important for good questionnaire design.

Nominal
The lowest form of data in terms of the information it provides. Examples are male/female, Tokyo/Osaka, user/non-user. No ranking or ordering of the categories (such as by frequency of use) is presumed. Usually expressed in percentages or frequencies.

Ordinal
An ordered category. Ordinal data indicate whether an object has more or less of a characteristic than another object, but not how much more. Examples are age groups and heavy/medium/light usage of a product category. Medians, ranks and percentiles can be computed on ordinal data.

Interval
Data are measured in constant units. An example is a numeric rating scale, where 5-4 = 4-3 = 3-2 = 2-1. The unit of measurement is 1. There is no true zero, however. Fahrenheit and Celsius temperature scales are interval: one cannot say that 50 degrees is twice as hot as 25 degrees because the zero on either scale is arbitrary.
ex:
30 Celsius = 86 Fahrenheit
15 Celsius = 59 Fahrenheit, not 43 (half of 86)

In actuality, most rating scales used in Marketing Research lie somewhere between ordinal and interval. For example, can we really say that the difference between "Very much want to buy" and "Want to buy" is the same as the difference between "Want to buy" and "Can't say either way"?

If the data are judged reasonably close to being interval, it is acceptable to compute means and to treat them as interval for analysis. Statistical procedures designed for ordinal data (non-parametric) and those designed for interval data (parametric) frequently yield similar results.

Ratio
This is the highest form of data. It possesses all properties of interval data and has a true zero. A Kelvin scale is ratio; so are age and income when not categorized.

Some Guidelines

1) Many significance tests, such as the t-test, assume that the data are interval or ratio. While it is not uncommon in practice to employ these methods when the data are non-metric, strictly speaking, non-parametric tests such as the Kruskal-Wallis test are more appropriate.
2) Weights are often assigned to frequency of usage/purchase data in order to compute means, as in the example below:

Frequency consume coffee      Weight
twice a day or more           (3.0)
once a day                    (1.0)
3-5 times a week              (0.3)
1-2 times a week              (0.2)
less than once a week         (0.1)

Weights such as these are often quite arbitrary and the resulting means only rough approximations. In such cases, it may be preferable to treat the data as ordinal rather than interval.

IV. Summarizing Data

Frequency Distribution
One of the most useful means of summarizing data. Can be represented in tabular form, e.g.,

Respondent Age      n      %
Teens              75     15
20s               100     20
30s               125     25
40s               100     20
50s                75     15
60s                25      5
Total             500    100

or in graphic form.

[Figure: Histogram of respondent age, Teens through 60s; vertical axis from 0 to 30]

Shape of Frequency Distribution

Normal Distribution
This plays a key role in many statistics. Many parametric inferential statistics assume the population is at least approximately normally distributed. Severe departures from normality can invalidate descriptive statistics such as means and standard deviations.

[Figure: Example of a Normal Distribution 1]

[Figure: Example of a Normal Distribution 2]

[Figure: Example of a Normal Distribution 3]

Departures from Normality

Skewness
When a distribution is asymmetrical, it is skewed. If the distribution leans to the left and the longer tail points to the right, it is positively skewed. On the other hand, if it leans to the right and the longer tail points to the left, it is negatively skewed.
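The frequency table and the sign convention for skewness described above can be sketched in a few lines of Python. The age counts are those given in the table; the two small data sets are hypothetical, and the skewness measure used here (the third central moment divided by the 1.5 power of the second) is one common definition chosen for illustration, not a formula stated in this seminar.

```python
def percent_distribution(counts):
    """Convert category counts into percentages of the total."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5.
    Positive when the longer tail points right, negative when it points left."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Respondent age counts from the frequency distribution table.
age_counts = {"Teens": 75, "20s": 100, "30s": 125, "40s": 100, "50s": 75, "60s": 25}
age_percent = percent_distribution(age_counts)
for label, pct in age_percent.items():
    print(f"{label:<6}{age_counts[label]:>5}{pct:>6.0f}")

# Hypothetical data: most values are low, with one long tail to the right.
right_tail = [1, 1, 2, 2, 2, 3, 3, 4, 9]
# Mirror image of the same data: long tail to the left.
left_tail = [10 - x for x in right_tail]

print("right-tailed data, skewness > 0:", skewness(right_tail) > 0)
print("left-tailed data, skewness < 0:", skewness(left_tail) < 0)
```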
[Figure: Positively Skewed Distribution]

[Figure: Negatively Skewed Distribution]

Kurtosis
When the tails are unusually fat or unusually thin, the distribution is said to be kurtotic.

[Figure: Example of Platykurtic Distribution]

[Figure: Example of Leptokurtic Distribution]

Measures of Central Location (Center of Data)

Averages
Three kinds of averages are typically used in Marketing Research:
- arithmetic mean
- median
- mode

If a distribution is symmetrical and has a single peak, the mean, median and mode are all the same.

[Figure: Example of Symmetrical Distribution; mean, median and mode coincide]

[Figure: Example of Asymmetrical Positively Skewed Distribution; labels: mode, median, mean]

Arithmetic Mean
The most commonly used average for metric data; it is calculated as follows:

$$\bar{X} = \frac{\sum X}{n}$$

ex: The mean of 5, 2, 1, 3 is (5 + 2 + 1 + 3)/4 = 2.75

Means of grouped data may also be estimated by using the following formula:

$$\bar{X} = \frac{\sum X W}{\sum W}$$

where $X$ is the interval midpoint and $W$ (weight) is the frequency (or percent).

ex:

Respondent Age      x       n      %
Teens             14.5     75     15
20s               24.5    100     20
30s               34.5    125     25
40s               44.5    100     20
50s               54.5     75     15
60s               64.5     25      5
Total                     500    100

X bar = ((75*14.5) + (100*24.5) + (125*34.5) + (100*44.5) + (75*54.5) + (25*64.5))/500 = 36 years

Or, if percentages are used as the weights:

X bar = ((15*14.5) + (20*24.5) + (25*34.5) + (20*44.5) + (15*54.5) + (5*64.5))/100 = 36 years

Two drawbacks of the arithmetic mean, however, are:

It is sensitive to extreme values (outliers), especially when the number of data points is small. In the earlier hypothetical series of numbers (5, 2, 1, 3), 5 appears to be an outlier. If 3 is substituted for 5, the mean of these numbers decreases from 2.75 to 2.25: (3 + 2 + 1 + 3)/4 = 2.25

A second disadvantage is more general: the mean may be misleading when the data distribution is non-normal.

[Figure: Example of Bi-Modal Distribution; labels: mode, mean, mode, median]

[Figure: Example of Uniform Distribution; mean, median, mode identical]

[Figure: Example of U-Shaped Distribution; labels: mode, mean, mode, median]

Median
The middle value of ordered data. Appropriate for ordinal, interval, and ratio data.

Computational Procedure:
First, rank the data from smallest value to largest value. Then, find the position of the middle value with the following formula:

$$\text{position of median} = \frac{n + 1}{2}$$

where $n$ is the number of data points.

ex: There are 11 data points (numbers) in the following ranked data set:

6, 9, 9, 10, 11, 11, 12, 12, 13, 16, 19

The median is the sixth largest (or smallest) number, (11 + 1)/2 = 6, which is 11.

Note that the above data set has an odd number of values (n = 11). When there is an even number of data points, there will be two middle values. In these instances, the median is the arithmetic mean of the two middle values.

ex: Consider the median of the following data set with 12 data points:

6, 9, 9, 10, 11, 11, 12, 12, 13, 16, 19, 30

The median lies between the 6th and 7th largest (or smallest) values, (12 + 1)/2 = 6.5. These values are 11 and 12, so the median is 11.5.

Notice that the addition of an outlier (30) had little impact on the median. By contrast, the mean of the first data set is 11.64 and the mean of the second data set is 13.2. This is a major reason for using the median rather than the arithmetic mean.

However, the median uses less information about the data than does the mean and is often less informative. Statistical procedures (non-parametric methods) developed for the median are generally less flexible and informative than those which analyze means.

Medians can also be calculated from grouped data.

ex:

Respondent Age      x      n(w)    %
Teens             14.5     75     15
20s               24.5    100     20
30s               34.5    125     25
40s               44.5    100     20
50s               54.5     75     15
60s               64.5     25      5
Total                     500    100

There is an even number of interval midpoints (6), thus the median must lie somewhere between the 3rd and 4th values: (6 + 1)/2 = 3.5; 34.5 and 44.5 are the two middle categories.

The (weighted) mean of 34.5 and 44.5 is ((125*34.5) + (100*44.5))/225, or 38.9. This is only a rough approximation: counting respondents rather than categories, the cumulative frequencies are 75, 175 and 300, so the 250th and 251st of the 500 respondents both fall in the 30s group.

Mode
The mode is closest in meaning to the layman's term "average", that is, "typical". It is very commonly used in Marketing Research but rarely referred to by name. It is simply the most frequent value.

ex: Brand X (33%) leads in terms of P3M purchase, followed by Brand Y (21%) and Brand Z (17%). The mode of these data is Brand X, at 33%.

ex: The mode of the following data set, 6, 9, 10, 11, 15, 16, 16, 16, 20, is 16.

Modes can also be obtained from grouped data.

ex:

Respondent Age      x      n(w)    %
Teens             14.5     75     15
20s               24.5    100     20
30s               34.5    125     25
40s               44.5    100     20
50s               54.5     75     15
60s               64.5     25      5
Total                     500    100

Here, the modal age group is 30-39 and the modal age is the midpoint, 34.5, of this age range.

The major disadvantages of the mode are:
- It does not lend itself well to inferential statistical methods.
- Sometimes, there is no distinct mode. Consider the following data:

Brand    P1M Purchase
A        29%
B        28%
C        27%

There is no meaningful difference in P1M purchase among the three brands.
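The worked examples for the mean, median and mode above can be reproduced with a short Python sketch. It uses the data sets given in the text together with Python's standard `statistics` module; the helper name `grouped_mean` is invented for this illustration.

```python
import statistics

def grouped_mean(midpoints, weights):
    """Weighted mean of interval midpoints: sum(X*W) / sum(W)."""
    return sum(x * w for x, w in zip(midpoints, weights)) / sum(weights)

# Grouped respondent age data from the tables above.
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
counts = [75, 100, 125, 100, 75, 25]
percents = [15, 20, 25, 20, 15, 5]

mean_by_count = grouped_mean(midpoints, counts)      # 36.0 years
mean_by_percent = grouped_mean(midpoints, percents)  # 36.0 years, same result

# Sensitivity of the mean to an outlier.
mean_with_outlier = statistics.mean([5, 2, 1, 3])     # 2.75
mean_without_outlier = statistics.mean([3, 2, 1, 3])  # 2.25

# Median for an odd and an even number of data points.
odd_set = [6, 9, 9, 10, 11, 11, 12, 12, 13, 16, 19]
even_set = odd_set + [30]                  # same data plus the outlier 30
median_odd = statistics.median(odd_set)    # 11
median_even = statistics.median(even_set)  # 11.5

# Mode of raw data, and modal midpoint of the grouped data.
mode_value = statistics.mode([6, 9, 10, 11, 15, 16, 16, 16, 20])  # 16
modal_midpoint = midpoints[counts.index(max(counts))]             # 34.5

print(mean_by_count, mean_by_percent)
print(median_odd, median_even)
print(round(statistics.mean(odd_set), 2), round(statistics.mean(even_set), 1))
```

The last line shows the contrast drawn in the text: adding the outlier moves the mean from 11.64 to 13.2 while the median only moves from 11 to 11.5.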
Measures of Dispersion (Spread of Data)

The most commonly used measures of dispersion are:
- Variance
- Standard Deviation
- Range
- Percentiles, Quartiles, Quintiles, Terciles (Ntiles)
- Inter-quartile range

Variance and Standard Deviation
These are two of the most widely used statistics, and they play a role in most parametric statistical procedures. They are related to one another in the following way: the standard deviation is the square root of the variance, and the variance is the square of the standard deviation.

Formula:

Variance, for a sample and for a population:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \qquad \sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

Standard deviation, for a sample and for a population:

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \qquad \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

$$s = \sqrt{s^2}, \qquad \sigma = \sqrt{\sigma^2}$$

Note that the term $n - 1$ for the sample statistics is known as the degrees of freedom.

ex: The standard deviation of the hypothetical sample data below is computed as follows:

9, 8, 5, 11, 7, 5

Compute the mean:

$$\bar{X} = \frac{9 + 8 + 5 + 11 + 7 + 5}{6} = 7.5$$

Calculate squared deviations from the mean:

  X      X bar    X - X bar    (X - X bar)^2
  9       7.5        1.5           2.25
  8       7.5        0.5           0.25
  5       7.5       -2.5           6.25
 11       7.5        3.5          12.25
  7       7.5       -0.5           0.25
  5       7.5       -2.5           6.25
 45                    0          27.50

Substitute $\sum (X - \bar{X})^2$ and $n$ into the formula:

$$s^2 = \frac{27.5}{6 - 1} = 5.5, \qquad s = \sqrt{5.5} \approx 2.35$$

The term $\sum (X - \bar{X})^2$ is known as the sum of squares.

Some Uses of Standard Deviation and Variance
Averages tell us about the center of a distribution, but nothing about its spread. In a peaked distribution, most observations will fall close to the average. In a flat distribution, on the other hand, the average may have little meaning.

If the distribution of the data is approximately normal, about 68% of the observations lie within ±1 standard deviation of the mean and about 95% within ±1.96 standard deviations of the mean.

Z scores ("standard scores") can be computed so that different types of scales can be compared. For example, ratings collected from a 5-point scale and those collected from a 7-point scale can be analyzed by expressing each respondent's ratings in terms of standard deviation units from the mean. This is typically done in factor and cluster analysis, for example.

Formula for z Scores:

$$z = \frac{X - \bar{X}}{s}$$

Areas Under Normal
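The variance, standard deviation and z-score calculations in this section can be checked with a short Python sketch. The six data points are those of the worked example; the two rating lists and the helper name `z_scores` are hypothetical and serve only to illustrate putting a 5-point and a 7-point scale on a common footing.

```python
import statistics

data = [9, 8, 5, 11, 7, 5]

mean = statistics.mean(data)                         # 7.5
sum_of_squares = sum((x - mean) ** 2 for x in data)  # 27.5
sample_variance = sum_of_squares / (len(data) - 1)   # 5.5, using n - 1
sample_sd = sample_variance ** 0.5                   # about 2.35

def z_scores(values):
    """Express each value in sample standard deviation units from the mean."""
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return [(x - m) / s for x in values]

# Hypothetical ratings of the same items on two different scales.
five_point = [1, 2, 3, 4, 5, 3, 4]
seven_point = [2, 3, 4, 6, 7, 5, 5]

z_five = z_scores(five_point)
z_seven = z_scores(seven_point)

print(f"mean = {mean}, variance = {sample_variance}, sd = {sample_sd:.2f}")
print([round(z, 2) for z in z_five])
print([round(z, 2) for z in z_seven])
```

After standardizing, both sets of ratings have a mean of 0 and a standard deviation of 1, so the scale length no longer affects the comparison.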
