Doing Synthesis and Meta-Analysis in Applied Linguistics
Lourdes Ortega
University of Hawaii at Mānoa
National Tsing Hua University
Taiwan, June 8, 2011
Please cite as:
Ortega, L. (2011). Doing synthesis and meta-analysis in applied linguistics. Invited workshop at Tsing Hua University, Taipei, June 8, 2011.
Copyright Lourdes Ortega, 2011

Research synthesis (including meta-analysis)
What is it?
Why do it?
How do we do it?
An example
Challenges?
Value?

What is research synthesis?
The reviewing continuum
[Diagram: secondary research as a continuum from narrative to systematic, with the labels LIT REVIEW, SYNTHESIS, META-ANALYSIS]

So, what is meta-analysis, specifically?
One specific kind of research synthesis
Secondary analysis of quantitative analyses
Each primary study is a data point
Goal: what are the main effects or relationships found across many studies?
Strictly speaking, only quantitative studies apply

Why do it?
Traditional literature reviews have led to unending debates:
What does the evidence “say”? According to whom? How do we know who is right?
e.g.: error correction (Ferris vs. Truscott)
e.g.: Critical Period Hypothesis (Hyltenstam et al. vs. Birdsong)

Typical strategies of traditional reviews?
Tables summarizing many studies, e.g. from Krashen et al. (1979)
Vote-counting technique, e.g.: error correction in L2 writing

Limitations:
No specific set of methods, up to mysterious expertise
Experts are always vested, therefore vulnerable to charge of bias
Statistical significance has serious pitfalls
Idiosyncratic methodology
Evidentiary warrants difficult to judge
Over-reliance on statistical significance (but magnitude, not just generalizability, is of interest to social scientists!)

What does the evidence “say”? According to whom? How do we know who is right?
SOLUTION in the late 1970s
Methods for reviewing, from “art” into “science”:
Systematic, not arbitrary
More than the sum of the parts
Replicable
Secondary, yes... but empirically accountable, & discovering new truths in old data

How do we do it?
Norris & Ortega (2006a, 2006b)
Norris, J. M., & Ortega, L. (2010). Timeline: Research synthesis. Language Teaching, 43, 461-479.
Ortega, L. (2010). Research synthesis. In B. Paltridge & A. Phakiti (Eds.), Companion to research methods in applied linguistics (pp. 111-126). London: Continuum.
Norris, J. M. (2012). Meta-analysis. In C. Chapelle (Ed.), Encyclopedia of applied linguistics. Malden, MA: Wiley.
Norris, J. M., & Ortega, L. (2007). The future of research synthesis in applied linguistics: Beyond art or science. TESOL Quarterly, 41, 805-815.

What are the definitional features of all syntheses (including meta-analyses)?
1. Principled selection of primary studies
2. Systematic coding of each study for main variables
3. Direct use of the evidence reported (not the authors' interpretations) across studies

1. Principled selection of studies
Sampling is central to empirical research: what population are we trying to understand?
Random: experimental
Purposive: qualitative
Sampling is central to synthesis, as well
Complete: secondary research should be based on the full universe of studies that have investigated the same thing

Search & Retrieval of Literature
The literature search is a key step in systematic synthesis (some direction: Innami & Koizumi, 2010)
Exhaustive: identify all studies that are relevant (electronic, hand, footnote chasing, invisible college)
Replicable: fully explained in report
1st: electronic searches
2nd: other techniques:
Manual searches of journals
Footnote chasing
Forward searches with Web of Science
Website searches of key contributing scholars
Polite email requests to authors & experts

Inclusion & Exclusion criteria
All potentially relevant studies must then be examined to decide: Include or Exclude (“apples or oranges?”)
Inclusion criteria: all criteria satisfied
Exclusion criteria: explain each reason for exclusion and give examples
Full rationale: tables, appendices, philosophy of inclusivity or selectivity

What are the definitional features of all syntheses (including meta-analyses)?
1. Principled selection of studies: literature search + study eligibility criteria, inclusion/exclusion

2. Systematic coding of each study
Eliciting evidence with consistency, just as when surveying, interviewing, or testing participants
Asking research questions of the literature:
What variables are important?
How (and how well) have they been investigated?
What are the findings across studies?
Publication features: year, author, published or fugitive? Journal, book, dissertation, presentation
Substantive features: e.g., How was “explicit” instruction defined? e.g., How was “learning” measured? e.g., Means, sd, etc.?
Methodological features: sample size, design, reliability, stats used, etc.
Coding book to identify study features that answer questions
Multiple coders

What are the definitional features of all syntheses (including all meta-analyses)?
1. Principled selection of studies
2. Systematic coding of each study for main variables: coding book, standardization, intercoder reliability

3. Trust the evidence, not the authors
Record carefully what authors report and how they report it,
But ultimately, analyze what the evidence they present tells us, not what they say it means
Seeking an objective view across studies of the accumulated state of knowledge

When aggregating and averaging findings is the goal, as in meta-analysis
How do we compare, combine, and interpret findings across numerous quantitative studies of the same thing?
Effect sizes & confidence intervals

Effect size: What is it?
An estimate of the magnitude or strength of a quantitative finding:
how much difference?
how much improvement?
how closely related?

Effect sizes: absolute scales
1. percent. Study 1: experimental group = 30% better than control. Study 2: experimental group = 20% better than control.
2. correlation. Study 1: motivation & achievement, r = .36. Study 2: motivation & achievement, r = .78.
3. known measure. Study 1: pre-post TOEFL score, 450 to 575. Study 2: pre-post TOEFL score, 450 to 495.
Q: What happens when studies do not report findings on comparable scales?

Effect sizes: standardized
d is also simple to calculate and to interpret, and it incorporates variability differences between groups
Effect size d = the average of the experimental group minus the average of the control group, divided by the pooled standard deviation of both groups.
Difference between experimental and control groups in standard deviation units (Cohen's d)
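The definition of d given above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides say "pooled standard deviation" without giving a formula, so the common weighted pooling of the two sample variances is assumed here. The function name `cohens_d` and the scores are hypothetical, invented for the example.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(experimental, control):
    """Standardized mean difference: (mean_e - mean_c) / pooled SD.

    Assumption: the pooled SD weights each group's sample variance
    by its degrees of freedom (n - 1). Other pooling choices exist.
    """
    n_e, n_c = len(experimental), len(control)
    sd_e, sd_c = stdev(experimental), stdev(control)  # sample SDs (n - 1)
    pooled_sd = sqrt(
        ((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2) / (n_e + n_c - 2)
    )
    return (mean(experimental) - mean(control)) / pooled_sd


# Hypothetical post-test scores, invented for illustration only
experimental_scores = [78, 85, 90, 72, 88, 95, 81, 79]
control_scores = [70, 75, 82, 68, 77, 85, 73, 71]

d = cohens_d(experimental_scores, control_scores)
print(f"d = {d:.2f}")
```

Because d is expressed in standard deviation units, the same function can be applied to studies that report results on different raw scales, which is what makes their findings comparable.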

Effect sizes: standardized
[Figure: two pairs of experimental and control distributions with the difference marked; one shows no sizeable effect (d = 0.10), the other a very large effect (d = 3.00)]

Effect sizes for meta-analysis
Study 1: effect size 1
Study 2: effect size 2
Study 3: effect size 3
Study 4: effect size 4
Study 5: effect size 5
= average effect size

Interpreting effect sizes: What does d really tell us?
d .30
d .80
“The terms small, medium, and large are relative, not only to each other, but to the area of behavioral science or even more particularly to the specific content and research method being employed in any given investigation.” (Cohen, 1988, p. 25)

Confidence Intervals
The average is not enough
The stroll from the hotel to the University is, on average, 10 minutes, plus or minus 3 minutes:
Upper bound = 13 minutes
Average = 10 minutes
Lower bound = 7 minutes
“The margin of error in an observation”
95% certainty

Confidence Intervals in Meta-analysis
CIs tell us about the certainty with which we can interpret an average effect size.

Effect Sizes and Confidence Intervals in Meta-analysis
Avg. effect of instructional treatment: N = 49, K = 98, Mean d = .96, SD d = .87, 95% CI lower = .78, 95% CI upper = 1.14

Why does it help to focus on effect sizes?
e.g., effects of smoking, research in the 1960s
There is a statistically significant difference in mortality rates between smokers and non-smokers.
Smoking up to half a pack (or fewer than 10 cigarettes) a day increases the chance of mortality by 40% when compared to non-smokers
Smoking two packs or more a day increases the risk of death by three times to 120% when compared to non-smokers
U.S. Department of Health, Education, and Welfare Report, 1967

And what about small effects: can they be important too?
r = .034: a truly tiny effect!
Regular aspirin consumption and decrease in heart attacks = 3.4% decrease = at least 3 out of 100 who would not have a heart attack if they regularly took aspirin.
d = .30: a small magnitude effect!
Effects of reading tutorials for underachieving students, the same for untrained peer tutoring and for highly trained teachers engaging in longer hours of tutoring.
Both are important!
Interpreting effect sizes: complex, contextualized, not absolute

What are the definitional features of all syntheses (including all meta-analyses)?
1. Principled selection of studies
2. Systematic coding of each study for main variables
3. Direct use of the evidence reported (not the authors' interpretations): effect sizes, confidence intervals, other kinds of new data based on old

How do we do it? An example of synthesis + meta-analysis
In applied linguistics, the first full-blown synthesis and meta-analysis:
Norris, J. M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta-analysis. Language Learning, 50, 417-528.
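The two steps described earlier, averaging effect sizes across studies and putting a 95% confidence interval around that average, can be sketched as follows. This is a minimal sketch, not the procedure of any particular meta-analysis. It assumes an unweighted mean and a normal-approximation interval (mean plus or minus 1.96 standard errors). The function names and the five effect sizes are hypothetical. The second call takes the summary figures from the table above (mean d = .96, SD = .87, K = 98) and assumes K is the count behind the interval.

```python
from math import sqrt
from statistics import mean, stdev


def mean_effect_with_ci(effect_sizes, z=1.96):
    """Unweighted average effect size with a normal-approximation CI.

    Assumption: each effect size counts equally; weighting by sample
    size or variance, as many meta-analyses do, is not shown here.
    """
    k = len(effect_sizes)
    avg = mean(effect_sizes)
    margin = z * stdev(effect_sizes) / sqrt(k)  # z * standard error
    return avg, avg - margin, avg + margin


def ci_from_summary(mean_d, sd_d, k, z=1.96):
    """Same interval, computed from reported summary statistics."""
    margin = z * sd_d / sqrt(k)
    return mean_d - margin, mean_d + margin


# Hypothetical effect sizes from five studies, invented for illustration
study_effect_sizes = [0.45, 1.10, 0.80, 1.35, 0.60]
avg, lower, upper = mean_effect_with_ci(study_effect_sizes)
print(f"average d = {avg:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")

# Summary figures from the table above: mean d = .96, SD = .87, K = 98
reported_lower, reported_upper = ci_from_summary(0.96, 0.87, 98)
print(f"95% CI from summary: [{reported_lower:.2f}, {reported_upper:.2f}]")
```

Under these assumptions the interval from the summary figures comes out close to, but not exactly on, the reported bounds of .78 and 1.14. The gap suggests the original analysis used a somewhat different procedure or rounding. With only a handful of studies, as in the hypothetical list, a t-based interval would be wider than this normal approximation.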

Effects of instruction
[Diagram of instruction types: recasts, garden path, input enhancement, input processing, input flood, inductive, task-based interaction, traditional grammar, consciousness-raising, dictogloss]

Step 1: Problem Specification
Focus of Norris & Ortega: L2 instruction, L2 learning
RQ 1 & 2: Instruction: Overall? By type?
RQ 3: Effect of outcome measures?
RQ 4: Instructional intensity?
RQ 5: Durability of effects?
RQ 6: Quality of research practices?

Step 2: Literature search
1st: electronic searches
2nd: other techniques:
Manual searches of 14 journals
Footnote chasing of 25 reviews
Footnote chasing of each study included

Step 3: Study eligibility criteria
Potentially relevant: 250
Relevant for synthesis: 77
Adequate for meta-analysis: 49

Step 4: Coding of study features
Type of instruction: FonF, FonFS, explicit, implicit
Type of outcome measure: metalinguistic, selected, constrained, free
Intensity of instruction: brief (less than 1 hr), short (between 1 and 2 hrs), medium (between 3 and 6 hrs), long (more than 7 hrs)
Durability of effects: effect sizes on delayed tests

Steps 5 & 6: Analyze, display, interpret
Findings RQ 1 & 2 (effectiveness):
Findings RQ 3 (type of measure):
Findings RQ 4 (intensity):
Findings RQ 5 (durability):

RQ 1-5 (meta-analysis part): How effective is L2 instruction?
Clearly more effective than no instruction or only meaningful exposure to L2: d = 0.96, based on 49 studies
Explicit instruction is superior in the short term to implicit instruction: d = 1.13 versus d = 0.54, based on 69 and 29 contrasts, respectively
But focus on form and on formS are equally effective: d = 1.00 form versus 0.93 formS, based on 43 and 55 contrasts, respectively
Effects are durable: delayed post-tests from 22 studies, d =

RQ6 (synthesis part): Research practices
Too many variables in a single design: need to simplify designs, increase N
No pre-test (18%), no true control group (83%): need to always include both
Poor reporting standards (52% no sd, 84% no instrument reliability, 57% no set alpha): editors need to demand better reporting
Misuse of statistical inference (no assumptions checked or met, parametric stats on small samples, no consideration of magnitude): the field needs better training in statistics if they insist on using such methods

Since then: accumulation of meta-analyses
In 2000, when Norris & Ortega was published, there were only 2 other published systematic syntheses in applied linguistics. As of 2010, Norris & Ortega identified 23 in their Timeline, most published since 2006.
Motivation: Masgoret & Gardner (2003)
Interaction: Keck et al. (2006), Mackey & Goo (2007)
Oral feedback: Russell & Spada (2006), Lyster & Saito (2010), Li (2010)
Use of glosses in CALL: Taylor (2006 & 2009), Abraham (2008)

Some challenges for research synthesis in L2 research

Publication bias: “file drawer problem”
Well known phenomenon, present in all the social sciences (Rosenthal, 1979; Rothstein et al., 2005)
Little understood in applied linguistics
Include fugitive literature
Check for publication bias

Quality: “garbage in, garbage out”
The quality of a synthesis can only be as good as the quality of the primary studies that are synthesized in it.
But how do we judge quality? Publication type? Methodology ratings? Exclusions?

Ethics
Anticipate consequences of synthesis
Would it prematurely close the area for research?
Would it be taken as a personal attack on researchers/labs?
What is the potential for findings to be (mis)appropriated by audiences (policy makers, teachers, ...)?

High-tech statistication, cookie-cutter approach
“... conceptual vacuum when technical meta-analytic expertise is not coupled with deep knowledge of the theoretical and conceptual issues at stake in the research domain under review” (Norris & Ortega, 2006b, p. 37)

Meta-analysis only, no interest in quantitative synthesis of other kinds/scope
New-generation meta-analyses bypass synthesis: Li (2010), Lyster & Saito (2010), Plonsky (2011), Spada & Tomita (2010)
Thomas (1994), (2006); Ortega (2003)?

Qualitative synthesis?
Yet, much contemporary research in applied linguistics is qualitative and increasingly more is mixed-methods: both worth synthesizing!
No interest either in exploring qualitative synthesis
Only Téllez & Waxman (2006) in applied linguistics
And there are options to draw from in education, health sciences, and other fields!
Meta-ethnography (Noblit & Hare, 1988; see Téllez & Waxman, 2006)
Qualitative Comparative Analysis (Ragin, 1999)
Critical Interpretive Synthesis (Dixon-Woods et al., 2006)

Value?
There is huge value in systematic synthesis (including meta-analysis):
Secondary research, yes... but:
Empirically accountable
Conceptually illuminating: discovering new truths in old data

Sustained progress
Much improvement in certain reporting practices (LL, MLJ in particular)
Larger N in primary studies = more trustworthy analyses
Use of increasingly sophisticated techniques in meta-analyses: study quality criteria, weighting (by N, reliability, variance), fixed/random effects models, sensitivity analysis, trim & fill estimations, publication bias, etc.
Use of meta-analytic software, e.g.:

“we envision synthetic methodologies as advancing our ability to produce new knowledge by carefully building upon, expanding, and transforming what has been accumulated over time ... However, ... all knowledge is bound by context and purpose.” (Norris & Ortega, 2006b, p. 37)
But only if applied linguists cultivate “the will to synthesis”

Thank You

References
Abraham, L. B. (2008). Computer-mediated glosses in second language reading comprehension and vocabulary learning: A meta-analysis. Computer Assisted Language Learning, 21, 199-226.
Dixon-Woods, M., Bonas, S., Booth, A., Jones, D. R., Miller, T., Sutton, A. J., et al. (2006). How can systematic reviews incorporate qualitative research? A critical perspective. Qualitative Research, 6, 27-44.
Keck, C. M., Iberri-Shea, G., Tracy-Ventura, N., & Wa-Mbaleka, S. (2006). Investigating the empirical link between task-based interaction and acquisition: A meta-analysis. In J. M. Norris & L. Ortega (Eds.), Synthesizing research on language learning and teaching (pp. 91-131). Amsterdam: John Benjamins.
Krashen, S., Long, M. H., & Scarcella, R. (1979). Accounting for child-adult differences in second language rate and attainment. TESOL Quarterly, 13, 573-582.
Li, S. (2010). The effectiveness of corrective feedback in SLA: A meta-analysis. Language Learning, 60, 309-365.
Lyster, R., & Saito, K. (2010). Oral feedback in classroom SLA: A meta-analysis. Studies in Second Language Acquisition, 32(2).
Mackey, A., & Goo, J. M. (2007). Interaction research in SLA: A meta-analysis and research synthesis. In A. Mackey (Ed.), Conversational interaction in second language acquisition: A collection of empirical studies (pp. 407-452). New York: Oxford University Press.
Masgoret, A.-M., & Gardner, R. C. (2003). Attitudes, motivation, and second language learning: A meta-analysis of studies conducted by Gardner and associates. Language Learning, 53, 123-163.
Noblit, G. W., & Hare, R. D. (1988). Meta-ethnography: Synthesizing qualitative studies. Newbury Park, CA: Sage.
Norris, J. M. (2012). Meta-analysis. In C. Chapelle (Ed.), Encyclopedia of applied linguistics. Malden, MA: Wiley.
Norris, J. M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta-analysis. Language Learning, 50, 417-528.
Norris, J. M., & Ortega, L. (Eds.). (2006a). Synthesizing research on language learning and teaching. Amsterdam: John Benjamins.
Norris, J. M., & Ortega, L. (2006b). The value and practice of research synthesis for language learning and teaching. In J. M. Norris & L. Ortega (Eds.), Synthesizing research on language learning and teaching (pp. 3-50). Amsterdam: John Benjamins.
Norris, J. M., & Ortega, L. (2007). The future of research synthesis in applied linguistics: Beyond art or science. TESOL Quarterly, 41, 805-815.
Norris, J. M., & Ortega, L. (2010). Research timeline: Research synthesis. Language Teaching, 43, 461-479.
Ortega, L. (2003). Syntactic complexity measures


