ESL essay raters' cognitive processes
Paula Winke and Hyojung Lim
Michigan State U

This is a study of rater behavior
How does a rater make scoring decisions? What does a rater pay attention to when rating?
Language testers need to know if construct-irrelevant variation in scores stems from how raters approach and think about a rubric.
Empirical studies on raters' cognitive processes are scarce (especially with analytic scoring), and findings are not consistent.

Previous findings
Raters focus on different features in essays when scoring and weight the different scoring categories differently (Cumming et al., 2002; Eckes, 2008; Orr, 2002).
Sometimes they consider external features that are not even described in a rubric (Barkaoui, 2010; Lumley, 2005; Vaughan, 1991).
Raters may have different attentional foci when scoring, and their foci may depend on the scale type (holistic vs. analytic), the rater's experience (expert vs. novice rater), and the rater's L1 and even L2 background.

The current study
We'd like to know:
How raters cognitively process (i.e., use) an analytic rubric while rating ESL essays
Whether variability in processing (difference in rubric usage) is associated with lower inter-rater reliability

Research Questions
To which parts of an analytic rubric do raters pay the most attention (measured as total fixation duration and visit count)?
Are inter-rater reliability statistics on the components of an analytic rubric related to the amount of attention paid to those components?

Method
9 raters, all ESL instructors in the same English-language program at a large Midwestern university and native speakers of English. Each rated 40 essays (4 prompts * 10 essays).
Analytic rating scale: Currently used at the language program; it is a modified version of the scale from Jacobs et al. (1981), with five categories: content, organization, vocabulary, language use, and mechanics.
Tobii TX300 eye-tracker: The rubric was installed in the Tobii Studio program.
[Rubric image; category labels: Content, Organization, Vocabulary, Language Use, Mechanics]

The data collection set-up
[Figure labels: 64 cm; Rubric; Essay; Score]

Procedure
Session 1 (conference room): Two-hour rater training session. The raters worked through 7 benchmark essays with Paula. Hyojung explained the procedure.
Session 2 (lab): Background questionnaire; eye calibration; practice rating (norming session); Block 1: 10 essays; Block 2: 10 essays.
Session 3 (lab): Eye calibration; practice rating (norming session); Block 3: 10 essays; Block 4: 10 essays.

The data

Data Analysis
To quantify attention: total fixation duration (divided by the number of words in each category) and visit count
To observe a rating process: time to first fixation, gaze plots, and heat maps (Bax & Weir, 2012)
Inter-rater reliability: the intraclass correlation coefficient (ICC) and reliability adjusted by the Spearman-Brown prophecy formula
Statistics: the Kruskal-Wallis and Mann-Whitney (post hoc) tests

Results
In general, raters read the rubric from left to right, starting with content and moving through organization, vocabulary, and language use to mechanics. Oftentimes (71 times, to be specific), mechanics was overlooked.
Organization received the most attention (in terms of fixation duration and visit count) and showed the highest inter-rater reliability; raters attended least to and agreed least on mechanics.
r = .90
r = .75

Category       Fixation duration (mean, in seconds,   Visit count   Intraclass    Spearman-Brown
               number of words controlled)                          coefficient   prophecy formula
Content        .071                                   4.03          .89           .82
Organization   .080
Vocabulary     .058
Language Use   .052
Mechanics      .045

Statistical results
Organization, Content   Vocab.   Lang   Mechanics
Vocab, Organization, Lang, Content   Mechanics

Results
From a qualitative review of the videos and heat maps in comparison with each rater's inter-rater reliability estimate, we believe that raters who agreed the most had common attentional foci, whereas those who agreed the least did not.

Incongruous Raters
Raters 1 and 7 were found to be the most incongruous, given that they had the lowest inter-rater reliability for the total score (.45) and the second lowest reliability for content (.36) and for mechanics (.28). Because the scores for Essay 2 had the largest standard deviation, we looked at the heat maps for Essay 2 for raters 1 and 7.
[Heat maps: Essay 2, Rater 1; Essay 2, Rater 7]

Agreeing Raters
Raters 6 and 8 had the highest correlation coefficient in total scores (r = .79) as well as on the sub-scores for content (r = .75) and mechanics (r = .67). Given that the scores for Essay 8 show the smallest standard deviation, the heat maps for Essay 8 were compared between raters 6 and 8.
[Heat maps: Essay 8, Rater 6; Essay 8, Rater 8]

Discussion
Raters' attention and inter-rater reliability
More attention leads to higher inter-rater reliability with analytic scoring. (By contrast, greater care and attention decrease reliability with holistic scoring; Wolfe, 1997.)
Those who showed higher inter-rater reliability showed similar reading patterns: reading a relatively large area of the rubric and having common patterns of attentional foci.

Discussion
The effect of the layout
With an analytic scale, raters' decision-making behaviors tend to operate within the scope of the given guidelines (Smith, 2000). Part of the guidelines is the order of the categories.
We think that raters gave the most attention to content and organization and the least attention to mechanics because of a primacy effect. It has to do with rubric real estate.
In Lumley's (2005) study, the conventions of presentation (spelling, punctuation, script layout) received the second most attention after content, more attention than organization and grammar. In his study, the conventions of presentation came second after content in the rubric. This may also be evidence of the primacy effect.

Discussion
Raters may use the rubric mainly to justify or adjust the scores for an essay on which they have already made decisions.
When they finished reading an essay, raters seemed to know where the quality of the essay would fall in the grid of the analytic rubric.
Those who showed higher inter-rater agreement appeared to look through more descriptors for various levels; those who didn't seemed to stick to their initial judgment.

Limitations & Future Directions
The eye-movement data don't fully explain why raters paid more attention to certain categories or whether raters considered non-criterion features.
- Analysis of our stimulated-recall interview data is needed.
We don't know if there was any halo effect across essays in the rating process.
Information is lacking on how raters read the essays and how they went back and forth between the essays and the rating scale.
We have collected data for a second study in which both the rubric and essay are on screen, and data for a third study to investigate potential halo effects.

Questions or comments?
Paula Winke
Hyojung Lim

Notes on Essays
We assembled a stratified sample of 40 essays from prior ESL placement tests at a large Midwestern university. We culled four sets of 10 essays, each set from one of four scoring bands (64 and below, 65-69, 70-74, and 75 and above: see supplemental material that accompanies the online version of this manuscript). We balanced the selection of the 40 essays equally across four prompts as follows, with two to three essays at each score band being a response to one of these prompts:
Do you think it is better for people to make their purchases online or to go shopping in stores and malls? Use specific details and examples to explain your answer.
Some people say that all international students who are studying English should have an American roommate for at least one year. What is your opinion on this topic?
Some employees have bosses that they really like working for, while others have bosses that they absolutely hate. What are the most important qualities of a good boss at work, and why?
If you had the choice, would you rather take a college course online or have the same class face to face with an instructor and classmates in a classroom? Use specific details and examples to explain your answer.
The length of student essays was limited to one page so that raters did not need to flip over pages while rating. The order of 10 essays within each prompt set was randomized, and the order of the four prompt sets was counterbalanced across raters. A packet of 40 copied essays w
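
Two of the quantities named under Data Analysis are simple enough to sketch in code: total fixation duration divided by the number of words in a rubric category, and a reliability estimate adjusted with the Spearman-Brown prophecy formula. The Python below is a minimal illustration only. The function names, parameters, and input values are hypothetical, and the slides do not say which length factor the authors used for the Spearman-Brown adjustment.

```python
def fixation_per_word(total_fixation_seconds: float, word_count: int) -> float:
    """Total fixation duration on a rubric category divided by the number
    of words in that category, so that longer descriptors do not look
    more attended to simply because they take longer to read."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return total_fixation_seconds / word_count


def spearman_brown(reliability: float, length_factor: float) -> float:
    """Spearman-Brown prophecy formula:

        r_k = k * r / (1 + (k - 1) * r)

    where r is the observed reliability and k is the factor by which the
    number of raters (or items) is multiplied. k is an assumption here.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    denominator = 1 + (length_factor - 1) * reliability
    if denominator == 0:
        raise ValueError("formula is undefined for these inputs")
    return length_factor * reliability / denominator


if __name__ == "__main__":
    # Hypothetical inputs, not values from the study.
    print(fixation_per_word(total_fixation_seconds=4.0, word_count=50))
    print(spearman_brown(reliability=0.5, length_factor=2))
```

With a length factor of 1 the formula returns the observed reliability unchanged. Factors above 1 project the reliability of a larger rater pool, and factors below 1 project a smaller one.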