Abstract
Notwithstanding the broad utility of COPs (classroom observation protocols), there has been limited documentation of the psychometric properties of even the most popular COPs. This study attempted to fill this void by closely examining the item- and domain-level IRR (inter-rater reliability) of a COP used in a federally funded striving readers program. Reliability measures (e.g., joint probability of agreement, Cohen's kappa, polychoric correlation, and intra-class correlation coefficients) were selected according to the scale of each item set. Results indicate that most items in physical environment, cognitive demand, and student engagement can be assessed with moderate reliability, whereas items in classroom climate and instructional modes yielded mixed estimates. Recommendations are provided for improving similar instruments.
Keywords: COP (classroom observation protocol), IRR (inter-rater reliability), adolescent literacy, program evaluation
Introduction
A COP (classroom observation protocol) is an instrument used to assess the quality of teaching and learning in the classroom, to identify how well resources and the learning environment contribute to learning, and to suggest areas for improvement and development. Notwithstanding the broad potential utility of COPs, it does not suffice that such instruments are merely consistent internally or over reasonable periods of time. Rather, for COPs to be useful in evaluating teacher PD (professional development), observers of the same class session must concur substantially on the degree to which the instructor's classroom behaviors, methods, and modes of interaction with students conform to a preexisting concept of good teaching. In other words, observation protocols whose ratings are idiosyncratic to the observer rather than the instructor can be limited and misleading for evaluation purposes. Unfortunately, there has been very limited documentation of the psychometric properties of even the most popular COPs currently used in evaluations of instructional and teacher PD programs nationwide. In particular, there is little consensus about which statistical measures are best suited to analyzing the IRR (inter-rater reliability) of this type of instrument. As funders demand more rigorous evaluations of government-funded educational programs and interventions, one way evaluators can meet these demands is by using the most appropriate statistical measures for estimating the psychometric properties of specific protocols. The present study addresses this issue through a critical appraisal of a COP used in a federally funded striving readers program. The COP was developed to inform implementation fidelity ratings of a school-wide PD model designed to support middle school content-area teachers' implementation of literacy strategies in ways that support the academic achievement of students attending high-poverty urban middle schools.
Background
The SRP (Striving Readers Project) under study, situated in a large, high-poverty urban school district in the South, is one of eight programs sponsored by the US Department of Education to address the needs of struggling adolescent readers; it includes school-wide and targeted interventions plus rigorous evaluations of each component. The CLA (Content Literacy Academy), a school-wide PD model for content-area teachers, provided 180 hours of intensive training over two years to increase teachers' knowledge about, and use of, research-based reading strategies to improve students' achievement in reading and the core content areas (mathematics, science, social studies, and English language arts), especially for students attending high-poverty urban middle schools. The SR-COP (Striving Readers Classroom Observation Protocol) was developed by Research for Better Schools, a non-profit research and development organization in Philadelphia, to record and rate observations of Striving Readers classroom lessons as part of the evaluation of the CLA. The instrument was adapted from the CETP (Collaborative for Excellence in Teacher Preparation) classroom observation tool developed by Lawrenz, Huffman, and Appeldoorn (2002) at the University of Minnesota.
The COP items were organized into seven domains:
(1) Physical environment;
(2) Materials/technology;
(3) Classroom climate;
(4) Instructional modes;
(5) Literacy strategies;
(6) Cognitive demand;
(7) Level of student engagement.
When inter-rater agreement is low, there are usually two explanations: (1) the scale is defective; or (2) raters need to be retrained on the rating criteria. One of the main challenges in estimating reliability for the SR-COP is that items in different domains are scaled differently. For physical environment, all five items are rated on a 1-4 Likert scale. For materials/technology, there are 12 dichotomously scaled (Yes/No) items. For classroom climate, there are six categorical items rated 1, 2, 3, 4, or DK (do not know). For instructional modes, literacy strategies, cognitive demand, and level of student engagement, observers indicated the use of specific modes of instruction, literacy strategies, and levels of cognitive demand and student engagement in each of the four 10-minute intervals of the class, transcribing detailed field notes that were then used to complete the SR-COP data matrix. Except for cognitive demand and student engagement, observers could choose more than one strategy (each with an associated code) to describe instruction in each interval.
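Because no single statistic fits all of these scales, a concrete illustration may help. The following minimal sketch (Python; the ratings are hypothetical and not actual SR-COP data) computes two of the measures applied to the dichotomous items: the joint probability of agreement and Cohen's (1960) kappa, which corrects agreement for chance.

from collections import Counter

def percent_agreement(r1, r2):
    """Joint probability of agreement: share of items both raters coded identically."""
    assert len(r1) == len(r2)
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Cohen's (1960) kappa for nominal codes: observed agreement corrected for chance."""
    n = len(r1)
    po = percent_agreement(r1, r2)  # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement from each rater's marginal code frequencies
    pe = sum((c1[c] / n) * (c2[c] / n) for c in set(r1) | set(r2))
    return (po - pe) / (1 - pe)

# Hypothetical Yes/No codes from one observer pair on the 12
# materials/technology items of a single observed lesson.
rater_a = ["Y", "Y", "N", "Y", "N", "N", "Y", "Y", "N", "Y", "N", "Y"]
rater_b = ["Y", "Y", "N", "Y", "Y", "N", "Y", "N", "N", "Y", "N", "Y"]

print(f"agreement = {percent_agreement(rater_a, rater_b):.2f}")
print(f"kappa     = {cohens_kappa(rater_a, rater_b):.2f}")

On these toy codes the raters agree on 10 of 12 items (0.83), yet kappa falls to about 0.66 once chance agreement is removed, which is why raw agreement alone can flatter an instrument with few response categories.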
Methods
The SR-COP was used by 10 pairs of evaluators to collect data about classroom implementation related
acceptable psychometric properties that are used in instructional and PD program evaluation, we will leave the important questions about what works to the ill-informed advocates or opponents of education reform. It is a privilege to initiate studies of this kind to ensure that high-quality process and outcome measures are applied in government-funded evaluation projects that are intended to help the public make wise decisions.
References
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.
Crewson, P. E. (2001). A correction for unbalanced kappa tables. SUGI (SAS Users Group International) Paper 194-26. Retrieved July 17, 2008, from http://www2.sas.com/proceedings/sugi26/p194-26.pdf
Fleiss, J. L. (1981). Statistical methods for rates and proportions. New York: John Wiley.
Fleiss, J. L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613-619.
Fleiss, J. L., & Davies, M. (1982). Jackknifing functions of multinomial frequencies, with an application to a measure of concordance. American Journal of Epidemiology, 115, 841-845.
Kendall, M. G. (1948). Rank correlation methods. London: Charles Griffin & Company Limited.
Lawrenz, F., Huffman, D., & Appeldoorn, K. (2002). Classroom observation videotape guide. Minneapolis, MN: College of Education and Human Development, University of Minnesota.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420-428.
Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15, 72-101.
Walter, S. D., Eliasziw, M., & Donner, A. (1998). Sample size and optimal designs for reliability studies. Statistics in Medicine, 17, 101-110.
Appendix A
Table A1
A Comparison of Various Measures of IRR
IRR measure | Description | Scale of items | Pros | Cons
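As a concrete companion to this comparison, the sketch below (Python; the ratings are hypothetical, not drawn from SR-COP data) implements one of the listed measures, the ICC formulated as ICC(2,1) in Shrout and Fleiss (1979): a two-way random-effects, absolute-agreement, single-rater coefficient suited to the 1-4 Likert items.

def icc_2_1(data):
    """ICC(2,1) of Shrout & Fleiss (1979): two-way random effects,
    absolute agreement, single-rater reliability.
    data: one row per rated target, one column per rater."""
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ms_r = ss_rows / (n - 1)                                 # targets mean square
    ms_c = ss_cols / (k - 1)                                 # raters mean square
    ms_e = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

# Hypothetical 1-4 Likert ratings of one item across five observed
# lessons (rows); columns are the two observers in a pair.
ratings = [[3, 3], [4, 3], [2, 2], [4, 4], [1, 2]]
print(f"ICC(2,1) = {icc_2_1(ratings):.2f}")

Here the two observers never differ by more than one scale point across the five lessons, and the resulting ICC of roughly 0.83 would conventionally be read as good single-rater reliability.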