IRR (Inter-rater Reliability) of a COP (Classroom Observation Protocol)—A Critical Appraisal

Source: US-China Education Review (B)
  Notwithstanding the broad utility of COPs (classroom observation protocols), there has been limited documentation of the psychometric properties of even the most popular COPs. This study attempted to fill this void by closely examining the item- and domain-level IRR (inter-rater reliability) of a COP that was used in a federally funded Striving Readers program. A combination of reliability measures (e.g., joint probability of agreement, Cohen’s kappa, polychoric correlation and intra-class correlation coefficients) was selected according to which measures were appropriate for the scale of each item set. Results indicate that most items in physical environment, cognitive demand and students’ class engagement can be assessed with moderate reliability. Items in classroom climate and instructional modes yielded mixed estimates. Recommendations are provided for possible improvement of similar instruments.
  Keywords: COP (classroom observation protocol), IRR (inter-rater reliability), adolescent literacy, program evaluation
   Introduction
  A COP (classroom observation protocol) is an instrument used to assess and measure the quality of teaching and learning in the classroom, identify how well resources and the learning environment are contributing to learning, and provide suggestions on areas for possible improvement and development. Notwithstanding the broad potential utility of COPs, it does not suffice that the instruments simply remain consistent internally or over reasonable periods of time. Rather, for COPs to be useful for teacher PD (professional development) evaluation, it should be shown that observers of the same class session concur substantially on the degree to which the instructor’s classroom behaviors, methods and modes of interaction with students conform to a preexisting concept of what represents good teaching. In other words, observation protocols that are idiosyncratic to the observer, but not the instructor, can be limited and misleading for evaluation purposes. Unfortunately, there has been very limited documentation of the psychometric properties of even the most popular COPs currently used in evaluations of various instructional and teacher PD programs nationwide. In particular, there is little consensus about which statistical measures are best for analyzing the IRR (inter-rater reliability) of this type of instrument. As funders increase demands for more rigorous government-funded evaluations of educational programs and interventions, one way that evaluators can meet these demands is by using the most appropriate statistical measures for estimating the psychometric properties of specific protocols. The present study attempts to address this issue through a critical appraisal of a COP used in a federally funded Striving Readers program. The COP was developed to inform implementation fidelity ratings of a school-wide PD model designed to support middle school content area teachers’ implementation of literacy strategies in ways that support the academic achievement of students who attend high-poverty urban middle schools.
   Background
  The SRP (Striving Readers Project) under study, situated in a large high-poverty urban school district in the South, is one of the eight programs sponsored by the U.S. Department of Education to address the needs of struggling adolescent readers. It includes school-wide and targeted interventions plus rigorous evaluations of each component. The CLA (Content Literacy Academy), a school-wide PD model for content area teachers, provided 180 hours of intensive training over two years to increase teachers’ knowledge about and use of research-based reading strategies to improve students’ achievement in reading and core content areas (mathematics, science, social studies and English language arts), especially for students attending high-poverty urban middle schools. The SR-COP (Striving Readers Classroom Observation Protocol) was developed by Research for Better Schools, a non-profit research and development organization in Philadelphia, as an instrument to record and rate observations of Striving Readers’ classroom lessons as a part of the evaluation of CLA. The instrument was adapted from the CETP (Collaborative for Excellence in Teacher Preparation) protocol, a classroom observation tool developed by Lawrenz, Huffman, and Appeldoorn (2002) at the University of Minnesota.
  The COP items were organized into seven domains:
  (1) Physical environment;
  (2) Materials/technology;
  (3) Classroom climate;
  (4) Instructional modes;
  (5) Literacy strategies;
  (6) Cognitive demand;
  (7) Level of student engagement.
  When inter-rater agreement is low, there are usually two possible reasons: (1) the scale is defective; or (2) raters need to be retrained on the rating criteria. One of the main challenges of estimating reliability for the SR-COP is that the items in different domains are scaled differently. For physical environment, all five items are on a 1-4 Likert scale; for materials/technology, there are 12 dichotomously scaled (Yes/No) items; for classroom climate, there are six categorical items on a scale of 1, 2, 3, 4 and DK (do not know); for instructional modes, literacy strategies, cognitive demand and level of student engagement, observers indicated the use of specific modes of instruction, literacy strategies, levels of cognitive demand and student engagement in each of the four 10-minute intervals of the class by transcribing detailed field notes, which were then used to complete the SR-COP data matrix. Except for cognitive demand and student engagement, there may be more than one strategy (each with an associated code) that the observer can choose to describe instruction in each interval.
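  To make the dependence on item scale concrete, the following is a minimal sketch, not the analysis code used in this study, of two of the measures considered here for nominal items such as the dichotomous (Yes/No) materials/technology items: the joint probability of agreement and Cohen's (1960) kappa, which corrects observed agreement for the agreement expected by chance from each rater's marginal frequencies. The function names and the ratings shown are hypothetical and serve only as an illustration.

```python
from collections import Counter


def joint_agreement(r1, r2):
    """Proportion of observations on which two raters assign the same code."""
    if len(r1) != len(r2) or len(r1) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(r1)
    p_o = joint_agreement(r1, r2)          # observed agreement
    c1, c2 = Counter(r1), Counter(r2)      # marginal counts per rater
    # Chance agreement: sum over categories of the product of the two
    # raters' marginal proportions.
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    if p_e == 1.0:
        # Both raters used one and the same category throughout;
        # kappa is undefined in that case.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical Yes/No ratings from one pair of observers on ten items.
rater_a = ["Y", "Y", "N", "Y", "N", "N", "Y", "N", "Y", "Y"]
rater_b = ["Y", "N", "N", "Y", "N", "Y", "Y", "N", "Y", "Y"]

print("joint agreement:", joint_agreement(rater_a, rater_b))
print("Cohen's kappa:  ", cohens_kappa(rater_a, rater_b))
```

  Because kappa treats every disagreement as equally serious, it suits nominal codes; for ordered scales such as the 1-4 Likert items, measures that use the ordering (e.g., polychoric or intra-class correlation) are generally the more natural choice.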
   Methods
  The SR-COP was used by 10 pairs of evaluators to collect data about classroom implementation related
  acceptable psychometric properties that are used in instructional and PD program evaluation, we will leave the important questions about what works to the ill-informed advocates or opponents of education reform. It is a privilege to initiate studies of this kind to ensure that high-quality process and outcome measures are applied in government-funded evaluation projects that are intended to help the public make wise decisions.
   References
  Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.
  Crewson, E. C. (2001). A correction for unbalanced kappa tables. SUGI (SAS Users Group International) Paper 194-26. Retrieved July 17, 2008, from http://www2.sas.com/proceedings/sugi26/p194-26.pdf
  Fleiss, J. L. (1981). Statistical methods for rates and proportions. New York: John Wiley.
  Fleiss, J. L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613-619.
  Fleiss, J. L., & Davies, M. (1982). Jackknifing functions of multinomial frequencies, with an application to a measure of concordance. American Journal of Epidemiology, 115, 841-845.
  Kendall, M. G. (1948). Rank correlation methods. Charles Griffin & Company Limited.
  Lawrenz, F., Huffman, D., & Appeldoorn, K. (2002). Classroom observation videotape guide. College of Education and Human Development, University of Minnesota.
  Shrout, P. E., & Fleiss, J. L. (1979). Intra-class correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420-427.
  Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15, 72-101.
  Walter, S. D., Eliasziw, M., & Donner, A. (1998). Sample size and optimal designs for reliability studies. Statistics in Medicine, 17, 101-110.
  Appendix A
  Table A1
  A Comparison of Various Measures of IRR
  IRR measure | Description | Scale of items | Pros | Cons