Guest Blogger: Dr. William L. Heller, Using Data Program Director, Teaching Matters*

Data-savvy investigators never make important decisions based on a single source. When teams following the Using Data process believe they may have found a student learning problem, based on their analysis of standardized testing results, they know to confirm the problem through an examination of student work and other common formative assessments. When they do this, it’s important for them to have a norming process in place to ensure that the data being generated is reliable and useful.

Norming is the process of calibrating the use of a single set of scoring criteria among multiple scorers. If norming is successful, a particular piece of work should receive the same score regardless of who is scoring it. With the advent of the Common Core State Standards Initiative, we can expect curriculum-embedded performance tasks to gain prominence over traditional multiple-choice tests, and it will be even more important for teachers to know how to make the best use of these assessments. Whether or not they are rigorous about norming can make a very big difference.
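That success criterion can actually be checked with numbers. One widely used statistic for inter-rater reliability (my addition here, not part of the Using Data process itself) is Cohen’s kappa, which measures how often two scorers agree beyond what chance alone would produce. Here is a minimal Python sketch; the teacher scores are invented for illustration.

```python
from collections import Counter

def cohens_kappa(scores_a, scores_b):
    """Inter-rater agreement between two scorers, corrected for chance.

    1.0 means perfect agreement; 0.0 means no better than chance.
    """
    n = len(scores_a)
    # Observed agreement: fraction of pieces given identical scores.
    p_o = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    # Expected agreement if each scorer assigned scores at random
    # in their observed proportions.
    freq_a, freq_b = Counter(scores_a), Counter(scores_b)
    p_e = sum(freq_a[s] * freq_b.get(s, 0) for s in freq_a) / (n * n)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

# Two teachers score the same ten essays on a 1-4 rubric.
teacher_1 = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
teacher_2 = [3, 2, 3, 3, 1, 2, 4, 4, 2, 3]
print(round(cohens_kappa(teacher_1, teacher_2), 2))  # 0.71
```

A kappa near 1.0 suggests the rubric is being applied consistently; values well below that are a signal that more norming conversation is needed.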

Many years ago, I was an open-ended response scorer for the New Jersey State High School Proficiency Exam, a test students had to pass in order to graduate. My fellow scorers and I were trained on, and given a qualifying exam for, each question we scored. The exam consisted of twenty sample responses to that question. If we gave nineteen of them the correct score, we were cleared to work on that question. Once we were on the job, responses would show up on a computer screen (with no names, so scoring was blind to gender and ethnicity), and we would type the numerical score on our keypads. Each response was graded by two scorers independently. If the two disagreed, it would get bumped up to a supervisor. We were evaluated by volume, and by how few times we were overturned. It was an incredibly efficient and reliable system.
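For readers who think in code, the routing logic of that system is simple to sketch. What follows is a hypothetical Python rendering of the process described above; the function names and error handling are my own, not the testing program’s.

```python
def qualifies(assigned_scores, answer_key, threshold=19):
    """A scorer is cleared for a question only after matching the
    official score on at least `threshold` of twenty sample responses."""
    matches = sum(a == k for a, k in zip(assigned_scores, answer_key))
    return matches >= threshold

def final_score(score_1, score_2, supervisor_score=None):
    """Each response gets two independent blind scores; disagreements
    are bumped up to a supervisor for resolution."""
    if score_1 == score_2:           # independent agreement: score stands
        return score_1
    if supervisor_score is None:     # disagreement: must escalate
        raise ValueError("Scorers disagree; escalate to a supervisor.")
    return supervisor_score          # supervisor's call resolves the split

print(final_score(3, 3))                       # -> 3 (scorers agree)
print(final_score(2, 4, supervisor_score=3))   # -> 3 (resolved upstairs)
```

The key design choice is that the two scorers never see each other’s scores, so agreement is evidence of a shared standard rather than of copying.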

Compare this process to the way the writing sections are currently scored on the New York State English Language Arts (ELA) Exam. Different sections of the state have different norming procedures, which means the state as a whole has none. I’ve talked with many New York City teachers who have scored the exam, and they report that there was very little effort to norm. Different scorers had wildly different standards for interpreting the rubric, and even the same scorer could become more lenient as the days went on. The final scores, then, were as much a function of geography, timing, and luck as they were of student performance. How can we possibly make use of this data to reliably identify student learning problems, let alone make high-stakes decisions about school, teacher, or student performance?

Teacher teams have the opportunity to be smarter than this in the way they score their local assessments. Before any rubric-based scoring begins, the teachers involved should meet. They should each score the same piece of student work using a common rubric. They may then compare their scores and use the comparison to guide a conversation about how the rubric will be used. Three such rounds can fit comfortably within a common planning period. The goal is for the teachers to align their scoring practices with one another, so that scoring will be consistent and fair.
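For teams that want a concrete snapshot of where they stand after each round, even a simple tally of everyone’s scores for the same piece of work can seed the conversation. This sketch is an illustration of that idea, not an official part of the Using Data protocol; the names and scores are invented.

```python
from collections import Counter

def norming_round_report(scores_by_teacher):
    """Summarize one norming round, in which every teacher scored the
    same piece of student work on the same rubric."""
    scores = list(scores_by_teacher.values())
    spread = max(scores) - min(scores)   # widest disagreement this round
    return {
        "tally": dict(Counter(scores)),  # how many teachers gave each score
        "spread": spread,
        "consensus": spread == 0,
    }

# Round 1: four teachers score the same essay on a 1-4 rubric.
round_1 = {"Ms. A": 3, "Mr. B": 2, "Ms. C": 3, "Mr. D": 4}
print(norming_round_report(round_1))
# {'tally': {3: 2, 2: 1, 4: 1}, 'spread': 2, 'consensus': False}
```

A shrinking spread across the three rounds is a good sign that the group is converging on a shared reading of the rubric.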

Norming is often dismissed as extra work for an already-busy department. But without it, performance-based assessments will not yield reliable data. It’s good form to norm!

*Teaching Matters is a non-profit organization that partners with educators to ensure that all students can succeed in the digital age. They are an official TERC Using Data partner organization, conducting the Using Data for Meaningful Change institute for New York City schools.