Starting in the first chapter, the book challenges many assumptions on which traditional classroom assessment practices are based. These challenges are designed to expose the sometimes illogical assumptions and lack of utility of many current assessment practices based on the traditional paradigm of assessment, such as averaging scores from all assessments to come up with an overall grade. However, the central purpose of the book is not to cause havoc with current classroom assessment practices, but rather to improve and augment those practices with techniques that result in more precise information about students’ status and growth on specific topics.
One major theme in the book is that effective assessment begins with clarity regarding the content that will be the focus of instruction and assessment. To this end, we strongly recommend the use of proficiency scales to define specific learning goals (also known as learning targets) and various levels of proficiency relative to those goals. Another theme in the book is that classroom teachers should never rely on a single assessment to determine a student’s status at any point in time. Rather, teachers should consider the pattern of scores on specific topics for individual students across multiple assessments. A third theme is that teachers should expand what they consider to be viable forms of assessment. Indeed, we make the case that anything a teacher does that provides information about a particular student’s status relative to a particular topic should be considered an assessment. The traditional test, then, is one form of assessment among many, including observations, conversations with students, short written responses, and student-generated assessments. In effect, teachers should test less (use pencil-and-paper tests less) but assess more (use a variety of ways to collect assessment information). Still another theme is that the process of assessment should be intimately tied to the process of instruction. Finally, assessment should be experienced by students as one of the most useful tools they have to increase their learning.
This book is not your ordinary classroom assessment textbook. We recommend that teams of teachers use it to systematically examine and change their assessment practices. We firmly believe that adherence to the suggestions and principles articulated in this book will create a paradigm shift in classroom assessment whose time has definitely come.
INTRODUCTION
The New Paradigm for Classroom Assessment
This book is about a paradigm shift in the way teachers use and interpret assessments in the classroom. It is also about increasing the rigor and utility of classroom assessments to a point where educators view them as a vital part of a system of assessments that they can use to judge the status and growth of individual students. This is a critical point. If we are to assess students in the most accurate and useful ways, then we must think in terms of merging the information from classroom assessments with other types of assessments. Figure I.1 shows the complete system of assessments a school should use.
Source: Marzano, 2018, p. 6.
Figure I.1: The three systems of assessment.
Perhaps the most visible of the three types of assessment in figure I.1 is the year-end assessment. M. Christine Schneider, Karla L. Egan, and Marc W. Julian (2013) describe year-end assessments as follows:
States administer year-end assessments to gauge how well schools and districts are performing with respect to the state standards. These tests are broad in scope because test content is cumulative and sampled across the state-level content standards to support inferences regarding how much a student can do in relation to all of the state standards. Simply stated, these are summative tests. The term year-end assessment can be a misnomer because these assessments are sometimes administered toward the end of the year, usually March or April and sometimes during the first semester of the school year. (p. 59)
The next level of assessment in the model in figure I.1 is interim assessments. Schneider and colleagues (2013) describe them as follows: “Interim assessments (sometimes referred to as benchmark assessments) are standardized, periodic assessments of students throughout a school year or subject course” (p. 58).
Professional test makers typically design both types of assessments, and the resulting tests exhibit the psychometric properties educators associate with high reliability and validity, as defined in large-scale assessment theory. As its name indicates, large-scale assessment theory focuses on tests, such as year-end state tests, that are administered to large groups of students. As we indicate in figure I.1, the most frequent type of assessment is classroom assessment. Unfortunately, some educators assume they can’t use classroom assessments to make decisions about individual students because the assessments do not exhibit the same psychometric properties as the externally designed assessments. While this assumption has intuitive appeal, it is misleading; in this book we assert that classroom assessments can actually be more precise than external assessments when it comes to examining the performance of individual students.
This chapter outlines the facts supporting our position. In the remaining chapters, we fill in the details about how educators can design and use classroom assessments to fulfill their considerable promise.
It is important to remember that all three types of assessment we depict in figure I.1 have important roles in the overall process of assessing students. To be clear, we are not arguing that educators should discontinue or discount year-end and interim assessments in favor of classroom assessments. We are asserting that, of the three types of assessment, classroom assessments should be the most important source of information regarding the status and growth of individual students.
We begin by discussing the precision of externally designed assessments.
The Precision of Externally Designed Assessments
Externally designed assessments, like year-end and interim assessments, typically follow the tenets of classical test theory (CTT), which dates back at least to the early 1900s (see Thorndike, 1904). At its core, CTT proposes that all assessments contain a certain degree of error, as the following equation shows.
Observed Score = True Score + Error Score
This equation indicates that the score a test taker receives (the observed score) on any type of assessment comprises two components: a true component and an error component. The true component (the true score) is what a test taker would receive under ideal conditions, in which the test is perfectly designed and the situation in which students take the test is optimal. The error component (the error score) represents factors that can artificially inflate or deflate the observed score. For example, the test taker might guess correctly on a number of items, which would artificially inflate the observed score, or might misinterpret a few items for which he or she actually knows the correct answers, which would artificially deflate the observed score.
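The equation above can be illustrated with a small worked example. The following is a minimal sketch only: the class name, field names, and all of the numbers are hypothetical and do not come from any real test. It assumes error can be counted item by item, with lucky guesses adding to the observed score and misread items subtracting from it.

```python
from dataclasses import dataclass


@dataclass
class ScoreBreakdown:
    """Hypothetical breakdown of one test taker's score (illustration only)."""

    true_score: int     # items the test taker actually knows
    lucky_guesses: int  # unknown items answered correctly by guessing (inflates)
    misread_items: int  # known items missed through misinterpretation (deflates)

    @property
    def error_score(self) -> int:
        # Error can be positive (net inflation) or negative (net deflation).
        return self.lucky_guesses - self.misread_items

    @property
    def observed_score(self) -> int:
        # Observed Score = True Score + Error Score
        return self.true_score + self.error_score


# Assumed values: the student knows 40 items, guesses 3 correctly, misreads 2.
example = ScoreBreakdown(true_score=40, lucky_guesses=3, misread_items=2)
print(example.error_score)     # 1
print(example.observed_score)  # 41
```

In this sketch the observed score of 41 overstates the true score of 40 by one point. In practice, neither the true score nor the error score can be observed directly; only the observed score is available to the teacher.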
Probably the most important aspect of error is that it makes observed scores imprecise to at least some degree. Stated differently, the scores that any assessment generates will always contain some amount of error. Test makers report the amount of error one can expect in the scores on a specific test as a reliability coefficient. Such coefficients range from a low of 0.00 to a high of 1.00. A reliability of 0.00 indicates that the scores that a specific assessment produces are nothing but error. If students took the same test again sometime after they had completed it