NRRF Essay - Reading Assessment: Teachers' Opinions Versus Standardized Tests

by Dr. Patrick Groff
NRRF Board Member & Senior Advisor

Dr. Patrick Groff, Professor of Education Emeritus, San Diego State University, has published over 325 books, monographs, and journal articles and is a nationally known expert in the field of reading instruction.


It is my considered professional opinion that standardized reading tests in general are superior to teachers' judgments for determining precisely how well children in grades K-3 can read. I come to this conclusion after a lengthy career as a teacher and teacher educator, as an author of numerous publications on the development of young children's literacy, and as an analyst of reading assessment and critic of standardized reading tests.

In my judgment, there are several reasons why standardized reading tests are more reliable and valid measures of young students' reading ability than are teachers' opinions in this regard. A reliable test is one whose characteristics neutralize any biases and/or differences in qualifications among its administrators. Thus, a reliable test is one on which the scores students obtain will be essentially the same, regardless of who administers it. A valid test is one that actually measures what it purports to assess.

With these distinctions in mind, I find that a standardized reading test (SRT) usually is a more adequate instrument than a teacher-devised appraisal of reading (TAR), for the following reasons:

  1. The norms (averages) for grade-level achievement on an SRT are established by repeated administrations of prepublication forms of the test. From these tryouts, each item in an SRT is assigned a different weight (number of points). The weight given an item depends on the percentage of children with low overall scores on the SRT who answer the item correctly, compared with the percentage of children with high overall scores who do so. The items in an SRT thus are organized on a continuum ranging from the least to the most difficult. These features of SRTs contrast with the subjective and/or relatively disorganized manner in which TARs are constructed.

  2. SRTs focus more accurately on whether or not a student can answer their items correctly. It thus can be said with greater certainty than with TARs whether a student did or did not respond properly to an item.

  3. The scores children obtain on a prospective SRT are compared with those they obtain on an already published SRT. If the statistical coefficient of correlation obtained between children's scores on the two tests is high, the validity of the prospective test is indicated.

    This fact, along with the reputations of experts in reading measurement who design SRTs, and the analytic critiques of a positive nature that are made of SRTs, attests to their validity. In TARs, by contrast, there is no such intra-test validation, teachers customarily do not sufficiently qualify as test-makers, and there are no academic critiques made of their quality.

  4. The argument that TARs are superior to SRTs, because the former are criterion-referenced (CR) tests, is not convincing. The goal of a CR reading test is to find out if students can correctly answer all its items. Additional instruction typically is provided to develop students' mastery over items that were answered incorrectly.

    The weakness discovered in CR tests is that they set their items at a low level of difficulty, so as to make sure that all students answer them accurately. There is tremendous pressure on teachers to exaggerate how well their students have learned to read, which the use of CR tests can accommodate. It thus is not surprising to find CR tests engaged as parts of efforts to "dumb down" the school curriculum.

    Jerry Jesness, a Texas teacher, describes in detail the unremitting coercion that he faces to give students passing grades for failing efforts and achievement (Reason Magazine, July 1999). The only possible escape for teachers from this intimidation, he convincingly argues, is greater, not lesser, use of standardized tests.

    Despite the false claims for the superiority of CR tests, the best reading assessment is one on which no student can answer all its items correctly. It is a test on which there is a wide range of students' scores, distributed along a bell-shaped curve. Only then is one able to determine exactly how well a given student can read in comparison to others his/her age.

  5. The charge that SRTs are gender- and race-biased, and therefore that TARs are preferable because they do not have these shortcomings, is unproved. The accusation also appears to have an ulterior motive: to convince school districts to abandon all standardized testing. For example, advocates of the Whole Language (WL) approach to children's reading development complain that SRTs are not authentic measures of children's reading ability. Only bona fide WL teachers have the capacity to measure reading ability through their special TARs, it presumptuously is held.

  6. A further argument of WL proponents to this point is that children's reading behavior cannot be broken down into discrete reading skills. It thus supposedly is impossible to arrange these skills into a hierarchy of their difficulty for children to learn. Thus, SRTs are ruled out in favor of TARs on that score.

    The latter supposedly provide teachers with insights into the holistic thinking processes of children that are going on as they read. This highly speculative, subjective, and conjectural process has not been experimentally verified, its negative critics note.

  7. TARs often are 5-point rating scales, yes-no checklists, "running records," and other kinds of informal, subjective observation of children's oral and silent reading behavior. These tactics are inferior to SRTs in that the judgments teachers make with them, as to the quality and quantity of children's reading performances, are highly impressionistic, and thus are noticeably unreliable.

  8. SRTs are rebuked as not being diagnostic, i.e., as not pinpointing specific reading disabilities that children suffer, and not indicating to teachers the remedial instruction they should undertake. This is a stereotyping of SRTs, it is clear. It is true that some SRTs measure only children's abilities to read isolated words and to comprehend the literal meanings that authors intended to convey in short passages. However, there is a wide range and variety of SRTs, among which are found those that exactly signify to teachers the specific reading skills students lack, and thus imply the corrective instruction that is needed to remedy this situation.

In light of the above discussion, I would recommend that if it is discovered that the findings of SRTs and TARs are substantially different, the former information should prevail in making decisions about the status of children's reading ability. No longer should defenders of TARs be allowed to justify them in the vague, indeterminate, and self-serving terms often used to this effect. The more that this kind of language is eliminated from discussions of assessments of children's reading ability, the better.
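
Two of the statistical ideas in reasons 1 and 3 can be illustrated with a minimal Python sketch. It is not the procedure of any particular test publisher. The function names, the 0/1 response format, and the sample data are hypothetical, and the upper-lower discrimination index shown here is one common textbook way of comparing low and high scorers on an item, assumed for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def discrimination_index(responses, item, fraction=0.5):
    """Share of high scorers who answer `item` correctly minus the share of
    low scorers who do. `responses` holds one list of 0/1 marks per child."""
    ranked = sorted(responses, key=sum)          # lowest total score first
    k = max(1, int(len(ranked) * fraction))      # size of each comparison group
    low, high = ranked[:k], ranked[-k:]
    p_low = sum(child[item] for child in low) / k
    p_high = sum(child[item] for child in high) / k
    return p_high - p_low

# Hypothetical tryout data: four children, three items (1 = correct).
tryout = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]

# Hypothetical scores of the same children on a prospective and a published test.
prospective = [12, 15, 20, 27]
published = [30, 34, 41, 50]

validity_coefficient = pearson_r(prospective, published)
item_indices = [discrimination_index(tryout, i) for i in range(3)]
```

A coefficient near 1 would correspond to the high correlation described in reason 3, and an index near 0 would flag an item that low and high scorers answer equally often.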


Unless otherwise noted, you may copy and distribute any information on this site as long as The National Right to Read Foundation at www.nrrf.org is given credit. The National Right to Read Foundation is a 501(c)(3) publicly supported organization.