Observations on Reading Recovery

Dr. Patrick Groff, NRRF Board Member and Senior Advisor

Patrick Groff, Professor of Education Emeritus, San Diego State University, has published over 300 books, essays, and journal articles and is a nationally known expert in the field of reading.

The following works are evaluated:

Marie M. Clay (1993). An Observation Survey of Early Literacy Achievement. Portsmouth, NH: Heinemann. 93 pp.
Marie M. Clay (1993). Reading Recovery: A Guidebook for Teachers in Training. Portsmouth, NH: Heinemann. 112 pp.
The Reading Recovery Reports, the published evidence that Clay offers for Reading Recovery’s effectiveness.

The first two volumes were written by Marie Clay, the creator of the remedial reading tutoring program called “Reading Recovery” (RR). They are the official guidebooks for teachers who refer children to RR and for those who aspire to become RR tutors. The two books offer (1) Clay’s version of the reading process; (2) her descriptions of how to decide which first-grade children need RR; (3) the official teaching procedures of RR; and (4) information on when to transfer (discontinue) pupils out of RR as they recover from their reading disabilities.

Clay’s Version of the Reading Process

Clay’s conceptualizations of the “reading process,” which she expects all RR tutors to honor and adopt, contradict what the experimental evidence says about reading development. For example, Clay insists that beginning readers be engaged by their teachers (for some unspecified time) in “reading for meaning” before these pupils are taught to apply phonics information to decode written words. She maintains that an account of reading built on beginning readers’ ability to apply phonics information is a “grossly simplified explanation” of what they “need to know or do in order to be able to read.” She scoffs at the belief that a beginning reader’s underachievement ever could be brought on by a “simple” cause “such as not having learned his phonics.”

Although Clay does not intend it, the delay in phonics teaching that she advises likely causes the very reading problems that RR is designed to remedy. It is indisputable, as Clay notes, that “the larger the chunks of printed language the child can work with” the better. But then, without support from the empirical evidence on reading development, she maintains that children’s ability to recognize individual words is not the foundation for their successful reading of larger chunks of written material.

A proper theory of the latter “cannot arise from a theory of word reading,” Clay mistakenly insists. Accordingly, she further misjudges in claiming that there are eight word recognition cues beginning readers “need to use” when perusing written material before they begin to apply phonics information to recognize words. For example, Clay advises that these pupils “use” sentence structure, size of words, “special [unnamed] features of sound, shape and layout,” and “possible meanings of the text” for some indefinite period of time before they begin to apply phonics information.

The relevant experimental research makes clear to the contrary, however, that direct and systematic teaching of phonics skills generates more written word recognition ability for beginning readers than otherwise is possible. Nothing relates more closely to beginning readers’ quick and accurate (automatic) word recognition ability than does their skill at applying phonics information. Moreover, this automatic word recognition is associated more closely with reading comprehension of the stories first-graders typically are given to read than is any other factor, including any of the eight cues that Clay insists precede the application of phonics information.

The empirical evidence thus indicates that phonics knowledge is not, as Clay erroneously contends, a “less reliable” and more “confusing and distorting source of cues” for reading words than is sentence structure. The “high progress readers” to whom Clay refers do not depend more heavily on sentence structure to recognize words than they do on phonics knowledge. Clay’s pronouncement that beginning readers need to be adept at “scanning” written material to gain its meaning clearly is not corroborated by scientific evidence. On the contrary, that evidence indicates that beginning readers must be weaned away from the use of sentence context cues if they are to learn to recognize individual words in a quick and accurate fashion.

In addition, Clay wrongly assumes that beginning readers’ “pretend writing,” the “jumble of disoriented letters” that pre-literate children produce when trying to spell words, greatly reinforces these pupils’ ability to recognize words. It is true that correctly spelled words perform this reinforcing function. As children apply phonics information to decode words, they begin to recognize familiar spelling patterns. These pupils eventually can read these spelling patterns in words without sounding them out, letter-by-letter. In this fashion, children learn to read words faster and faster. However, misspelled (“pretend writing”) words have relatively little utility in reinforcing a beginning reader’s ability to read correctly spelled words.

Finally, Clay’s inaccurate version of the reading process apparently causes her to set a relatively low standard as to the percentage of first-grade children who can be expected to learn to read and write. She believes in this regard that only “most children can become literate.” This reduced level of reading attainment in children is ordained, according to Clay. We should not hope that improvements in reading methodology will make reading problems disappear, since a child’s “intelligence” supposedly predetermines whether he or she will read better or worse than other children do. Unfortunately, Clay does not explain what “intelligence” means to her. She inexplicably protests the use of intelligence test data in assessments of reading disability. Nonetheless, to Clay, judgments about children’s intelligence show that teachers are “sensitive to individual differences” in reading ability.

If a first-grade teacher faithfully implemented Clay’s version of the beginning reading process, probably closer to 51 percent than to 100 percent of the children taught would learn to read well. Clay concedes elsewhere (in The Early Detection of Reading Difficulties, 3rd ed., 1985) that in New Zealand, where her version of the reading process (“Whole Language”) is nationally mandated for reading programs, 30 to 50 percent of children read so poorly they qualify for Reading Recovery. In California, where the Whole Language approach to reading development is more popular than in any other state, students are now the least capable readers in the nation.

How Children Are Assigned to Reading Recovery

Would-be tutors are told how to assign children to Reading Recovery in Clay’s book An Observation Survey of Early Literacy Achievement. These RR trainees are warned emphatically that “standardized tests do not measure slow progress [in children’s learning to read] well.” Elsewhere in the book, however, Clay reverses herself, noting that “several standardized tests can be applied” to measure underachievement in reading and spelling. Her negative criticism of standardized tests must be given more weight, however. Throughout the book, Clay emphasizes that to determine as quickly as possible which students should be assigned to RR, first-grade teachers must conduct “systematic observation” of each of their pupils’ reading behavior rather than use standardized test data for this purpose. It is notable that Clay does not recommend that standardized reading tests be administered when pupils enter and leave RR.

Remedial teaching should be scheduled early for the underachieving student, preferably at the beginning of the second grade, Clay reasonably argues. But, she continues, if a teacher used a standardized reading test to determine a pupil’s underachievement, the “child with reading difficulties has had to wait until the third or fourth year of school before being offered special instruction.” This conclusion obviously is uninformed and therefore misleading, since there are many readily available, well-designed, effective diagnostic standardized tests of reading that are applicable at the first-grade level. One need only consult leading textbooks on diagnostic and remedial reading instruction, or standard indexes of reading assessment, such as Mental Measurements Yearbook or Test Critiques, to be convinced of Clay’s error in this matter.

It is true, as Clay complains, that standardized visual discrimination tests have not proved useful in guiding teachers as to what reading instruction underachieving beginning readers need. On the other hand, the empirical evidence does not support her view that the same is true of standardized tests (a) of children’s conscious awareness of speech sounds and (b) of phonics skills.

Instead of using standardized reading tests to determine which children are in need of Reading Recovery, Clay recommends that teachers conduct an elaborate “Observation Survey.” As opposed to standardized tests, the Observation Survey involves “only slight emphasis on scores and quantifying process,” Clay claims. This understates how much scoring and quantifying Clay actually recommends later, as is demonstrated below.

The Observation Survey requires that the teacher carry out six tasks. First, teachers must keep a “running record” for each child. This means “recording [in writing] everything that a child says or does as he tries to read [aloud] the book” given him for this purpose. For example, the teacher must calculate the child’s oral reading error rate, accuracy of word recognition, and self-correction ratio (the number of errors plus self-corrections, divided by the number of self-corrections), and use data conversion tables. Furthermore, the teacher must use the data so collected to make generalizations about the reading “strategies” the child uses, and recommendations for instruction, all of which are recorded on record sheets. Clay devotes an entire chapter (22 pages) of An Observation Survey of Early Literacy Achievement to the details of this time-consuming and complicated process.

The percentage of a child’s oral reading errors (substitutions, additions, omissions, mispronunciations, and no response), not counting errors he self-corrects, is useful for deciding how difficult a book is for a child to read independently, Clay correctly contends. It is true that a “hard” book for a child, one that frustrates him, is one on which he makes such errors on more than 10 percent of the words. Clay is mistaken in believing, however, that this informal reading inventory can be authentically administered “even if it is only for a couple of lines of print.”
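
The running-record arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not a reproduction of Clay’s published conversion tables: the class and method names are hypothetical, the self-correction ratio uses the form (errors + self-corrections) / self-corrections, and the “hard” test simply applies the 10 percent error threshold mentioned above.

```python
from dataclasses import dataclass


@dataclass
class RunningRecord:
    """Hypothetical container for one running record (names are illustrative)."""
    running_words: int     # words in the passage the child read aloud
    errors: int            # errors left uncorrected
    self_corrections: int  # errors the child corrected without help

    def accuracy_percent(self) -> float:
        # Share of running words read correctly, as a percentage.
        return 100.0 * (self.running_words - self.errors) / self.running_words

    def error_rate(self) -> float:
        # Reported as "1 error in every n words"; returns n.
        return self.running_words / self.errors if self.errors else float("inf")

    def self_correction_ratio(self) -> float:
        # Reported as "1 self-correction in every n"; returns n.
        if self.self_corrections == 0:
            return float("inf")
        return (self.errors + self.self_corrections) / self.self_corrections

    def is_hard(self) -> bool:
        # "Hard" (frustration-level) text: errors on more than 10 percent of words.
        return self.errors / self.running_words > 0.10


record = RunningRecord(running_words=120, errors=15, self_corrections=5)
print(record.accuracy_percent())       # 87.5
print(record.error_rate())             # 8.0
print(record.self_correction_ratio())  # 4.0
print(record.is_hard())                # True

# A "couple of lines of print" (assume 12 words): each error shifts accuracy by about 8 points.
for errs in (0, 1, 2):
    sample = RunningRecord(running_words=12, errors=errs, self_corrections=0)
    print(errs, round(sample.accuracy_percent(), 1), sample.is_hard())
```

The final loop bears on the objection to samples of only a couple of lines: with a 12-word sample, one miscue moves accuracy by more than eight percentage points (100.0, then 91.7, then 83.3), so two errors are enough to reclassify the text as hard.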

Unfortunately, Clay refuses to accept the empirically verified finding that instructing beginning readers in a direct and systematic way to apply phonics information is the most effective way to reduce oral reading errors. In fact, she misleads teachers by directing them to spend much time puzzling out and recording how the young child uses sentence context cues when reading aloud, and planning ways to encourage the child to continue to do so. As noted, the empirical evidence indicates that beginning readers must be taught not to depend on context cues if they are to learn to recognize words most accurately.

Second, Clay’s Observation Survey requires that each child be tested on the ability to identify upper- and lower-case letters (54 letters). A table of the distribution of scores on this test for 6- to 7-year-old New Zealanders is provided so that “an individual child can be compared with other children.” There is no evidence, however, that children’s ability to recognize upper-case letters relates as closely to their word recognition skill as does their recognition of lower-case ones. Thus, combining data on both kinds of letter recognition yields a flawed statistic. As well, the table of letter recognition scores Clay provides is of doubtful usefulness, since she concedes that New Zealand children “do not score in similar ways” to American children on the Observation Survey.

Third, the Observation Survey involves giving a 24-item test “on what children have learned about the way we print language.” Two children’s books written by Clay are required when giving this “Concepts about Print” test. It asks children, for example, in Item 1: “Show me the front of the book”; in Item 12: “What’s wrong with this page?” (words read aloud out of proper sequence); and in Item 24: “Show me a capital letter.” The test gives equal scoring weight to each of its items. It thus ignores the likelihood that some of its items are more predictive of a beginning reader’s print knowledge than are others. Nor does Clay offer any evidence as to whether scores on this test are more predictive of reading ability than are letter recognition scores or other test evidence. The table of distribution of pupils’ scores provided here also is suspect, since it represents scores of New Zealand, not American, children. In short, compared with standardized reading tests, Clay’s “Concepts about Print” test is inferior in several respects, both in its design and in its validation.
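
The objection to equal item weights can be made concrete with a small Python sketch. The item weights and the two children below are invented purely for illustration; Clay supplies no weights, and nothing here reflects the actual predictive value of any Concepts about Print item. The sketch shows only that two children with the same raw score can differ once items are weighted unequally.

```python
from typing import Dict, Set

ITEMS = range(1, 25)  # the 24 Concepts about Print items, numbered 1-24

# HYPOTHETICAL weights for illustration only: pretend items 13-24 had been
# found three times as predictive as items 1-12.
WEIGHTS: Dict[int, float] = {item: (1.0 if item <= 12 else 3.0) for item in ITEMS}


def raw_score(passed: Set[int]) -> int:
    # Equal weighting, as the test is scored: one point per item passed.
    return len(passed)


def weighted_score(passed: Set[int], weights: Dict[int, float]) -> float:
    # Each passed item contributes its (hypothetical) weight instead of 1.
    return sum(weights[item] for item in passed)


child_a = set(range(1, 19))  # hypothetical child passing items 1-18
child_b = set(range(7, 25))  # hypothetical child passing items 7-24

print(raw_score(child_a), raw_score(child_b))                              # 18 18
print(weighted_score(child_a, WEIGHTS), weighted_score(child_b, WEIGHTS))  # 30.0 42.0
```

Under equal weighting the two children are indistinguishable; under any unequal weighting they may not be, which is why evidence about each item’s predictive value matters.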

Fourth, the Observation Survey administers a “Ready to Read” Word Test. This measures children’s ability to recognize “the most frequently occurring words.” The scores on this test are said by Clay to predict accurately the difficulty level of books a child can read independently and to indicate how children should be grouped for reading instruction.

To a limited degree, scores on this test do appear to have utility in these respects. This is because most of the words in the test are spelled highly predictably (e.g., at, big, let, not, will, and, up), and consequently reading them successfully gives some indication of a child’s phonics skills. However, a far better test of how well beginning readers can read independently, and of what kind of word recognition instruction they still need, would be a reading test of a wide range of individual words carefully sequenced into a hierarchy according to how predictably they are spelled. This would be a more complete and accurate test of how much phonics information a child has acquired, how many phonics rules he can apply successfully, and how well he can infer the correct pronunciation of a word after gaining its approximate pronunciation through the application of phonics information. The empirical evidence indicates that these factors are more closely related to young children’s ability to read independently than is the ability to read some relatively unpredictably spelled words such as are, come, here, or Mr. As before, the table of distribution of scores on this test provided by Clay is of doubtful use, since the scores are from New Zealand, not American, children.

Fifth, the Observation Survey requires the teacher to take three samples of children’s story writing. The teacher then rates each sample on a scale of 1 to 6 as to whether it is a “successful composition.” The rating considers evidence of letters, words, sentences, punctuation, concepts or original ideas, and proper directional patterns (writing top to bottom, left to right, etc.). Because these ratings are highly subjective, it is doubtful that the test is reliable: different teachers would not rate the same child’s writing the same way.

In addition, each child is asked to write all the words he knows during three 10-minute test periods. Points are awarded here for correct spellings, i.e., for writing the letters of a word in the proper order, whether from left to right or from right to left. Here, however, Clay wrongly assumes that a word whose letters appear right to left reinforces a child’s recognition of it as much as does a word spelled in the proper order.
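
The scoring rule as described, which credits a word whose letters are in order in either direction, can be sketched in Python as follows. The function names are hypothetical, and the rule is modeled only as this review states it, not from Clay’s full scoring instructions. The stricter variant shows the alternative the review implies: credit only for the conventional left-to-right order.

```python
def credited_as_described(target: str, written: str) -> bool:
    """Point awarded if the letters are in order, read left to right OR right to left."""
    target, written = target.lower(), written.lower()
    return written == target or written == target[::-1]


def credited_left_to_right(target: str, written: str) -> bool:
    """Stricter rule: point awarded only for the conventional left-to-right spelling."""
    return written.lower() == target.lower()


for attempt in ("cat", "tac", "cta"):
    print(attempt,
          credited_as_described("cat", attempt),
          credited_left_to_right("cat", attempt))
# cat True True
# tac True False
# cta False False
```

The reversed spelling tac is the case in dispute: it earns a point under the rule as described, although, on the review’s argument, it does nothing to reinforce recognition of cat.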

The teacher also is required to make personal judgments as to the “strategies that work” for the child whose writing is tested, and of “analogies that are tried.” All this evidence from writing is a necessary part of the Observation Survey, Clay avers, because writing “is a good indicator of a child’s knowledge of letters and of the left-to-right sequencing behavior required to read English.” As well, writing reveals the child’s knowledge of “the details of letter formation and letter order,” and how “hand and eye support and supplement each other,” she goes on.

There is no empirical evidence, however, that the analysis of children’s writing behavior that Clay insists be part of the Observation Survey is the most time-effective, precise, or objective way to determine whether children need remedial tutoring for reading deficiencies. Writing letters and spelling words correctly do reinforce beginning readers’ ability to recognize them. Why, then, does she give test points for right-to-left spelling? Moreover, objective tests of children’s abilities to recognize letters, to decode words using phonics information, and to understand what authors intended to convey provide adequate information about the status of children’s reading development on which to decide if they need remedial teaching. As before, the table of distribution of writing scores Clay provides here may not be dependable for reasons previously given.

The sixth and final part of Clay’s Observation Survey “asks the child to record [in writing] a dictated sentence.” The objective of this “Hearing and Recording Sounds in Words” test is to determine if the child can hear each separate speech sound in spoken words, and write an acceptable letter or letter cluster for it as evidence that he has heard this sound. Any letter or letter cluster so written is awarded a point if the speech sound in question “is sometimes recorded in that way,” Clay sets forth.

This stipulation obviously poses an enormous challenge for the teacher trying to score this test. I have calculated that on average there are 13.8 different ways to spell each vowel sound, and 5.2 different ways to spell each consonant sound. Some vowel sounds can be spelled 22 different ways. The difficulty in scoring this test is demonstrated by the fact that Clay herself does not always follow her scoring rule that, for a letter used by a child to receive a point, it must “sometimes” be used that way to represent a speech sound. For example, Clay indicates that if on this test the child spells vary as vare, he would receive 4 points. However, the e at the end of words with this spelling pattern never represents the final sound in vary (long e). There also is no evidence given by Clay that the five alternative forms of this test are equal in difficulty. So, alternative forms used for pre-testing and post-testing could give faulty evidence about pupils’ improvement in recognizing speech sounds and writing letters that “sometimes” represent them. The reliability of this test (the ability of all teachers to score it equally accurately) thus is badly compromised. Teachers who know much about the range of ways speech sounds are spelled would score the test significantly differently from teachers relatively unknowledgeable in this respect. The reliability of the test also is jeopardized by the fact that Clay urges teachers to make various kinds of personalized prompting “comments” to children taking it.
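
The scorer-dependence described above can be illustrated with a Python sketch of per-sound scoring. Everything in it is an assumption for illustration: the scorer supplies the alignment of the child’s letters to the speech sounds of vary, and each hypothetical teacher’s table lists the spellings that teacher believes a sound is “sometimes” given. The same attempt, vare, earns different scores depending on how broad the scorer’s table is.

```python
from typing import Dict, List, Set, Tuple

# HYPOTHETICAL alignment of the attempt "vare" to the four speech sounds of "vary".
# The segmentation is supplied by the scorer; it is not computed here.
ATTEMPT: List[Tuple[str, str]] = [
    ("v", "v"),
    ("vowel in 'var-'", "a"),
    ("r", "r"),
    ("final long e", "e"),
]


def score(attempt: List[Tuple[str, str]], acceptable: Dict[str, Set[str]]) -> int:
    # One point per speech sound whose letters appear in the scorer's list of
    # spellings that "sometimes" represent that sound.
    return sum(1 for sound, letters in attempt if letters in acceptable.get(sound, set()))


# Two HYPOTHETICAL teachers who differ only in the spellings they accept
# for the final long e sound.
narrow_table = {"v": {"v"}, "vowel in 'var-'": {"a"}, "r": {"r"}, "final long e": {"y"}}
broad_table = {**narrow_table, "final long e": {"y", "e", "ee", "ea", "ey", "ie"}}

print(score(ATTEMPT, narrow_table))  # 3
print(score(ATTEMPT, broad_table))   # 4
```

The broad table reproduces the 4 points Clay awards for this attempt and the narrow one does not; nothing in the scoring rule itself tells two teachers which table to use.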

The test’s reliability is further weakened by the requirement that teachers, after administering it, make personal judgments about children’s “unusual” use of space when writing, “unusual” placement of letters within words, and “partially” correct attempts to spell words. No criteria are offered as to what “unusual,” etc., precisely mean. Nor is any indication given of how much weight these subjective judgments should carry in the total score of this test. Clay’s table of the distribution of children’s scores on this test is therefore especially unacceptable as a set of age norms for how well children should be able to write letter representations of speech sounds. Better ways than this test are available for assessing children’s ability to hear speech sounds.

When the six parts of the Observation Survey are completed, the teacher is expected to transfer the results onto the summary sheets provided. That is, “the teacher brings together what she has observed,” and then adds comments on the “useful” versus “problem strategies” each child uses in reading single words and connected text, and in applying phonics information.

Now, Clay asserts, the teacher is fully prepared to make “early identification of children at risk in literacy learning,” i.e., to refer children to Reading Recovery. However, nowhere in the concluding pages of Clay’s An Observation Survey of Early Literacy Achievement does one find precise indications of how poorly the “low” achieving, “slow progress” child must perform to meet the various qualitative and quantitative criteria for entry into RR. That is, nowhere is the teacher shown how to lay out the mass of subjective judgments and numerical data that have been made or gathered, how to weigh each aspect of these data according to its predetermined importance, and then how to reach an exact summative decision as to which children need RR and which do not.

One also looks in vain in the opening chapters of Clay’s Reading Recovery: A Guidebook for Teachers in Training for such instruction or counsel in determining which children should be given priority as RR students. Here the teacher receives little more than vague advice to be “looking for movement in appropriate directions” in students as a sign they do not need RR, or to try “to understand the strategies the child is using” when reading. By implication, it appears that a student may be assigned to RR in a more or less makeshift, irregular, or arbitrary fashion without being methodically evaluated as meeting a carefully defined and explicit set of admittance standards.

Reading Recovery Teaching Procedures

Official Reading Recovery teaching procedures are described in Clay’s Reading Recovery: A Guidebook for Teachers in Training. It is clear that the principles of RR instruction are based on Clay’s version of the reading process. It is not surprising, therefore, that she warns prospective RR tutors against becoming “involved in teaching for detail” of print (i.e., letters, letter-speech sound correspondences, individual words, etc.). Only “from time to time” does the disabled reader in RR supposedly need “to pay attention to the detail of print.” Beginning readers’ knowledge of such detail thus “is of very limited value,” Clay avers. It therefore must be kept “always in a subsidiary status to message getting” in RR sessions, since the child’s rate of progress in learning to read is “seriously threatened” by instruction of this detail. Consequently, the “main focus” of RR “is reading books and writing stories,” Clay emphasizes.

This view of reading instruction is not corroborated by experimental research in reading development, however. To the contrary, the empirical evidence stresses the need for anyone learning to read to pay close attention to all the details of print. Beginning readers’ main problem thus generally is not an inability to understand the “message” in ordinary oral language situations, or when listening to stories read aloud to them. Instead, they need to learn how to recognize individual written words, ones that they can understand when spoken aloud to them.

Clay also makes clear her antagonism to “programs and teaching sequences of any standard kind.” The bona fide RR tutor thus must meet the extremely demanding requirement of being familiar with all the teaching techniques, sequences, and activities that have been promoted so far, and then of being able to “pick and choose” among them, devising effective instructional arrangements that differ for each child tutored. There is no indication, however, that this requirement is either reasonable or attainable, nor, for that matter, that it is more than shallow rhetoric on Clay’s part. That is, to graduate as an RR tutor one does not have to pass a test on one’s knowledge of the wide range of propositions that have been made about reading instruction, and of how to use them selectively with individual children.

Moreover, despite her professed disfavor toward standard teaching procedures, Clay proceeds forthrightly to name four orthodox instructional procedures that RR tutors must use. They are by no means discrete, as it turns out, since their details overlap considerably, and unpredictably. It is mandatory that RR tutors teach children (1) the “directional rules of print”; (2) story writing skills; and (3) reading “strategies,” such as self-monitoring, self-correction, and use of context cues. The RR tutor also (4) must direct the child to read and re-read books. The established sequence of activities in a typical RR session follows this order, Clay explains: the child (1) reads (always aloud, it appears) a “familiar” book; (2) identifies individual letters, or uses letters to make words; (3) writes or dictates a story; (4) rearranges the words of a cut-up story; and (5) reads (always aloud, it appears) a “new” book.

Except for instruction in the use of context cues, this teaching does not violate what experimental research says about reading development. Generally speaking, these aspects of instruction have some positive influence, of varying degrees, on children’s acquisition of word recognition skills, which is the fundamental goal of beginning reading teaching. As noted previously, direct and systematic teaching of a pre-arranged hierarchy of phonics information, sequenced into the order of difficulty that children have in learning it, is the best way devised to develop beginning readers’ word recognition skills. The merit of RR teaching therefore must be judged by how closely it conforms to this empirical imperative.

In this respect, Clay assumes that children entering RR are able to read “several” books “at about 90 percent accuracy or better.” This appears to be an overly optimistic assumption. The dependability of Clay’s judgment here becomes even more suspect, since her views about the relative readability of written materials are faulty. Clay assumes that the least difficult material for a child in RR to read is “a simple text [or “story”] he has dictated.” Growing steadily more difficult for this child, she goes on, are “a very simple story” that has been read aloud to him; “a simple book about the child’s own experiences”; and “an easy book.” The relative readability levels of these materials can be expressed better, however, by reversing the order in which Clay places them. That is, an “easy book,” such as one written by Dr. Seuss, that utilizes a limited number of predictably spelled words, doubtless would be the least demanding task for the child in RR, who typically has limited word recognition skills. A story this child dictates to the RR tutor, which utilizes the full range of the student’s spoken vocabulary and syntactic structures, would be the most difficult for him to read.

Also, Clay unfortunately devotes as much space to commenting on ways to establish children’s mastery of the “directional rules of print” (a relatively easily accomplished goal) as she does to advising teachers how to develop pupils’ phonics knowledge and ways to apply it to decode written words. It is clear, as well, that Clay wrongly assumes that building children’s conscious awareness of speech sounds, and their ability to name letters, largely suffices for these pupils’ attainment of phonics skills.

Where Clay directly discusses “linking sound sequences with letter sequences” (i.e., teaching phonics information), she advances the idea that teaching children correspondences between isolated speech sounds and letters, and how to blend isolated sounds to pronounce words, should be undertaken only as a last resort. Instead, Clay recommends, the time in the RR session that might go to phonics information and its application (the amount is never disclosed precisely) should be given to having children listen to and look at whole words. In this regard, children “play with rhymes,” notice differences at the beginnings (onsets) of whole words (e.g., went-sent), notice differences in less predictably spelled whole words (e.g., hear-bear), cut up and reassemble words from stories, clap the number of syllables in a word, and guess at the identity of words after sounding out their initial letters.

Clay makes some offhand suggestions that the RR tutor “may note,” when children are writing stories, that some of them may be deficient in their ability to apply phonics information to the spelling of words. No suggestions are offered to the RR tutor at this point, however, as to how to correct these pupil inadequacies. Also, when reading books, children incidentally should practice taking some (indeterminate number and type of) words apart, Clay emphasizes. Experimental research indicates, however, that Clay’s advice for developing beginning readers’ word recognition skills is not wholly acceptable, for several reasons. For example, she does not call for a clearly designated, adequate amount of time for phonics teaching. She wrongly assumes that the development of young underachieving children’s knowledge of phonics and how to apply it is not the best way to teach them to recognize words accurately, and thus to comprehend what they attempt to read. She does not arrange phonics skills into the hierarchy of difficulty that beginning readers have in learning them. She provides no systematic way to measure how many phonics rules children know and can apply effectively.

When to Discontinue RR Tutoring for a Child

“This decision [to discontinue RR tutoring for a child] must be weighed very carefully,” Clay reminds RR tutors. The exasperating irony of this statement immediately becomes evident, however. Clay announces in the same breath that “there is no fixed set of strategies nor any required levels of text nor any test score that must be attained to warrant discontinuing” a child from RR. As previously noted, the standards for admission into RR appear to be haphazard, disordered, subjective, and even capricious. The prerequisites for transferring a child out of RR, as stipulated by Clay, are even less regulated, objective, or methodical. The only even moderately noteworthy statements by Clay in her very brief (2-page) discussion of “when to discontinue tutoring” are that “usually the child ready for discontinuing can read a text which the average child in his second year at school can read. He can write a couple of sentences for his story.” Clay gives no indication of what “read” precisely means in this statement, nor of what kind of sentences would be acceptable. Otherwise, all that Clay offers in this regard are vague directions to RR tutors to look for “marked improvement” in a child, or to decide if he has gained “a strategy of getting from sounds to letters.”

Clay’s “Research” Is Flawed

In the Reading Recovery Reports, Clay offers some information about seven pieces of “research” on Reading Recovery’s effectiveness that she conducted. The findings of these studies are not very useful. For one thing, Clay admits that her research on RR did not “ask how well this program worked compared with competing programs” that tutor beginning readers. Furthermore, as other analysts of RR research data (see References) have reported, the studies that Clay did on RR have design flaws and statistical irregularities that render their findings less than acceptable. These faults doubtless are the result of Clay’s opposition to research “which looks for explanations of what causes what, or what conditions bring about differences.”

References

Center, Y., et al. (1992). Evaluating the effectiveness of Reading Recovery: A critique. Educational Psychology, 12, 263-273.

Center, Y., et al. (1995). An experimental evaluation of Reading Recovery. Reading Research Quarterly, 30, 240-263.

Glynn, T., et al. (1989). Reading Recovery in context. Wellington, New Zealand: NZ Department of Education.

Groff, P. (1994). Reading Recovery: Educationally sound and cost-effective? Effective School Practices, 13(1), 65-69.

Hiebert, E. H. (1994). Reading Recovery in the United States: What difference does it make to an age cohort? Educational Researcher, 23(9), 15-25.

Iversen, S., & Tunmer, W. E. (1993). Phonological processing skills and the Reading Recovery program. Journal of Educational Psychology, 85, 112-125.

Rasinski, T. V. (1995). Commentary on the effects of Reading Recovery. Reading Research Quarterly, 30, 264-270.

Shanahan, T., & Barr, R. (1996). Reading Recovery: An independent evaluation of the effects of an early instructional intervention for at-risk learners. Reading Research Quarterly, in press.