Writing Tests Gone Awry

June 1, 2014

This story is too new to make the Top 20 Countdown that I started earlier today, but in time, it may rise to that level. I have received messages from administrators in two school districts sharing letters that they have sent to Superintendent Janet Barresi and the SDE about the questionable scores received on the 2014 fifth and eighth grade writing tests. Rob Miller has already covered this issue thoroughly on his blog. I won’t rehash all of his talking points, but I’ll touch on some of them. Suffice it to say that at least three large districts are scratching their heads over the scoring process.

This is from District #1:

We have serious concerns with the state’s application of the writing rubric. It appears that readers looked at a paper and assigned it a number that they input in all the sub scores. When papers were re-scored by local teachers, administrators and literacy experts, the scores among the sub scores varied greatly. Students could produce a paper that had good mechanics, sentences, paragraphs, spelling, punctuation, etc. and lack important aspects about citation and coherence. Others presented good arguments and citations but did so with run on sentences and poor spelling. Needless to say, in our scoring, it was rare for a paper to receive the same sub score across the entire rubric. That being said, we see approximately 80% of 8th grade scores and 60% of 5th grade scores coming back with no variation across the five writing traits in the rubric. The problems we see with these scores make us question the use of the rubric at all.

CTB officials informed district test coordinators at their meeting on May 28, 2014 that the writing tests were scored to determine a percentage of “plagiarism.” This was the first mention of a reduction in scores due to a “certain percentage of plagiarism.” The actual percentage used was not shared with the attendees but was promised to be provided at a later date. We have grave concerns about this aspect of scoring because the students were asked to cite text evidence in their essays. The fifth grade test instructions stated, “Be sure to state your opinion and support it using information presented in both passages.” The eighth grade test instructions stated, “Be sure to state a claim and address an opposing claim using evidence presented in both passages.”

District #2 covered some of the same ground, and then added this:

With these fundamental concerns in mind, we will be requesting that a considerable percentage of our tests be re‐scored. We do not, however, feel that the district should be liable for these costs. The fee of $125 is exorbitant. Scorers paid by CTB receive a low hourly wage and have to keep a relatively high production rate during the time they are under temporary assignment with the testing company. While we understand that some processing costs exist, none of that would explain the $125 fee. By our most conservative estimates, this amounts to a 90% mark-up of CTB’s out-of-pocket expenses. In other words, the fee is in place as a deterrent to keep districts from asking for tests to be re-scored.

Our immediate plan is to continue reviewing our student responses and compiling a list of those that we wish to have re-scored. Our request to you is that we not be charged for the effort. The dedicated teachers of this district are reviewing these responses on their own time. At the very least, CTB could do the same.

The critical points here seem to be:

  • The rubric does not seem to have been used correctly.
  • Most students received the same sub-score for all five writing skills.
  • Students who properly cited a prepared text received deductions.
  • The cost to re-score student responses is ridiculous.

On the first point, Rob took a good look at the rubric.

There are five areas scored on the writing rubric. Both the fifth and eighth grade rubrics for the “transitional CCSS writing test” include the following scored standards. The scoring “weights” for each standard are also listed. I will come back to this in a minute because this is where things start to get fishy.

Ideas and Development—30%
Organization, Unity, and Coherence—25%
Word Choice—15%
Sentences and Paragraphs—15%
Grammar and Usage and Mechanics—15%

Both writing rubrics are on the OSDE website and can be viewed (5th) HERE and (8th) HERE.

Let’s get back to the scoring. Each of the five standards is graded on a scale of 1.0 to 4.0 in 0.5 increments. Again, using the 755 scores that I have at my disposal, let me show you how the scores for the 8th grade test break down at my school. The lowest score possible is a 15 and the highest score is a 60.

At first glance, it appears that the scores are derived by combining the point totals from each standard and multiplying by three. I have bolded those scores where this rule seems to apply. It is also evident that this is not always the case.

Total score:
5 = 15
5.5 = 24
6.5 = 25
7.5 = 29
8.5, 9.0, or 9.5 = 30
10.0 = 32
10.5 = 35
11.0 = 35 or 36 (36 is proficient score)
11.5, 12.0 = 36
12.5 = 38
13.0 = 37 (only one of these)
13.5 = 41 or 42
14.0 = 41 or 42
15 = 45
16 = 47
16.5 = 48
17.5 = 52
18.0 = 54
19.5 = 56
20 = 60
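
The times-three rule described above is easy to check against the chart. Here is a minimal Python sketch that does so; the `CHART` dictionary is copied from the list above, and the names `CHART`, `follows_times_three`, and `matches` are placeholders chosen for illustration, not anything from CTB or the SDE.

```python
# Raw total (sum of the five trait scores) -> scaled score(s) reported for it,
# copied from the chart above. Names here are illustrative only.
CHART = {
    5.0: [15],
    5.5: [24],
    6.5: [25],
    7.5: [29],
    8.5: [30],
    9.0: [30],
    9.5: [30],
    10.0: [32],
    10.5: [35],
    11.0: [35, 36],
    11.5: [36],
    12.0: [36],
    12.5: [38],
    13.0: [37],
    13.5: [41, 42],
    14.0: [41, 42],
    15.0: [45],
    16.0: [47],
    16.5: [48],
    17.5: [52],
    18.0: [54],
    19.5: [56],
    20.0: [60],
}


def follows_times_three(raw_total, scaled):
    """True when the scaled score is exactly three times the raw total."""
    return raw_total * 3 == scaled


# Raw totals where at least one reported scaled score fits the rule.
matches = sorted(
    raw for raw, reported in CHART.items()
    if any(follows_times_three(raw, s) for s in reported)
)

print(matches)  # [5.0, 12.0, 14.0, 15.0, 18.0, 20.0]
```

By this check, only six of the 23 raw totals in the chart fit the rule, and one of those (14.0) fits it only some of the time.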

It is obvious from this chart that the weights discussed above WERE NOT USED, or were used haphazardly. Any score of 12.0 earned a 36 regardless of how the scoring was distributed. Yet in one case a score of 11.0 earned a passing score of 36 with individual standard scores of 3/2/2/2/2 while another 11 (2/3/2/2/2) scored a limited knowledge score of 35.

However, a 10 always earns a 32, a 15 always earns a 45, and so on for most of the scores. The only exceptions were for scores of 11.0 (35 or 36), 13.5 (41 or 42), 14.0 (also 41 or 42).
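
To see what the published weights would do to the two 11.0 papers, here is a rough Python sketch. It rests on two assumptions that are not confirmed anywhere in the source: that the 3/2/2/2/2 and 2/3/2/2/2 scores are listed in the same order as the rubric, and that a weighted score would be the weighted average of the five traits multiplied by 15 (chosen only because it maps all 1.0s to 15 and all 4.0s to 60, the stated range). The function name `weighted_scaled` is hypothetical.

```python
# Rubric weights in percent, in the order listed in the rubric.
WEIGHTS = {
    "Ideas and Development": 30,
    "Organization, Unity, and Coherence": 25,
    "Word Choice": 15,
    "Sentences and Paragraphs": 15,
    "Grammar and Usage and Mechanics": 15,
}


def weighted_scaled(scores):
    """Weighted average of five trait scores, times 15.

    ASSUMPTION: this scaling is a guess that reproduces the 15-60 range;
    it is not a documented CTB or SDE formula.
    """
    if len(scores) != len(WEIGHTS):
        raise ValueError("expected one score per rubric trait")
    weighted_sum = sum(w * s for w, s in zip(WEIGHTS.values(), scores))
    return weighted_sum * 15 / 100


# ASSUMPTION: trait order in these tuples matches the rubric order.
paper_a = (3, 2, 2, 2, 2)  # reported scaled score: 36
paper_b = (2, 3, 2, 2, 2)  # reported scaled score: 35

print(weighted_scaled(paper_a))  # 34.5
print(weighted_scaled(paper_b))  # 33.75
```

Under those assumptions, neither paper lands on the 36 or 35 that was actually reported, so a straightforward application of the weights does not reproduce these scores.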

Also note the odd fact that a score of 7.5 earns a 29 while an 8.5, 9.0, or 9.5 earns only one more point (30). Suffice it to say, this doesn’t seem to make much sense.

On the second point, it seems pretty absurd that most students would receive the exact same score for each writing trait. It’s possible that it could happen for some, but not for 81% of the responses, as it happened at Jenks Middle School. Just from the things I’ve seen on Facebook and Twitter this weekend, it is not an isolated problem.

Let me say this another way: they’re not just picking on Jenks this time!

My only explanation for this is that the scorers are rushed. They read a response, develop an overall impression, and then assign points – in many cases giving the essay a 2.0 all the way across (which seems to be the most common score).
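
Any district with its own score file can measure how common these "flat" score lines are. A minimal Python sketch follows; the `sample` data is made up purely for illustration, and `is_flat` and `flat_share` are hypothetical helper names.

```python
def is_flat(trait_scores):
    """True when every trait on a response received the same sub-score."""
    return len(set(trait_scores)) == 1


def flat_share(responses):
    """Fraction of responses whose sub-scores show no variation."""
    if not responses:
        return 0.0
    return sum(is_flat(r) for r in responses) / len(responses)


# HYPOTHETICAL data: five responses, five trait scores each.
sample = [
    (2.0, 2.0, 2.0, 2.0, 2.0),
    (2.0, 2.0, 2.0, 2.0, 2.0),
    (3.0, 2.0, 2.0, 2.0, 2.0),
    (2.5, 2.5, 2.5, 2.5, 2.5),
    (2.0, 3.0, 2.0, 2.5, 2.0),
]

print(f"{flat_share(sample):.0%} of sample responses are flat")
```

Running the same function over a real export would give a figure that can be compared directly with the 81% reported at Jenks Middle School.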

For one trait in particular, Sentences and Paragraphs, here are the bullet points for a response receiving a score of two:

  • Limited variety of sentence structure, type, and length
  • Several fragments or run-ons
  • Little or no attempt at paragraphing

Teachers looking over the images of their student responses are adamant that these statements are not accurate descriptors of what they are seeing. An essay lacking in ideas and development might very well have appropriate use of Sentences and Paragraphs.

As for the first district’s concerns about plagiarism, apparently every single fifth and eighth grade language arts teacher in the state misunderstood the instructions. Believe it or not, the SDE and/or CTB were unclear about something. I’d be more willing to believe the scorers (who are temporary laborers) had no clue what to do with cited information. Maybe their training prior to scoring is inadequate.

Rob is right. The teachers and administrators up in arms throughout the state are right too. What is wrong, however, is the expectation that school districts generate a purchase order and gamble on having the tests re-scored. At that price, though, why would anybody risk it?

In case you haven’t noticed, the ramifications of bad A-F Report Card grades can be huge. They can force a good school to jump through countless hoops for years – hoops that really don’t foster school improvement. With the crazy change to define Full Academic Year as beginning Oct. 1, the elimination of modified assessments for special education students, and the number of high-achieving students who received exemptions from state tests due to 2013 legislation, many grades and subjects are seeing lower test scores in 2014.

Students deserve accurate scores. So do schools. And they shouldn’t have to pay out their eyeballs for it.

  1. Rob Miller
    June 1, 2014 at 7:34 pm

    Excellent post, my friend.

    I did discover an error on my post that you shared in yours. It involved the scores of 11.0. Here is the revision: “Yet in one case a score of 11.0 earned a passing score of 36 with individual standard scores of 3/2/2/2/2 while another 11 (2/3/2/2/2) scored a limited knowledge score of 35.” I went back to the data to check after I realized the numbers did not add up to 11. I am normally pretty good at adding numbers:-)

    I agree with you that it seems like the scorers were simply determining a holistic score and applying it across the board. That totally defeats the purpose of a rubric. The rubric allows the student and teacher to evaluate specific strengths and weaknesses. By ignoring the rubric, these scores are no longer valid.

    There are only two acceptable outcomes. The SDE should (1) demand a full re-scoring of all the writing tests or (2) invalidate all of them and use the scores as a field test only. These scores CANNOT be used in A-F calculations as they are.


  2. Charles Arnall
    June 2, 2014 at 9:19 am

    I have been in education for 37 years and I still don’t understand how these scores are calculated. I love reading information like this. For reasons like this, I am running for political office this year. Senate district 18, which includes all of Wagoner county and parts of Cherokee, Muskogee, Mayes and a small portion of eastern Tulsa county.

