Teacher ratings can’t tell good teachers from bad ones – back to the drawing board?

March 4, 2012

Corporate and business people who have lived through serious quality improvement programs, especially those based on hard statistical analysis of procedures and products in a manufacturing plant, know the great truth drilled home by statistical quality gurus such as W. Edwards Deming: the fault, dear Brutus, is not in the teacher, but in the processes generally beyond the teacher’s control.

Here’s the shortest video I could find on Deming’s 14 Points for Management — see especially the point about eliminating annual “performance reviews,” because, as Dr. Deming frequently demonstrated, the problems that prevent outstanding success are problems of the system, and are beyond the control of the frontline employees (teachers, in this case).  I offer it here only for the record, since it’s a rather dull presentation.  I find, however, especially among education administrators, that these well-established methods for creating champion performance in an organization are foreign to most Americans.  Santayana’s Ghost is constantly amazed at what we refuse to learn.

Wise words from the saviors of business did not give even a moment’s pause to those who think we could improve education if only we could get rid of those conniving bad teachers who block our children’s learning.  Since the early Bush administration and the passage of the nefarious, so-called No Child Left Behind Act, politicians have pushed for new measures to catch teachers “failing,” and so to thin the ranks of teachers.  Bill Gates, the great philanthropist, put millions of dollars into projects in Washington, D.C., Dallas, and other districts, to come up with a way to measure statistically which teachers are the good ones, the ones who “add value” to a kid’s education year over year.

It was a massive experiment, running in fits and starts for more than a decade. We have the details from two of America’s most vaunted and haunted school districts, Washington, D.C., and New York City, plus Los Angeles and other sites, in projects funded by Bill Gates and others, and we can pass judgment on the value of the idea of identifying the bad-apple teachers and getting rid of them to improve education.

As an experiment, it failed.  After teachers were measured eight ways from Sunday for more than a decade, W. Edwards Deming was proved correct:  management cannot tell the bad actors from the good ones.

Most of the time the bad teachers this year were good teachers last year, and vice versa, according to the measures used.

Firing this year’s “bad” teachers only means that some of next year’s good teachers are gone from the scene.

Data have been published in a few places, generally over complaints of teachers who don’t want to get labeled as “failures” when they know better.  Curiously, some of the promoters of the scheme also came out against publication.

A statistician could tell why.  When graphed, the data points do not reveal good teachers who consistently add value to their students year after year, nor do they put the limelight on bad teachers who fail to reach their goals year after year.  Instead, they reveal that what we think is a good teacher this year, on the basis of test scores, may well have been a bad teacher on the same measures last year.  Worse, many of the “bad teachers” from previous years had scores that rocketed up.  The data don’t show any consistency beyond chance.
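To see what “no consistency beyond chance” looks like, here is a minimal simulation sketch in Python, using entirely made-up numbers rather than any district’s actual data: if each year’s “value-added” score is mostly noise around a small, stable teacher effect, then the teachers flagged as “bad” one year scatter across the whole distribution the next.

```python
import numpy as np

rng = np.random.default_rng(0)

n_teachers = 10_000
# Made-up model: a small stable "true" teacher effect plus large year-to-year noise.
true_effect = rng.normal(0, 1, n_teachers)
noise_scale = 4.0  # noise dominates the stable effect
year1 = true_effect + rng.normal(0, noise_scale, n_teachers)
year2 = true_effect + rng.normal(0, noise_scale, n_teachers)

# Flag the bottom 20% each year as "bad teachers," as a firing policy might.
bad_year1 = year1 < np.quantile(year1, 0.20)
bad_year2 = year2 < np.quantile(year2, 0.20)

# How many of year 1's "bad" teachers get flagged again in year 2?
repeat_rate = (bad_year1 & bad_year2).sum() / bad_year1.sum()
print(f"year-to-year correlation: {np.corrcoef(year1, year2)[0, 1]:.3f}")
print(f"share of year-1 'bad' teachers flagged again in year 2: {repeat_rate:.0%}")
# With noise this heavy, the repeat rate sits near the 20% expected by chance alone.
```

With noise that large, last year’s label is nearly worthless as a prediction of this year’s — which is just what the published scatter plots show.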

So a post over at the blog of G. F. Brandenburg really caught my eye.  His calculations, graphed, show that these performance evaluation systems do not themselves perform as expected.  Here it is, “Now I understand why Bill Gates didn’t want the value-added data made public”:

It all makes sense now.

At first I was a bit surprised that Bill Gates and Michelle Rhee were opposed to publicizing the value-added data from New York City, Los Angeles, and other cities.

Could they be experiencing twinges of a bad conscience?

No way.

That’s not it. Nor do these educational Deformers think that value-added mysticism is nonsense. They think it’s wonderful and that teachers’ ability to retain their jobs and earn bonuses or warnings should largely depend on it.

The problem, for them, is that they don’t want the public to see for themselves that it’s a complete and utter crock. Nor to see the little man behind the curtain.

I present evidence of the fallacy of depending on “value-added” measurements in yet another graph — this time using what NYCPS says is the actual value-added scores of all of the many thousands of elementary school teachers for whom they have such value-added scores in the school years that ended in 2006 and in 2007.

I was afraid that by using the percentile ranks as I did in my previous post, I might have exaggerated or distorted how bad “value added” really was.

No worries, mate – it’s even more embarrassing for the educational deformers this way.

In any introductory statistics course, you learn that a graph like the one below is a textbook case of “no correlation”. I had Excel draw a line of best fit anyway, and calculate an r-squared correlation coefficient. Its value? 0.057 — once again, just about as close to zero correlation as you are ever going to find in the real world.

In plain English, what that means is that there is essentially no such thing as a teacher who is consistently wonderful (or awful) on this extremely complicated measurement scheme. How teacher X does one year in “value-added” in no way allows anybody to predict how teacher X will do the next year. They could do much worse, they could do much better, they could do about the same.

Even I find this to be an amazing revelation. What about you?

And to think that I’m not making any of this up. (unlike Michelle Rhee, who loves to invent statistics and “facts”.)
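Brandenburg did his calculation in Excel; for anyone who wants to repeat the check on the published numbers, a rough equivalent in Python might look like the sketch below. The file name and column names are my assumptions, not the actual layout of the released data; the idea is simply to pair each teacher’s score in one year with the same teacher’s score the next year and compute the correlation.

```python
import pandas as pd
from scipy import stats

# Assumed layout: one row per teacher per year, with that year's value-added score.
df = pd.read_csv("nyc_value_added.csv")  # hypothetical file; columns: teacher_id, year, score

# Pair each teacher's 2006 score with the same teacher's 2007 score.
wide = df.pivot(index="teacher_id", columns="year", values="score")
paired = wide.dropna(subset=[2006, 2007])

r, _p = stats.pearsonr(paired[2006], paired[2007])
print(f"teachers paired: {len(paired)}, r = {r:.3f}, r-squared = {r * r:.3f}")
# An r-squared near zero means one year's score says almost nothing about the next --
# the flat, shapeless scatter Brandenburg graphed.
```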

You should also see his earlier posts, “Gary Rubenstein is right, no correlation on value-added scores in New York city,” and “Gary Rubenstein demonstrates that the NYC ‘value-added’ measurements are insane.”

In summary, many of our largest school systems have spent millions of dollars on tools to help them find the “bad teachers” to fire, and the tools not only do not work, they may lead to the firing of good teachers, cutting the legs out from under the campaign for better education.

It’s a scandal, really, or an unfolding series of scandals.  Just try to find someone reporting it that way.  Is anyone?

More resources:


GFBrandenburg shows “value-added” teacher measures cannot work

March 4, 2012

You wanted evidence that Michelle Rhee’s plans in Washington, D.C., were not coming to fruition, that the entire scheme was just one more exercise in “the daily flogging of teachers will continue until morale improves”?

G. F. Brandenburg ran the numbers. It isn’t pretty.

Click the top line, which should be highlighted in your browser, to see the original post; or click here.

See more, next post.


