

The EDifier

January 31, 2013

Beyond the pink slip: Teacher evaluation isn’t just for firings

It’s not earth-shattering to say that the conventional teacher evaluation system is broken.  This isn’t just an argument made by education reformers and parents; many teachers agree as well.  They cite the haphazard, subjective nature of evaluations, which research suggests do little to improve instruction or lead to the removal of subpar teachers.

In response to calls for better systems to evaluate teachers, the Bill & Melinda Gates Foundation has funded the Measures of Effective Teaching (MET) project.  This group has undertaken a three-year study in seven school districts to analyze the effectiveness of certain measures of excellent teaching, specifically student standardized test scores, student surveys, and observations.  However, beyond the much-debated question of whether these systems can really measure effective teaching, I’d like to ask a different, perhaps more important question: what should be the primary purpose of these systems?

National conversations about teacher evaluation have mainly focused on getting bad teachers out of the classroom, and there is no doubt that some teachers teaching today shouldn’t be.  However, the notion that America’s schools are brimming with teachers who show movies and check email all class period is, simply put, a myth.  The MET study confirms as much.  The study trained observers and had them watch 7,491 videos of instruction by 1,333 teachers from six socio-economically and geographically diverse districts.  The observers were tested on their knowledge of the observation rubrics and retested during observations to ensure the scores they gave remained calibrated.  After all those observations, the study found that “overall observed practice is overwhelmingly in the mid-range of performance as defined by the instrument.”  In other words, there were few outstanding teachers (as defined by the observation instrument) but also few really weak ones.

This finding should give us pause to really think about our goals in rolling out revamped teacher evaluation systems.  The writers of the MET study suggest that one can accurately use a combination of observations, student surveys, and standardized test data to identify exceptionally ineffective teachers, but then what?  Even if we fire the teachers who are exceptionally ineffective, that may only be a very small percentage of teachers in the classroom.  Will that alone drastically improve student achievement?

I think that viewing teacher evaluations only as a method to weed out the weak is exceptionally shortsighted and may represent a huge investment of money that gets America little bang for its buck.  Instead, we need to think about how teacher evaluations can be used to improve teachers’ practice.  Bill Gates, in a recent op-ed for CNN, acknowledges as much, arguing that “the vast majority of teachers get zero feedback on how to improve” because they work “in isolation and have been asked to improve with little or no feedback.”  As a former public school teacher, I can say my own experiences absolutely confirm this.

However, as districts rush to revamp their evaluation programs to align with the demands of Race to the Top and state policies, it’s questionable whether feedback to teachers is really a priority.  The Center for American Progress recently released a study exploring teacher perceptions of an urban district’s new teacher evaluation system, which the district rolled out in part to compete for Race to the Top funds.  At the beginning of the year, teachers set two student learning objectives with their administrator.  Throughout the school year, teachers were observed by an administrator who rated them on a scale of 1 (needs improvement) to 5 (exemplary).  Low and high scores had to be confirmed by an outside evaluator, a move to ensure fairness and objectivity on the part of the observer.  At the end of the year, evidence of student achievement aligned with the teacher’s initial goals was examined to see whether students had really grown academically.

This evaluation system did result in an increase in firings, but it didn’t give the teachers who weren’t fired much feedback on how to improve their practice.  When researchers interviewed a large sample of teachers in the district, most said the new system had no impact on their pedagogy.  Only half said any of the feedback was helpful; some said they got no feedback at all.  By the end of the 2010–2012 period, the district had spent considerable sums of money developing the evaluation system, hiring outside evaluators, and implementing it, yet fired only 34 of the 1,600 teachers it employs (about 2% of its teaching force).  What about the other 98% of teachers in the district?  According to those teachers, all of this time and money resulted in little to no change in their teaching.  That’s a problem.

In the public dialogue about how to improve America’s classrooms, there’s often a simplistic notion that firing a teacher and replacing him or her is a no-brainer solution to our educational dilemma.  While new evaluation systems should identify exceptionally ineffective teachers, that alone is not enough.  They must also provide feedback for improvement to the teachers who aren’t fired.  Such a focus certainly brings to light new questions about who gives the feedback, the nature of that feedback, the qualities of good instruction, and how to coach teachers toward it; those, however, are the questions we really need to be asking.

Filed under: CPE, Teacher evaluation, teachers — Allison @ 2:57 pm





October 17, 2011

Others agree: Fordham’s claims about high achievers not supported by data

Last month I wrote about how the Fordham Institute’s claim that our nation’s high achievers are losing ground wasn’t supported by evidence.  Well, it is good to know I am not alone. First, the National Education Policy Center (NEPC) supported my critique that the data didn’t back up Fordham’s claim (though, then again, NEPC disagrees with just about everything Fordham says). Then the Center for American Progress (CAP), which agrees with Fordham on several issues, released a critique of the Fordham report that raised concerns similar to mine. CAP’s main criticisms were:

1. Fordham claimed that the federal No Child Left Behind law might have caused high-flying students to do worse over time. All of Fordham’s data, however, came from the post-NCLB time period. Without a pre-NCLB comparison, there is no way to claim that NCLB caused the decline.

2. The report fails to acknowledge the true consequences of poverty on student achievement. The Fordham researchers note that “high achievers in high-poverty schools grew slightly less than those in low-poverty schools,” but use this finding to argue that poverty is not a strong predictor of student progress. Ample evidence shows, however, that low-income children need more resources in order to overcome the disadvantages they bring with them to school.

3. A broader look at the data suggests that the nation’s top students have actually been gaining ground in a number of areas. For example, from 2000 to 2009, the percentage of eighth graders scoring at the highest level in math jumped 3 percentage points on the National Assessment of Educational Progress.

I’d have to agree with CAP on each of these points. It will be interesting to hear Fordham’s rebuttal when they host a conference on their report, featuring one of the CAP authors, on Monday, October 17. I will certainly be watching. – Jim Hull

Filed under: Achievement Gaps, research — Jim Hull @ 10:27 am





October 13, 2011

Using growth in NCLB’s reauthorization

On Wednesday, Senator Harkin released his bill to reauthorize the Elementary and Secondary Education Act (ESEA), better known as No Child Left Behind (NCLB). I haven’t read the bill yet, but I have read that Senator Harkin is proposing to drop the current Adequate Yearly Progress (AYP) requirement and instead evaluate schools based on “continuous improvement.” This would mean that all students are no longer expected to be proficient by 2014; instead, they are expected to make a certain amount of academic gains from year to year.

Sounds simple enough. Critics and proponents of NCLB alike have been pushing for the inclusion of a measure of student growth since NCLB was enacted nearly a decade ago. However, back in 2002 fewer than a handful of states had the assessments and data systems in place to measure how much academic progress individual students made from year to year. Now, thanks to NCLB, all states have the capacity to make such calculations. Almost all would agree that including such measures would greatly improve the fairness of any accountability system.

Yet incorporating student growth into a federal accountability system is not as straightforward as it seems. First of all, as my report Measuring Student Growth illustrates, there is no single method of measuring student growth. Choosing the best method depends on the data available and how the data are going to be used. For example, a growth model that identifies students who are not gaining as much as similar students will look very different from one used to identify students who gained enough in the past year to be on track to reach a certain benchmark, such as being college- and career-ready when they graduate from high school.

So before a growth model is used for accountability, policymakers need to state clearly what the growth data are meant to evaluate. For example, is the purpose to ensure schools are closing achievement gaps? Is it to ensure all students are college- or career-ready by the end of high school? Or is it to identify schools where students are making smaller gains than students in schools with similar populations?  Each of these questions would require a different growth model to answer adequately.
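To see why the choice of model matters, here is a toy sketch (the cut score, student scores, and function names are all hypothetical, not drawn from any actual state system) showing how the same test results can earn opposite verdicts under a proficiency model and an on-track growth model:

```python
# Hypothetical proficiency cut score on some state test's scale.
PROFICIENT = 300

def proficiency_model(score):
    """NCLB-style question: is the student proficient this year?"""
    return score >= PROFICIENT

def on_track_model(last_year, this_year, years_left, target=PROFICIENT):
    """Growth-to-benchmark question: if the student keeps gaining at
    this year's rate, will they reach the target in time?"""
    gain = this_year - last_year
    return this_year + gain * years_left >= target

# A low-scoring student who is gaining rapidly (hypothetical data):
last, now = 220, 260
print(proficiency_model(now))        # False: still below the cut score
print(on_track_model(last, now, 2))  # True: on pace to clear 300
```

The same student counts as a failure under the proficiency lens but a success under the growth lens, which is exactly why the purpose of the accountability system has to be settled before a model is chosen.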

Second, simply moving from a proficiency-based accountability system such as NCLB’s to the continuous-improvement system Senator Harkin proposes overlooks the fact that most state assessments are not designed to measure student growth effectively from grade to grade. Most states developed their assessments to evaluate whether a student is proficient each year. As such, many state assessments cannot reliably determine how much a student has learned from year to year, especially a student who scored at the very high or very low end of the test’s achievement scale. Yes, states can calculate a growth measure using the assessments they now have in place, but in many cases the result will not be as accurate as if the assessments had been designed specifically to measure student growth.

These are just two of the major issues involved in adding a growth measure to federal accountability. Yes, evaluating schools based on student growth is much fairer than how schools are currently evaluated under NCLB. But simply including a growth measure does not automatically produce a better accountability system. Policymakers need to set a clear purpose for accountability systems and then incorporate the growth model that best evaluates whether schools are meeting their goals. From what I know now about the Harkin bill, it is not clear whether the purpose is to ensure that all students are college- or career-ready or that all students are making a year’s worth of gains. Without a clear purpose for what to hold schools accountable for, adding a growth model will not be any fairer than NCLB. – Jim Hull






October 5, 2011

How does your district compare to Finland?

Last week the Web site Global Report Card (GRC) was launched by the George W. Bush Presidential Center. It enables the public to compare their school district’s academic performance in math and reading to that of students in 25 developed countries around the world, including top-achieving Finland, Canada, Japan, and Singapore.

Although the Web site is easy to use, actually making such comparisons is not. There are significant limitations in making fair comparisons of districts across states, never mind across countries. However, Jay Greene and Josh McGee, who created the GRC, maintain that their comparisons of Boston to Finland (for example) are fair and reliable.

You can put me in the skeptical camp on this one. Not only are they comparing results across countries, they are doing it across grade levels as well. For U.S. school districts, they use scores from state assessments in all tested grades, which in most states means grades 3 through 8 plus grade 10. Other countries’ results are based on the international assessments in which they participated, which at most cover grades 4, 8, and 10 in math and grades 4 and 10 in reading.

Keep in mind that different assessments, with significantly different purposes and given in different years, were used in different grades and subjects. For example, 4th and 8th grade math scores are derived from TIMSS, which is designed specifically to measure how well students have learned what they were expected to be taught in school. Tenth grade reading and math scores come from PISA, which measures how well students can apply their math and reading knowledge to real-life problems, whether or not they attained that knowledge in school. To add even more complexity, not all 25 countries participated in each assessment at each grade level. Hence, districts’ results on their state assessments across multiple grade levels are compared to each country’s results across different assessments that not all comparison countries took part in.

You may remember that in a post earlier this year I was skeptical of another report comparing the U.S. to other countries; that comparison was based on one grade level, in one year, on one assessment for each country, and countries were compared only to U.S. states that had taken one assessment, in one grade level, in one year. That is a far more straightforward comparison than the GRC’s, yet still statistically questionable.

Both report cards, however, attempt to make important comparisons that — if fair and reliable — would provide valuable information on how our students compare to their peers in other countries. Yet, we don’t know how reliable the comparisons actually are, especially at the district level, where smaller districts appear to have a distinct advantage over larger districts with similar demographics.

However, maybe the GRC, with all its question marks, will lead to accurate international comparisons at both the state and district levels. Whether students in our best districts are as prepared as students in the highest-performing countries really is an important question to answer, and the answer could have a tremendous impact on the focus of our education reform efforts. – Jim Hull






September 23, 2011

Are our top students being left behind?

It’s déjà vu all over again. Back in 2008 the Fordham Institute claimed in this report that our nation’s best students were being hurt by current education reform efforts, particularly NCLB. Fast forward to earlier this week, when Fordham released another report that once again tries to show our education reforms are targeted at low-performing students at the expense of our top students. The similarities don’t end with both studies examining the performance of high-achieving students: in both reports, Fordham’s conclusions don’t fit their own data.

In the 2008 study Fordham argued our top students were being left behind because their gains were not as large as the gains low-performing students made post-NCLB. I argued then that their own data didn’t fit their claim. Once again, Fordham’s claim that our top students are being left behind doesn’t fit their own data. As a matter of fact, according to Fordham’s report, the gap in math scores between low-performing students (those scoring below the 10th percentile) and high-performing students (those scoring above the 90th percentile) did not significantly change as students moved from 3rd to 8th grade or from 6th to 10th grade. The good news is that all students made consistent gains. Unfortunately for low-performing students, their performance still lagged far behind. The story is a bit different in reading, where gaps did close between the lowest- and highest-performing students. However, Fordham sees this gap closing as a negative, even though high-performing students continued to make significant gains between the 3rd and 8th grades.

Just as I argued in 2008, this is how gaps should be narrowed: everyone improves, but the lowest performers improve at a faster rate. However, Fordham didn’t agree with me then, and I’ll safely assume they won’t agree with me now. We will just have to agree to disagree, because I don’t believe the data show our best students are being shortchanged simply because our lowest performers are making more progress than our highest performers.

Now that doesn’t mean our schools or our education policies should focus solely on our lowest-performing students. Educators and policymakers need to ensure that all students have an opportunity to reach their highest academic potential before they go on to college or the workplace. Yet neither Fordham study provides compelling data that our schools are shortchanging our highest-performing students.

Yes, educators and policymakers need to focus on our highest-achieving students. International test scores show we have a much smaller proportion of advanced students than leading countries such as South Korea and Finland. But the same international tests show we also have a much larger proportion of very low performers than most other industrialized nations. And students with such low achievement have little chance of going on to any sort of postsecondary education or finding a good job that pays a living wage and offers benefits. So we need to at least sustain the gains our highest achievers are making, since many will be our country’s future innovators, policymakers, and business leaders. At the same time, we need to accelerate the gains our lowest-achieving students are making so they at least have the minimal skills necessary either to go on to earn some sort of postsecondary degree or certificate or to find a good job. Doing so is not a zero-sum game. If we provide our teachers with the training, resources, and support they need, they can improve the performance of all students. – Jim Hull

Filed under: Achievement Gaps, Public education, Report Summary — Jim Hull @ 1:35 pm




