
The EDifier

December 7, 2016

PISA scores remain stagnant for U.S. students

The results of the latest PISA, the Program for International Student Assessment, are in, and as usual we have an interpretation of the highlights for you.

If you recall, PISA is designed to assess not just students’ academic knowledge but their ability to apply that knowledge. It is administered to 15-year-olds across the globe every three years by the U.S. Department of Education’s National Center for Education Statistics (NCES) in coordination with the Paris-based Organisation for Economic Co-operation and Development (OECD). Each iteration of PISA has a different focus, and the 2015 version homed in on science, though it also tested math and reading proficiency among the roughly half-million teens who participated in this round. So, how did American students stack up?

In short, our performance was average in reading and science and below average in math compared to the 35 other OECD member countries. Specifically, the U.S. ranked 19th in science, 20th in reading, and 31st in math. But PISA was also administered in countries beyond OECD members, and among that total group of 70 countries and education systems (some regions of China are assessed as separate systems), U.S. teens ranked 25th in science, 22nd in reading, and 40th in math. Since 2012, scores have been essentially flat in science and reading but dropped 11 points in math.

[Chart: PISA science scores]

Before you get too upset over our less-than-stellar performance, though, there are a few things to take into account. First, scores overall have fluctuated in all three subjects. Some of the top performers, such as South Korea and Finland, saw 20- to 30-point drops in math scores from 2003 to 2015, over the same period that the U.S. saw a 13-point drop. Are half of the countries really declining in performance, or could it be a change in the test, or a change in how the test corresponds with what and how material is taught in schools?

Second, the U.S. has seen a large set of reforms over the last several years, which have disrupted the education system. As with many systems, a disruption may cause a temporary drop in performance before things eventually stabilize. Many teachers are still adjusting to teaching the Common Core Standards and/or Next Generation Science Standards; the 2008 recession caused shocks in funding levels that we’re still recovering from; and many school systems received waivers from No Child Left Behind that substantially changed state- and school-level policies. And, in case you want to blame Common Core for lower math scores, keep in mind that not all test-takers live in states that have adopted the Common Core, and even those who do may have learned under the new standards for only a year or two. Andreas Schleicher, who oversees the PISA test for the OECD, predicts that the Common Core Standards will eventually yield positive results for the U.S., but that we must be patient.


Third, student scores are correlated to some degree with student poverty and the concentration of poverty in some schools. Students from disadvantaged backgrounds are 2.5 times more likely to perform poorly than advantaged students. U.S. schools where fewer than 25 percent of students are eligible for free or reduced-price lunch (about half of all students nationwide are eligible) would rank 2nd in science, 1st in reading, and 11th in math out of all 70 countries. At the other end of the spectrum, schools where at least 75 percent of students are eligible for free or reduced-price lunch would rank 44th in science, 42nd in reading, and 47th in math. Compared only to OECD countries, high-poverty schools would beat only four countries in science, four in reading, and five in math.

Score differences for different races in the U.S. show similar disparities.

How individual student groups would rank compared to the 70 education systems tested:

Group        Science   Reading   Math
White        5th       4th       20th
Black        49th      44th      51st
Hispanic     40th      37th      44th
Asian        8th       2nd       20th
Mixed race   19th      20th      38th



Despite the disparities in opportunity for low-income students, the share of low-income students who performed better than expected has increased by 12 percentage points since 2006, to 32 percent. The amount of score variation attributable to poverty decreased from 17 percent in 2006 to 11 percent in 2015, meaning that poverty became less of a determining factor in how a student performed.


America is one of the largest spenders on education, as we should be, given our high per capita income. Many have complained that we should be outscoring other nations based on our higher spending levels, but the reality is that high levels of childhood poverty and inequitable spending often counteract the amount of money put into the system. For more on this, see our previous blogpost.

November 17, 2016

What does “evidence-based” mean?

The Every Student Succeeds Act requires schools to use “evidence-based interventions” to improve schools. The law also defines what counts as evidence, and recent guidance from the Department of Education has provided additional clarification on what qualifies as “evidence-based.” Mathematica has also put out a brief guide on different types of data. It uses categories similar to the Department of Education’s, but also explains data we may see in the media or from academic researchers that does not qualify as hard evidence yet can still help us understand policies and programs.

[Chart: ESSA evidence tiers]

What follows is a brief summary of what qualifies as “evidence-based,” starting with the strongest:

Experimental Studies:  These are purposefully created experiments, similar to medical trials, that randomly assign students to treatment or control groups, and then determine the difference in achievement after the treatment period.  Researchers also check to make sure that the two groups are similar in demographics.  This is considered causal evidence because there is little reason to believe the two similar groups would have had different outcomes except for the effect of the treatment.  Studies must involve at least 350 students, or 14 classrooms (assuming 25 students per class), and include multiple sites.

Quasi-experimental Studies:  These still have some form of comparison group, which may consist of students, schools, or districts with similar demographic characteristics.  However, even groups that seem similar on paper may still have systematic differences, which makes evidence from quasi-experimental studies slightly less reliable than evidence from randomized studies.  Evidence from these studies is often (but not always) considered causal, though study design and fidelity can greatly affect how reliable the conclusions are across other student groups.  Studies must involve at least 350 students, or 14 classrooms (assuming 25 students per class), and include multiple sites.

Correlational Studies: Studies that find correlational effects can’t prove that a specific intervention caused a positive or negative outcome for students in a particular program.  For example, if Middle School X requires all teachers to participate in Professional Learning Communities (PLCs), and it ends up with greater student improvement than Middle School Y, we can say that the improved performance was correlated with PLC participation.  However, other changes at the school, such as greater parental participation, could be what truly caused the improvement, so we cannot say the improvement was caused by PLCs, only that further study should be done to see if there is a causal relationship.  Researchers still have to control for demographic factors; in this example, Middle Schools X and Y would have to be similar in both their teacher and student populations.

With all studies, we also have to consider who was involved and how the program was implemented.  A good example is the class-size experiment performed in Tennessee in the 1980s.  While that randomized controlled trial found positive effects of reducing class size by an average of seven students per class, when California reduced class sizes in the 1990s it didn’t see effects as strong.  Part of this was implementation: reducing class sizes means hiring more teachers, and many inexperienced, uncertified teachers had to be placed in classrooms to fill the gap, which could have diluted the positive effect of smaller classes.  Also, students in California may be different from students in Tennessee; while this seems less likely to matter for something like class size, it could be true for more specific programs or interventions.

An additional consideration when looking at evidence is not only statistical significance (whether we can be confident that the true effect of a program wasn’t actually zero), but also the effect size.  If an intervention has an effect size of 0.01 standard deviations* (or other units), it may translate to the average student score changing only a fraction of a percentage point.  We also have to consider whether that effect is really meaningful, and whether it’s worth our time, money, and effort to implement, or whether we should look for a different intervention with greater effects.  Some researchers would say that an effect size of 0.2 standard deviations is the gold standard for making meaningful changes for students.  However, I would argue that it also depends on the cost, in both time and money, of the program.  If making a small schedule tweak could garner 0.05 standard deviations of positive effect and cost virtually nothing, then we should do it.  In conjunction with other effective programs, we can truly move the needle for student achievement.
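For readers who want to see the arithmetic behind an effect size, here is a minimal sketch of the most common version (Cohen’s d: the difference in group means divided by the pooled standard deviation). The scores below are made up for illustration and don’t come from any study discussed here.

```python
import statistics

def cohens_d(treatment, control):
    """Effect size: difference in group means divided by the pooled
    standard deviation, so results are comparable across different tests."""
    n1, n2 = len(treatment), len(control)
    var1 = statistics.variance(treatment)  # sample variance of each group
    var2 = statistics.variance(control)
    pooled_sd = (((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd

# Hypothetical scale scores, for illustration only
treated = [520, 540, 510, 535, 525, 530]
comparison = [505, 515, 500, 520, 510, 512]
print(round(cohens_d(treated, comparison), 2))
```

Dividing by the pooled standard deviation is what makes a “0.2 standard deviation” effect from one state’s test comparable to one from another’s.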

School administrators should also consider the variation in test scores.  While most experimental studies report on the mean effect size, it is also important to consider how high- and low-performing students fared in the study.

Evidence is important and should guide policy decisions.  However, we have to keep its limitations in mind and be cautious consumers of data, making sure we truly understand how a study was done so we can judge whether its results are valid and can translate to other contexts.


*Standard deviations are standardized units that help us compare programs, considering that most states and school districts use different tests.  The assumption is that most student achievement scores follow a bell curve, with the average score at the top of the curve.  In a standard bell curve, a change of one standard deviation for a student at the 50th percentile would move him or her up to roughly the 84th percentile, or down to roughly the 16th, depending on the direction of the change.  A reported effect size typically indicates how much the mean of the students who participated in the program changed from the previous mean, or how it differed from the mean of the students who didn’t receive the program.
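The percentile shift in the footnote falls straight out of the normal (bell) curve, and can be checked in a few lines using only the standard library (the usual closed form for the normal CDF via the error function):

```python
from math import erf, sqrt

def percentile_from_z(z):
    """Percentile rank of a z-score (number of standard deviations
    from the mean) under a standard normal bell curve."""
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))

# A student at the mean (z = 0) sits at the 50th percentile;
# a one-standard-deviation gain moves them to roughly the 84th,
# and a one-standard-deviation drop to roughly the 16th.
print(round(percentile_from_z(0)))   # 50
print(round(percentile_from_z(1)))   # 84
print(round(percentile_from_z(-1)))  # 16
```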

Filed under: CPE, Data, ESSA — Chandi Wagner @ 3:39 pm

June 6, 2016

Behind every data point is a child

At CPE, we are data driven. We encourage educators, school leaders and advocates to be data-driven as well. (Indeed, we have a whole website, Data First, which is dedicated to just that. If you haven’t seen it, it’s worth your time to check out.) So while we think an over-abundance of data is a good problem to have, we often remind ourselves and others to take a step back before acting on it, and consider that every data point represents a living, breathing, complex, does-not-fit-the-mold child.

Clearly, good data can lead you to solutions for improving policy and practice in the aggregate. It can also provide insights into particular classrooms or even students. But ultimately what an individual child needs is going to be, well, quirky. We may well find out that Joey struggled with fractions this quarter even though he did well in math the quarter before. If we keep digging, we might also discover that he was absent eight days. But the data won’t tell us why. We won’t even know if the inference that Joey’s fraction trouble was due to his multiple absences is the right one. There could be a million things going on with Joey that only he and his parents can help us understand. But we need to find out before we can effectively intervene.

NPR recently ran a story on Five Doubts About Data-Driven Schools that highlights some of the risks with an absolutist approach to data. I will just address two in this space, but encourage you to read the article itself. It’s short.

One: some critics believe a hyperfocus on data can suppress rather than spark motivation to do better, particularly for low-scoring students. Publishing data that points out differences by individuals or groups can lead to what psychologists call a “stereotype threat.” According to the article, “[M]erely being reminded of one’s group identity, or that a certain test has shown differences in performance between, say, women and men, can be enough to depress outcomes on that test for the affected group.”

I have had my own qualms about the practice in some schools of displaying student test scores, whether of individual students in the classroom or reported by teacher in the school building. There can be great value in having students examine their own data, and helping them use it to take greater charge of their own learning. But there’s also a fine line between encouraging constructive self-examination and reinforcing a potentially destructive perception of failure. Before instituting such a policy or practice, principals and district leaders should think very carefully about the messages being sent versus the messages students, parents and teachers actually hear.

Two: Just because we can collect the data, should it be part of a student’s permanent record? Most would agree that universities and potential employers should have access to student transcripts, grades, test scores and other academic information when making admissions or employment decisions. But, as the article points out, we are entering an era when psychometricians will be able to measure such characteristics as grit, perseverance, teamwork, leadership and others.  How confident should we be in this data? And even if it is reliable, should we even consider such data for traits exhibited in childhood and adolescence that are arguably mutable, and therefore may no longer be accurate descriptions of the individual? I have similar concerns about a child’s disciplinary record following him or her into adulthood.

Over and over again, the availability and effective use of education data has been shown to have a tremendous impact on improving performance at the system, school and individual level. Back to Joey and fractions. Had she not looked at his data, Joey’s teacher would not have identified his struggle, and it might have remained hidden only to become worse over time. This way she is able to dig more, ask questions, find out what Joey needs, and ideally, provide extra help so he will succeed.

But we also need to guard against the overuse of data, lest we allow it to reduce all of a student’s intellect, growth, production, and character to a number and lose a picture of the child.

Filed under: Accountability, CPE, Data — Patte Barth @ 1:39 pm

December 15, 2015

It’s Official: HS Grad Rates Hit another All-Time High

I feel like I am beginning to sound like a broken record as I keep repeating “HS Grad Rates Hit another All-Time High.” Once again it is true: the U.S. Department of Education made it official today that the on-time high school graduation rate for the class of 2013-14 reached 82 percent.

This news does not come as much of a surprise, since preliminary results back in October showed most states increased their graduation rates, but it is still worth celebrating. After decades of data showing graduation rates stuck around the 70 percent mark, rates have increased significantly in just the last decade.

Keep in mind, however, the 82 percent actually understates how many students earn a high school diploma. That’s because the 82 percent is simply the on-time rate, meaning, only those students who entered 9th grade and graduated four years later are counted as graduates. But as our Better Late Than Never report showed, including those students who needed more than four years to earn a standard diploma or better would likely increase the graduation rate to around 87 percent — just a few percentage points shy of the 90 percent mark and a goal that seemed unattainable just a decade ago.


Unfortunately, not all states currently report data that includes late graduates, so it is not possible to get a true national graduation rate. But the late grads are students who should be recognized for meeting the same requirements as their classmates who graduated on time. And schools and districts should be recognized as well for identifying the students who fell behind their classmates and providing the support to them and their teachers to get them back on track to earn a high school diploma. As our report showed, earning a high school diploma, even if it takes more than four years, significantly improves a student’s chances of success after high school. And both students and schools should be encouraged and rewarded for graduating all students who earn a high school diploma, not just those who did so within four years. — Jim Hull

Filed under: Data,Graduation rates,High school,Public education — Jim Hull @ 1:47 pm

September 16, 2015

Budgets, data and honest conversation

Balancing school budgets in a time of shortfalls is a thankless job. Whatever gets cut will nonetheless have its champions, many of whom are willing to make their unhappiness known. Loudly. But one of the nation’s largest school districts is meeting this challenge with a new app that gives the community a channel for telling school leaders exactly which expenditures they want preserved. The hitch: users keep their preferred items only by eliminating others.  In this way, the app delivers an object lesson in how tough these decisions really are.

Fairfax County school district in Virginia serves nearly 190,000 students with an annual budget of $2.6 billion. Despite the community’s affluence, enrollments are rising faster than revenues, and the district is facing a $50-100 million deficit. An earlier citizen task force was charged with recommending ways to close this gap. After reviewing the data, the task force suggested, among other things, eliminating high school sports and band. To say the proposal was not well received is to state the obvious. And the public howls and teeth-gnashing have yet to subside.

So what’s a broke district to do? Give the data to the community. Fairfax released this web-based budget tool to the public this week as a means to call the question: In order to keep [your priority here], what do we get rid of? Users are able to choose from more than 80 budget items to cut in seven categories: “school staffing and schedules,” “instructional programs,” “nonacademic programs,” “instructional support,” “other support,” “employee compensation” and “new or increased fees.”  Each item has a dollar figure attached and the goal is to reduce the budget by $50 million.

I happen to be a Fairfax resident so I was happy to test-drive this web tool. The first thing that struck me was the near absence of low-hanging fruit. All of the big ticket items hurt, mostly because the savings come from reduction in staff or valuable instruction time. Increase elementary class size by one student: $12.9 million. Reduce daily course offerings in high school from seven to six: $25 million. Reduce kindergarten from full-day to half-day: $39 million. Yikes! Given these choices, I could see why eliminating high school sports at nearly $9 million could start to look like a lesser evil.

On the other hand, items that seemed to do the least damage to the educational mission also saved a relative pittance. Raise student parking fees by $50: $300,000.  Reduce district cable TV offerings: $100,000. Increase community use fees: $70,000. Clearly, the nickel-and-dime strategy was not going to get me close to $50 million.
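The tool’s core mechanic can be sketched in a few lines. The item names and dollar figures below are only the examples cited above, not the tool’s actual data, and the 10 percent margin mirrors the tolerance mentioned in the next paragraph:

```python
# Hypothetical cut options (savings in $ millions), using the figures
# cited above; the real tool offers more than 80 items in 7 categories.
cuts = {
    "increase elementary class size by one student": 12.9,
    "reduce high school course offerings from seven to six": 25.0,
    "reduce kindergarten from full-day to half-day": 39.0,
    "eliminate high school sports": 9.0,
    "raise student parking fees by $50": 0.3,
    "reduce district cable TV offerings": 0.1,
    "increase community use fees": 0.07,
}

TARGET = 50.0   # deficit to close, in $ millions
MARGIN = 0.10   # accept totals within 10 percent of the target

def total_savings(selection):
    """Sum the savings for a set of chosen cuts."""
    return sum(cuts[item] for item in selection)

def hits_target(selection):
    """True if the chosen cuts land within the allowed margin of the target."""
    return abs(total_savings(selection) - TARGET) <= MARGIN * TARGET

# One way to close the gap while keeping high school sports:
choice = ["increase elementary class size by one student",
          "reduce kindergarten from full-day to half-day"]
print(total_savings(choice), hits_target(choice))
```

As the sketch makes plain, the few painless items are worth so little that any combination reaching $50 million must include at least one of the painful big-ticket cuts.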

In the end, I came within the 10 percent margin of hitting the target (while keeping high school sports) and I submitted my preferences. But I’ll be honest. They include some choices that I do not feel the least bit happy about. And that’s the point. In 2010, CPE published a report on the impact of the recession on school budgets across the country. The title, Cutting to the Bone, pretty much tells the story. The current Fairfax deficit represents only 2 percent of its yearly budget. But after years of cost-cutting, there’s no fat left to trim.

Clearly, if I were a school board member, I would want to know more about the impact of these programs and policies before making any final decisions. But presenting the data on their cost and what the dollars buy – as this tool does — is a really good way to educate the community about the challenge and engage them in an honest conversation about how they can best serve their students, especially when revenues run short. — Patte Barth

Filed under: Data, funding, Public education — Patte Barth @ 10:11 am
