Separate Study Confirms Many LA Times Findings
A University of Colorado review of Los Angeles Unified teacher effectiveness also raises some questions about the precision of ratings as reported in The Times.
By Jason Felch, Los Angeles Times | http://lat.ms/h67pEC
February 7, 2011 - A study to be released Monday confirms the broad conclusions of The Times' analysis of teacher effectiveness in the Los Angeles Unified School District while raising concerns about the precision of the ratings.
Two education researchers at the University of Colorado at Boulder obtained the same seven years of data that The Times used in its analysis of teacher effectiveness, the basis for a series of stories and a database released in August giving rankings of about 6,000 elementary teachers, identified by name. The Times classified teachers into five equal groups, ranging from "least effective" to "most effective."
After re-analyzing the data using a somewhat different method, the Colorado researchers reached a similar general conclusion: Elementary school teachers vary widely in their ability to raise student scores on standardized tests, and that variation can be reliably estimated.
But they also said they found evidence of imprecision in the Times analysis that could lead to the misclassification of some teachers, especially among those whose performance was about average for the district.
The authors largely confirmed The Times' findings for the teachers classified as most and least effective. But the authors also said that slightly more than half of all English teachers they examined could not be reliably distinguished from average. The general approach used by The Times and the Colorado researchers, known as "value added," yields estimates, not precise measures.
The Colorado analysis was based on a somewhat different pool of students and teachers from the Times analysis, a difference that might have affected some of the conclusions. The Colorado researchers began with the same dataset released to The Times, but their ultimate analysis was based on 93,000 fewer student results and 600 fewer teachers than the analysis conducted for The Times by economist Richard Buddin.
In addition, to improve the reliability of the results it reported, The Times excluded from its ratings teachers who taught 60 students or fewer over the study period. The Colorado study excluded only those teachers who taught 30 students or fewer.
After a Times reporter inquired about that difference, Derek Briggs, the lead researcher on the Colorado study, said in an e-mail that he had recalculated his figures using only those teachers who had taught more than 60 students. Doing so reduced the number of discrepancies, he said; but still, up to 9% of math teachers and 12% of English teachers might have ended up in different categories using Colorado's method than they did in The Times' analysis.
The authors also found that the way school administrators assign students to teachers — giving especially challenging students to a certain teacher and not to others, for example — could have skewed the value-added results. But recent research by a Harvard professor using Los Angeles school data did not find that such assignments created a bias in value-added scores.
Buddin said that although most conclusions of the two studies were similar, the differences in data analyzed made it difficult to directly compare his results with those of the Colorado study.
The Colorado study comes as education officials in Los Angeles and across the country are moving to incorporate more objective measures of performance into teacher evaluations. In the process, they are confronting the technical challenges involved in value-added analysis, which attempts to estimate a teacher's effect on student learning by measuring each student's year-to-year progress.
Developing value-added scores requires numerous judgment calls about what variables to use and how to obtain the most reliable results. Each school district that has used value-added follows slightly different methods, and supporters of the approach say it should not be used as the sole measure of a teacher's ability.
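The core idea behind a value-added score can be sketched in a few lines. This is a deliberately minimal illustration with invented scores, not the model Buddin or the Colorado researchers actually used (real models control for many more student characteristics): predict each student's current score from the prior year's score, and treat a teacher's "value added" as the average amount by which his or her students beat that prediction.

```python
# Minimal sketch of a value-added estimate. Data are hypothetical;
# real models include many more controls and years of scores.
from statistics import mean

# (prior_score, current_score, teacher) for a handful of students
records = [
    (60, 68, "A"), (70, 75, "A"), (55, 66, "A"),
    (60, 61, "B"), (70, 69, "B"), (55, 57, "B"),
]

# Step 1: fit current_score ~ prior_score by ordinary least squares.
xs = [r[0] for r in records]
ys = [r[1] for r in records]
xbar, ybar = mean(xs), mean(ys)
slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum(
    (x - xbar) ** 2 for x in xs
)
intercept = ybar - slope * xbar

# Step 2: a teacher's "value added" is the mean residual
# (actual minus predicted score) across his or her students.
def value_added(teacher):
    resid = [y - (intercept + slope * x)
             for x, y, t in records if t == teacher]
    return mean(resid)

print(value_added("A"))  # positive: A's students beat the prediction
print(value_added("B"))  # negative: B's students fell short of it
```

Every judgment call mentioned above — which prior scores to include, which student and school characteristics to control for — changes the prediction in Step 1, and therefore the residuals that get attributed to the teacher in Step 2.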
Briggs said his goal was to raise awareness about those issues. "You have an obligation to have an open conversation about the strengths and weaknesses" of the methodology, he said.
Briggs' study was partly funded by the Great Lakes Center for Education Research and Practice, which is run by the heads of several Midwestern teachers unions and supported by the National Education Assn., the largest teachers union in the country.
Research Study Shows L.A. Times Teacher Ratings Are Neither Reliable Nor Valid
New Research Shows Serious Flaws in the Research Behind the L.A. Times’ Controversial Ratings of Individual Teacher Performance
University of Colorado Press Release | http://bit.ly/emuF1t
William Mathis, NEPC
BOULDER, CO (February 8, 2011) – A new study published today by the National Education Policy Center finds that the research on which the Los Angeles Times relied for its teacher effectiveness reporting was demonstrably inadequate to support the published rankings. Due Diligence and the Evaluation of Teachers by Derek Briggs and Ben Domingue of the University of Colorado at Boulder used the same L.A. Unified School District (LAUSD) dataset and replicated the methods of the Times’ researcher but then probed deeper and found the earlier research to have serious weaknesses.
Based on the results of the Briggs and Domingue research, NEPC director Kevin Welner said, “This study makes it clear that the L.A. Times and its research team have done a disservice to the teachers, students, and parents of Los Angeles. The Times owes its community a better accounting for its decision to publish the names and rankings of individual teachers when it knew or should have known that those rankings were based on a questionable analysis. In any case, the Times now owes its community an acknowledgment of the tremendous weakness of the results reported and an apology for the damage its reporting has done.”
In August 2010 the Los Angeles Times published ratings that purported to show the teaching effectiveness of individual Los Angeles teachers. The teachers’ ratings were based on an analysis of their students’ performance on California state standardized reading and math tests.
The researcher hired by the Times, Richard Buddin of the RAND Corporation (who conducted the work as a project independent of RAND itself), also published his work as a “white paper,” which provided the template from which Briggs and Domingue worked. Buddin used a relatively simple value-added model to assess individual teacher performance for the period from 2003 to 2009. He found significant variability in LAUSD teacher quality, as demonstrated by student performance on standardized tests in reading and math, and he concluded that differences between “high-performing” and “low-performing” teachers accounted for differences in student performance.
Yet, as Briggs and Domingue explain, simply finding that a value-added model yields different outcomes for different teachers does not tell us whether those outcomes are measuring what is important (teacher effectiveness) or something else, such as whether students benefit from other learning resources outside of school. Their research explored whether there was evidence of this kind of bias by conducting what researchers call a “sensitivity analysis” to test whether the results from the L.A. Times model were valid and reliable.
First, they investigated whether, when using the L.A. Times model, a student’s teacher in the future would appear to have an effect on a student’s test performance in the past—something that is logically impossible and a sign that the model is flawed. This is analogous to using a value-added model to isolate the effect of an NBA coach on the performance of his players. At first glance we might not be surprised when the model indicates that Phil Jackson is an effective coach. But if the same model could also be used to indicate that Phil Jackson improved Kobe Bryant’s performance when he was in high school, we might wonder whether the model was truly able to separate Jackson’s ability as a coach from his good fortune at being surrounded by extremely talented players.
Briggs and Domingue found strong evidence of these illogical results when using the L.A. Times model, especially for reading outcomes: “Because our sensitivity test did show this sort of backwards prediction, we can conclude that estimates of teacher effectiveness in LAUSD are a biased proxy for teacher quality.”
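The logic of that falsification check can be sketched simply, using made-up numbers: a student's *next-year* teacher cannot cause *this year's* score, so if the next-year teacher's students systematically differ on this year's scores, students are being sorted to teachers nonrandomly, and a value-added model risks attributing that sorting to teaching.

```python
# Sketch of the backwards-prediction (falsification) check.
# Data are invented: higher-scoring students are disproportionately
# assigned to teacher "X" the following year.
from statistics import mean

# (this_years_score, next_years_teacher)
students = [
    (82, "X"), (78, "X"), (85, "X"),
    (65, "Y"), (60, "Y"), (70, "Y"),
]

by_future_teacher = {}
for score, teacher in students:
    by_future_teacher.setdefault(teacher, []).append(score)

# A future teacher can't raise a past score, so a large gap here
# signals sorting bias, not teacher effectiveness.
gap = mean(by_future_teacher["X"]) - mean(by_future_teacher["Y"])
print(gap)
```

In this toy example the "effect" of next year's teacher on this year's score is large, which is the kind of logically impossible result Briggs and Domingue report finding in the reading data.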
Next, they developed an alternative, arguably stronger value-added model and compared the results to the L.A. Times model. In addition to the variables used in the Times’ approach, they controlled for (1) a longer history of a student’s test performance, (2) peer influence, and (3) school-level factors. If the L.A. Times model were perfectly accurate, there would be no difference in results between the two models. But this was not the case.
For reading outcomes, their findings included the following:
• More than half (53.6%) of the teachers had a different effectiveness rating under the alternative model.
The math outcomes weren’t quite as troubling, but the findings included the following:
• Only 60.8% of teachers would retain the same effectiveness rating under both models.
Accordingly, the effects estimated for LAUSD teachers can be quite sensitive to choices concerning the underlying statistical model. The choice of one reasonable model would lead to very different conclusions about individual teachers than would the choice of a different reasonable model.
Briggs and Domingue then examined the precision of Buddin's teacher-effect estimates – whether the approach can be used to reliably distinguish between teachers given different value-added ratings. They found that between 43% and 52% of teachers cannot be distinguished from a teacher of "average" effectiveness once the specific value-added estimate for each teacher is bounded by a 95% confidence interval. Because the L.A. Times did not use this more conservative approach to distinguish teachers when rating them as effective or ineffective, it is likely that there are a significant number of false positives (teachers rated as effective who are really average) and false negatives (teachers rated as ineffective who are really average) in the L.A. Times' rating system. Using the Times' approach of including only teachers with 60 or more students, there was likely a misclassification rate of 22% (for reading) and 14% (for math).
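The more conservative classification rule works like this (a minimal sketch with invented estimates and standard errors, not the researchers' actual numbers): call a teacher above or below average only when the 95% confidence interval around the estimate excludes zero; otherwise the teacher is statistically indistinguishable from average.

```python
# Sketch of confidence-interval-based classification.
# Estimates and standard errors below are invented for illustration.
Z95 = 1.96  # critical value for a two-sided 95% interval

teachers = {
    "A": (0.30, 0.10),   # (value-added estimate, standard error)
    "B": (0.15, 0.12),
    "C": (-0.25, 0.09),
}

def classify(estimate, se):
    low, high = estimate - Z95 * se, estimate + Z95 * se
    if low > 0:
        return "above average"
    if high < 0:
        return "below average"
    return "indistinguishable from average"

for name, (est, se) in teachers.items():
    print(name, classify(est, se))
```

Teacher B has a positive point estimate, but the interval straddles zero, so under this rule B would not be labeled "effective" – which is exactly how a simple ranking that ignores the interval can produce false positives.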
The new report also finds evidence that conflicts with Buddin’s finding that traditional teacher qualifications have no association with student outcomes. In fact, the researchers found significant and meaningful associations between value-added estimates of teachers’ effectiveness and their experience and educational background.
On Monday, February 7, 2011, the Times published a story about this new study. That story included false statements and was generally misleading. Accordingly, along with this study's release, we are publishing a "Fact Sheet" about the Times' new article (http://nepc.colorado.edu/files/FactSheet_0.pdf).
Find Due Diligence and the Evaluation of Teachers, by Derek Briggs and Ben Domingue, on the web at:
The mission of the National Education Policy Center is to produce and disseminate high-quality, peer-reviewed research to inform education policy discussions. We are guided by the belief that the democratic governance of public education is strengthened when policies are based on sound evidence. For more information on NEPC, please visit http://nepc.colorado.edu/.
This research brief was made possible in part by the support of the Great Lakes Center for Education Research and Practice (greatlakescenter.org).
●●smf’s 2¢: The LA Times isn’t just an advocate for value-added teacher assessments – it is a practitioner of the craft in a sort of unofficial way: “We’re from the media and we’re here to help!”
The Times and Jason Felch are far more than newspaper and reporter – they are provocateurs.
Going from reporter to newsmaker is dangerous enough – now the actual reporter/newsmaker reports on a critical outside review of his own effort (and it is critical) and gives himself and his employer a “good job” for the effort. :-)
Plus the critical study was funded in part by teachers unions… and you (or at least The Times) know how they are!