The interpretation gap in policy research: An example from an after-school program study

By markd on November 2, 2011

It’s common for a study to be released and its findings to be highlighted in the press, either as press releases or in newspaper or blog articles. But what the study finds is not always what is highlighted. And sometimes what is highlighted isn’t even what the study found.

A case in point is the recent release of a study of the Higher Achievement program. Well, actually, there is an executive summary of about 11 pages on the Public/Private Ventures web page, and a longer 60-page technical report on the MIT Poverty Action Lab web page. The study will follow students for a couple more years; these are results after two years.

After-school programs are interesting. Mathematica’s series of reports for the Institute of Education Sciences 10 years ago on the federal after-school program called “21st Century Community Learning Centers,” funded under No Child Left Behind, left a lot of questions in its wake about whether the programs really were effective (disclosure: I was the study director). The Mathematica studies examined the programs’ outcomes using rigorous methods (experiments and quasi-experiments). Effects on academic learning and youth development were not evident, except for an oddity: children served by after-school programs were more likely to report issues with discipline and behavior. But students did not attend the programs on many days or for many hours a day, and that lack of participation may have mattered.

Clearly a lack of participation was not an issue for the Higher Achievement program. During the study’s two-year period, students (who live in the DC area) averaged about 9 hours of participation a week over the program’s 25-week duration, and also attended 8-hour days, five days a week, during a six-week summer program. That’s a lot of hours. Students receive instruction in reading and in math during these times, along with other enrichment activities. They are also selected into the program based on how academically motivated they (and their parents) are.

And the outcome of this intensive effort? The technical report indicates that, at the end of two years, program participants’ scores were higher by 9 percent of a standard deviation in reading comprehension and 12 percent of a standard deviation in math. Is that a large gain for more than a thousand hours in the program? For reading, it’s like moving a student from the 50th percentile on a test to the 54th percentile. Not huge, but it’s something.
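
The percentile translation follows from the standard normal cumulative distribution function: a student at the median who gains d standard deviations lands at percentile 100·Φ(d). Here is a minimal sketch, assuming normally distributed scores (the post does not say exactly how the conversion was made), using only the Python standard library. The function name is illustrative.

```python
import math

def effect_size_to_percentile(d: float) -> float:
    """Percentile reached by a median (50th-percentile) student who gains
    d standard deviations, assuming normally distributed scores."""
    # Standard normal CDF written in terms of the error function.
    phi = 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))
    return 100.0 * phi

# Effect sizes reported in the post: 0.09 SD (reading), 0.12 SD (math).
for subject, d in [("reading", 0.09), ("math", 0.12)]:
    pct = effect_size_to_percentile(d)
    print(f"{subject}: effect size {d:.2f} -> percentile {pct:.1f}")
```

The reading effect rounds to the 54th percentile and the math effect to about the 55th, which is why a "significant" gain of this size still leaves a typical student near the middle of the distribution.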

And students were more likely to report behavior problems, just as the 21st Century study found. For example, the probability that a student in the program stole from other students was 19 percentage points higher. That’s a large effect: converted to effect-size units so it can be compared with the effect on scores, it is about four times the program’s effect on test scores.
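
One common way to put a difference in proportions on the same scale as test-score effects is to divide it by the standard deviation of the yes/no outcome in the control group, √(p₀(1 − p₀)). The post does not report the control group’s theft rate or the exact conversion used, so this sketch treats the base rate p₀ as an assumed input. Because √(p₀(1 − p₀)) never exceeds 0.5, a 19-point gap is at least 0.38 standard deviations whatever the base rate. The function name and the sample base rates in the loop are hypothetical.

```python
import math

def proportion_gap_to_effect_size(gap: float, p_control: float) -> float:
    """Standardize a difference in proportions by the control group's SD
    of a 0/1 outcome, sqrt(p * (1 - p)). p_control is an assumed input."""
    if not 0.0 < p_control < 1.0:
        raise ValueError("p_control must be strictly between 0 and 1")
    return gap / math.sqrt(p_control * (1.0 - p_control))

GAP = 0.19                   # 19 percentage points, from the post
TEST_EFFECTS = (0.09, 0.12)  # reading and math effect sizes, from the post

# The SD of a 0/1 outcome peaks at 0.5 when p = 0.5, so this is the
# smallest effect size the 19-point gap can correspond to.
lower_bound = proportion_gap_to_effect_size(GAP, 0.5)
mean_test_effect = sum(TEST_EFFECTS) / len(TEST_EFFECTS)
print(f"minimum effect size: {lower_bound:.2f}")
print(f"ratio to average test-score effect: {lower_bound / mean_test_effect:.1f}")

# Hypothetical control-group rates; lower base rates give larger effect sizes.
for p in (0.1, 0.2, 0.3, 0.5):
    print(f"p_control = {p:.1f}: effect size {proportion_gap_to_effect_size(GAP, p):.2f}")
```

Under this convention the gap is at least about 3.6 times the average test-score effect, consistent with the rough "four times" above. Other conventions, such as Cohen’s h or a log-odds-ratio conversion, give somewhat different numbers.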

So what does the release make of all these findings? The release states that “The study finds that Higher Achievement’s program significantly increases students’ math and reading scores.” Hmm, this raises the question of what “significantly” means. The findings were statistically significant, but with a large enough sample even a small effect will be statistically significant. (A companion release on the WT Grant web site notes that scores were “measurably higher,” which is accurate to a fault.) As I mentioned above, scores are a bit higher, but for hundreds of hours of instruction, it’s reasonable to ask why they were not much higher. Conventional math instruction in middle schools, for example, is about 200 hours a year (five hours a week for forty weeks); the program has students for 650 hours a year, and could be doubling math instructional time.
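
To see why statistical significance says little about the size of an effect, consider a simplified comparison of two equal-sized groups of n students each, with scores standardized to a standard deviation of 1 and no covariate adjustment. These are simplifying assumptions; the study’s own models are more sophisticated. The test statistic for an effect size d is then z = d·√(n/2), so any nonzero d eventually clears the 1.96 threshold for p < 0.05 once n is large enough. The function names are hypothetical.

```python
import math

Z_CRIT = 1.96  # two-sided 5% threshold for a standard normal test statistic

def z_statistic(d: float, n_per_group: int) -> float:
    """z for a difference in means of d SDs between two equal groups of
    size n_per_group, assuming known unit variance in both groups."""
    standard_error = math.sqrt(2.0 / n_per_group)
    return d / standard_error

def two_sided_p(z: float) -> float:
    """Two-sided p-value for z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def min_n_for_significance(d: float) -> int:
    """Smallest equal group size at which effect size d reaches z >= 1.96."""
    return math.ceil(2.0 * (Z_CRIT / d) ** 2)

for d in (0.09, 0.12):
    n = min_n_for_significance(d)
    print(f"d = {d:.2f}: p < .05 once n per group >= {n} "
          f"(p = {two_sided_p(z_statistic(d, n)):.4f})")
```

In this crude setup, a 0.09 SD difference becomes "significant" with roughly 950 students per group. Covariate adjustment shrinks the standard error and lowers that threshold further. Either way, significance answers "is the effect nonzero?", not "is it large?".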

None of the releases mention the negative findings on behavior. The study’s executive summary mentions the increase in negative behaviors, and goes on to speculate that “it is unclear whether they reflect true differences in negative behaviors or a difference in how youth perceived our questions, based on program involvement.” The speculative explanation hinted at in the summary is that maybe the same number of thefts is happening in both groups, but the program caused its students to report them more honestly. In other words, the explanation for program youth reporting more frequent negative behaviors is that the program has imbued them with a greater sense of honesty. This is a stretch. The study asks students whether they stole things from classmates, and some say they did. The fraction saying they did is larger for program students than for control students. Occam’s razor applies here: the simplest explanation, and the one we should consider first, is that program students are committing more thefts.

Here’s the rub. The results are what they are. But the highlights in the press releases are not the results. They are not even the highlights of the results. The headline currently is more like “Scores rose significantly!” The true highlights could be stated this way: an intensive effort to select academically motivated students and have them attend many additional hours of reading and math instruction raised scores, but not by much, and increased negative behaviors by a lot. A reader who is not interested in reading the technical report (which is an exemplar of sophisticated and careful analysis) can be forgiven for misinterpreting what was found after reading only the headlines.

Ultimately, efforts to use evidence to develop better practices and programs will be more effective if findings are stated directly and clearly, without spin. Facing an interpretation gap like this one, my first piece of advice is: read the report. OK, I know that’s not going to happen much. My second piece of advice is: be careful about believing results that you want to believe. The Nobel Prize-winning physicist Richard Feynman said it best. What’s missing in what he called “pseudo-science” is “a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty — a kind of leaning over backwards…Details that could throw doubt on your interpretation must be given, if you know them.”

It’s this simple principle, “utter honesty,” that is worth keeping in mind when presenting or highlighting findings.
