Randomized trials are increasingly being used to assess the effectiveness of new approaches and strategies in education and social policy. The method can be a powerful and flexible tool in education settings where approaches are designed for whole classrooms or schools. But some aspects of cluster trials can trip up the unwary.
A “cluster” trial is one in which groups of individuals, rather than individuals themselves, are randomly assigned to treatment or control groups. In health research, clusters might be doctors’ offices or even whole communities. In education, for example, a study might randomly assign classrooms to use a particular educational technology software package. In this case, a “classroom” includes both a teacher and his or her students, and the study might compare test scores in classrooms with and without the technology.
Note that in this example, classrooms are being assigned but students are being analyzed. We’ll get back to the implications of that in a minute.
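A minimal sketch of that distinction, assuming a hypothetical roster that maps each student to a classroom: the random draw happens once per classroom, and every student simply inherits the classroom's assignment. The function name `assign_classrooms` and the roster layout are illustrative, not taken from any particular study.

```python
import random

def assign_classrooms(roster: dict[str, str], seed: int = 0) -> dict[str, str]:
    """Randomly assign whole classrooms, then give each student
    the arm of his or her classroom.

    roster maps student ID -> classroom ID (hypothetical layout).
    Returns student ID -> "treatment" or "control".
    """
    classrooms = sorted(set(roster.values()))
    rng = random.Random(seed)
    rng.shuffle(classrooms)
    # Half of the classrooms (rounded down) get the software package.
    n_treated = len(classrooms) // 2
    arm_of_class = {room: ("treatment" if i < n_treated else "control")
                    for i, room in enumerate(classrooms)}
    # Randomization happened at the classroom level; students just inherit it.
    return {student: arm_of_class[room] for student, room in roster.items()}

roster = {"s1": "A", "s2": "A", "s3": "B", "s4": "B",
          "s5": "C", "s6": "C", "s7": "D", "s8": "D"}
arms = assign_classrooms(roster, seed=42)
```

The outcome data, however, arrive one row per student, which is exactly why the analysis has to remember that those rows came from only four independent draws.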
The same approach could be used with schools. A study might randomly assign schools to implement a program for their new teachers, for example, or a professional development model for math teachers. A particularly interesting approach was used in a recent study of the Success for All program in which all schools were part of the “treatment” group, but some were assigned to implement Success for All in grades K through 2 and others were assigned to implement it in grades 3 through 5. The upper grades in schools implementing the program in the lower grades served as a control group for the same grades in the other schools, and vice versa. All schools received the program, just not at all grade levels.
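The grade-band design can be sketched the same way, assuming schools are identified by hypothetical IDs and kindergarten is coded as grade 0. Each school is randomized to a band, and whether a given grade in a given school counts as treatment or control follows from that single draw. This illustrates the design logic described above; it is not code from the Success for All study.

```python
import random

LOWER = {0, 1, 2}  # K through 2 (kindergarten coded as 0, an assumption)
UPPER = {3, 4, 5}  # grades 3 through 5

def assign_bands(schools: list[str], seed: int = 0) -> dict[str, str]:
    """Randomly assign half of the schools to the lower band, half to the upper."""
    shuffled = list(schools)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return {s: ("lower" if i < half else "upper") for i, s in enumerate(shuffled)}

def condition(band: str, grade: int) -> str:
    """A grade is treated if it falls in the school's assigned band; otherwise
    it serves as a control for the same grade in schools of the other band."""
    treated_grades = LOWER if band == "lower" else UPPER
    return "treatment" if grade in treated_grades else "control"

bands = assign_bands(["s1", "s2", "s3", "s4"], seed=1)
```

Grade 4 in a lower-band school is thus compared with grade 4 in an upper-band school, and the early grades are compared in the opposite direction.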
But such studies need care. In cluster trials, having lots of students no longer guarantees that program effects can be estimated precisely. The reason is related to a statistical concept called, somewhat unhelpfully, the “intracluster correlation,” often abbreviated ICC.
To understand the ICC, consider an admittedly contrived example in which a study has twenty pairs of identical twins, and whole twin pairs are randomly assigned to treatment or control groups. So there are 10 pairs of twins in one group and 10 pairs in the other. Each twin pair is a cluster.
But from a statistical perspective, the sample size is not really 20 individuals in each group. The problem is that twins have (nearly) perfectly correlated outcomes (technically, their intracluster correlation is close to one): whatever happens to one member of a twin pair is likely to happen to the other. The effective sample in each group is much closer to 10 than to 20, because each twin pair adds only about one to the “real” sample.
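The twin intuition is usually formalized with the design effect. For clusters of equal size m with intracluster correlation ρ, the variance of an estimated mean is inflated by a factor of 1 + (m - 1)ρ, so n clustered observations carry roughly as much information as n / (1 + (m - 1)ρ) independent ones. A minimal sketch, assuming equal cluster sizes:

```python
def design_effect(cluster_size: int, icc: float) -> float:
    """Variance inflation factor for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n: int, cluster_size: int, icc: float) -> float:
    """Roughly how many independent observations n clustered ones are worth."""
    return n / design_effect(cluster_size, icc)

# Twins: 20 individuals per group, in clusters of size 2.
for icc in (0.0, 0.5, 1.0):
    print(icc, effective_sample_size(20, 2, icc))
```

With ρ = 0 the 20 twins count as the full 20; with ρ = 1 they count as 10, matching the intuition that each pair adds about one to the real sample.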
Getting back to real cases, a classroom creates a context in which students may be grouped by their characteristics and certainly interact with each other socially and educationally. So students in a classroom are to some degree like twins, with correlated outcomes. The same holds for schools, whose students typically live in the same neighborhoods and may share characteristics and interact socially and educationally.
As abstract as intracluster correlation may seem, it can make a huge difference in whether a study can measure program effects. For example, suppose a program is thought to reduce the proportion of students who drop out of high school from 50 percent to 40 percent, and it is implemented at a grade level with 200 students per school. If the intracluster correlation is “only” 5 percent, a study needs 9,600 students (48 schools!) to measure the effect with adequate statistical power. If the intracluster correlation is smaller, say 1 percent, the same study needs only 3,200 students (16 schools). That is one-third the size of the first study, and its costs will likely be lower in similar proportion.
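To see how numbers like these arise, here is a rough sketch of one textbook approach: compute the per-arm sample size for comparing two proportions under individual randomization, multiply by the design effect 1 + (m - 1)ρ, and round up to whole schools. The two-sided 5 percent significance level and 80 percent power are assumptions; the exact assumptions behind the figures above are not stated, and this simple approximation (which ignores, for example, small-sample corrections when there are few schools) will not necessarily reproduce them. It does show how sharply the required number of schools depends on ρ.

```python
import math
from statistics import NormalDist

def design_effect(cluster_size: int, icc: float) -> float:
    """Variance inflation from clustering, assuming equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc

def n_per_arm_individual(p_control: float, p_treat: float,
                         alpha: float = 0.05, power: float = 0.80) -> float:
    """Per-arm sample size for two proportions under individual randomization
    (normal approximation, two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return (z_alpha + z_beta) ** 2 * variance / (p_control - p_treat) ** 2

def schools_needed(p_control: float, p_treat: float, students_per_school: int,
                   icc: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Total schools across both arms after inflating by the design effect."""
    n_arm = n_per_arm_individual(p_control, p_treat, alpha, power)
    n_arm *= design_effect(students_per_school, icc)
    schools_per_arm = math.ceil(n_arm / students_per_school)
    return 2 * schools_per_arm

for icc in (0.05, 0.01):
    print(icc, schools_needed(0.50, 0.40, 200, icc))
```

Under these particular assumptions the sketch gives 44 schools at ρ = 0.05 and 12 at ρ = 0.01. The exact counts shift with the assumptions, but the several-fold gap between the two scenarios is the point.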
Various statistical packages can account for clustering. Which package is used matters less than accounting for the correlation at all.
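Without relying on any particular package, one simple way to respect clustering is to analyze at the level at which randomization happened: collapse each classroom to its mean and compare the arms' cluster means. The sketch below uses hypothetical function names and toy data, and contrasts that standard error with a naive one that treats every student as independent; with correlated outcomes the naive figure is typically too small. Mixed-effects models and cluster-robust standard errors are common alternatives that this sketch does not implement.

```python
import math
from statistics import mean, variance

def cluster_level_diff(treat: list[list[float]], control: list[list[float]]):
    """Difference in arm means and its standard error, computed from cluster
    means (each inner list holds the outcomes of one classroom)."""
    t_means = [mean(c) for c in treat]
    c_means = [mean(c) for c in control]
    diff = mean(t_means) - mean(c_means)
    se = math.sqrt(variance(t_means) / len(t_means)
                   + variance(c_means) / len(c_means))
    return diff, se

def naive_se(treat: list[list[float]], control: list[list[float]]) -> float:
    """Standard error that ignores clustering (every student treated as independent)."""
    t_all = [y for c in treat for y in c]
    c_all = [y for c in control for y in c]
    return math.sqrt(variance(t_all) / len(t_all) + variance(c_all) / len(c_all))

treat = [[1, 2, 3], [4, 5, 6]]    # two treated classrooms
control = [[0, 1, 2], [1, 2, 3]]  # two control classrooms
diff, se = cluster_level_diff(treat, control)
print(diff, se, naive_se(treat, control))
```

With only two classrooms per arm this is purely illustrative, but it shows the pattern: the naive standard error understates the uncertainty because it counts six students per arm as six independent draws.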
The exact ICC cannot be known until a study collects its data and calculates it. But as the example above shows, knowing the ICC is essential for designing the study in the first place, which leaves the designer in a bit of a quandary. However, researchers have published ICC estimates for math and reading test scores that can be used in planning studies (a good starting point is Hedges, L. V., and E. Hedberg. 2007. Intraclass correlation values for planning group-randomized trials in education. Educational Evaluation and Policy Analysis 29(1): 60–87). A useful primer is Bloom et al. 2008, which also provides empirical examples of studies using cluster methods.
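Once data are in hand, the ICC can be estimated. One common estimator, sketched here for the simple case where every cluster has the same number of students, comes from a one-way analysis of variance: ρ̂ = (MSB - MSW) / (MSB + (m - 1)·MSW), where MSB and MSW are the between- and within-cluster mean squares and m is the cluster size. The function name is hypothetical, and unequal cluster sizes and model-based (mixed-model) estimators are beyond this sketch.

```python
from statistics import mean

def anova_icc(clusters: list[list[float]]) -> float:
    """One-way ANOVA estimator of the intracluster correlation.
    Assumes every cluster has the same size m (balanced design)."""
    k = len(clusters)
    m = len(clusters[0])
    if any(len(c) != m for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")
    grand = mean(y for c in clusters for y in c)
    cluster_means = [mean(c) for c in clusters]
    # Between-cluster mean square: spread of cluster means around the grand mean.
    msb = m * sum((cm - grand) ** 2 for cm in cluster_means) / (k - 1)
    # Within-cluster mean square: spread of students around their own cluster mean.
    msw = sum((y - cm) ** 2
              for c, cm in zip(clusters, cluster_means) for y in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)

print(anova_icc([[1, 2, 3], [4, 5, 6]]))
```

When clustering is negligible this estimate can come out slightly negative; in practice it is commonly truncated at zero.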