The Effect Test
Take another look at Table 10 in Lesson 4, which provides the significance levels for the t-test. You probably noticed that the size of the t value needed to reject the null hypothesis (and enable you to declare that there is a statistically significant difference between two means) depends on the size of the samples on which the means are based. With df = 20, you need a t value of 2.09 to reach the .05 level of significance; but with df = 100, you need a t value of only 1.98.
In other words, if you have small Ns, you will need a large difference between the means to achieve statistical significance; but if you have very large Ns, you will need only a very small difference to be able to declare that the difference between the means is statistically significant.
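You can verify how the critical t value shrinks as sample size grows with a short Python sketch. This assumes the scipy library is available; it computes the two-tailed .05 critical values quoted above:

```python
from scipy import stats

# Two-tailed test at the .05 level: the critical value cuts off
# .025 in each tail of the t distribution.
for df in (20, 100):
    t_crit = stats.t.ppf(1 - 0.05 / 2, df)
    print(f"df = {df}: critical t = {t_crit:.2f}")

# df = 20: critical t = 2.09
# df = 100: critical t = 1.98
```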
So why is that of more than technical interest? Because we don't want to mistake statistical significance for educational significance. Suppose you are comparing the mean reading scores of students in your traditional program with those in a new reading program. There are 500 students in each program, and at the end of the year there is a 3 point difference favoring the new program, and that 3 point difference is statistically significant beyond the .001 level of significance. The proponents of the new program are likely to cite that finding as clear research evidence of the superiority of the new program and call on you, as the superintendent, to junk the traditional program, even though the new program is substantially more costly.
But you should be wary of that recommendation. Why? Because the mean difference in reading scores, even though it's statistically significant, is very small. Is a 3 point difference likely to have any practical significance, or even be observable? Probably not. Even if the difference were a few points greater, would such a difference justify the expenditure of substantially more funds? Probably not.
It turns out that statisticians have developed a test that is intended to give some help when confronting the question of whether a difference between two means is of practical consequence. It's called the Effect Test.
The formula for the Effect Test is:

E = (M1 - M2) / ((SD1 + SD2) / 2)

where:

E is the effect size.
M1 is the mean of Group 1.
M2 is the mean of Group 2.
SD1 is the standard deviation of Group 1.
SD2 is the standard deviation of Group 2.
As you can see, the formula simply divides the difference between the two means by the average of the score variability in the two groups.
There is a general consensus that an effect size of .33 or greater indicates that the difference has practical meaning or significance.
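The formula and the .33 rule of thumb can be expressed in a few lines of Python. This is a minimal sketch; the function names effect_size and practically_significant are our own labels, not standard terminology:

```python
def effect_size(mean1, mean2, sd1, sd2):
    """Effect Test: the difference between the two means divided by
    the average of the two standard deviations."""
    return (mean1 - mean2) / ((sd1 + sd2) / 2)

def practically_significant(e, threshold=0.33):
    """Apply the conventional rule of thumb: an effect size of .33
    or greater indicates practical significance."""
    return abs(e) >= threshold
```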
Let's do an example.
We have two groups with mean reading test scores of 188 and 185 and standard deviations of 30 and 32. N = 500 for both groups, and the difference between the means is statistically significant. We plug the numbers into the Effect Test formula as follows:

E = (188 - 185) / ((30 + 32) / 2) = 3 / 31 = .10
The effect size does not reach the .33 level, so the 3 point difference between the means would not be regarded as practically consequential, even though it's statistically significant.
But suppose the two means are 193 and 182, and the Ns and standard deviations are the same. Then we have:

E = (193 - 182) / ((30 + 32) / 2) = 11 / 31 = .35
The effect size produced by the 11 point difference between the means (with the associated score variability as reflected in the standard deviations) exceeds the .33 threshold for practical significance. So in this case, we would be justified in saying that the difference between the two groups is not only statistically significant, it can also be regarded as having some practical educational meaning.
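The two worked examples above can be checked with a few lines of Python arithmetic:

```python
# First example: 3 point difference, standard deviations of 30 and 32
e1 = (188 - 185) / ((30 + 32) / 2)
print(round(e1, 2))  # 0.1  -- below the .33 threshold

# Second example: 11 point difference, same Ns and standard deviations
e2 = (193 - 182) / ((30 + 32) / 2)
print(round(e2, 2))  # 0.35 -- meets the threshold
```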
However, in the final analysis, you, as the experienced educator and administrator, must make the judgment about practical meaning. Many times you will be presented with mean differences that are large by any practical standard, but because of small Ns or large variances, they're not statistically significant. In those cases, the judgment is fairly easy: You would be on very soft ground making policy and budgetary decisions based on differences that are not statistically significant.
But the other case is more difficult. If you have a mean difference that is both statistically significant and practically significant as indicated by the effect size, you still have to be the judge of whether that difference justifies changing programs, spending more money, hiring or firing staff, and so on.
The new knowledge you now have about how to determine statistical and practical significance adds greatly to your ability to make decisions about the effectiveness of educational programs and the formulation of educational policies, but there are no automatic answers. You, as the responsible administrator, must bring your experience to bear in making the final decision.
Click here to go on to Lesson 6 on correlation.