Statistical Significance and the Type I and Type II Errors

Certainty and Uncertainty—Universes and Samples
Why do we have to use statistical tests, anyway? When we have two groups with different means, why can’t we just say that one is higher than the other, and that’s it? The reason is that the difference between the means of the two groups may be due to chance, and if we were to make the comparison again, the difference might be turned around.
How can that be? The two main reasons are sampling error and measurement error. The particular sample we have may not be representative of the universe from which it is drawn. Also, tests and measuring instruments are not perfect.
For example, suppose that within the next hour we could somehow magically measure the height of every adult man and woman in the world, and we found that the mean height of the men was 5’6", and the mean for the women was 5’3". Since we have measured the entire universe of adult men and women, those are the averages, not estimates of them based on samples. We don’t need to run a t-test to see if the 3" difference between the means is statistically significant. That is the difference.
But if—as is almost always the case in whatever we do—we have to use a sample, we have to account for the fact that the sample, no matter how carefully drawn, may not be representative of the universe. Usually it is, but sometimes it’s not.
A good way to understand this important point is to realize that if we were to take 100 random samples of 1,000 people each, the means of those samples would form a normal curve (just like the ones we worked on in Lesson 2). In other words, some of those sample means would fall as much as (or more than) 3 standard deviations to either side of the collective mean of the 100,000 people.
When we take just one sample, which is what we usually have to work with, the chances are it’s close to the real mean, simply because most of the values are clustered close to the mean (remember, 68% of the values are within ± 1 standard deviation from the mean). But we can’t be sure. The sample we’re working with just might be one of those that’s lying out at the extremes of the normal curve.
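If you'd like to see this for yourself, here is a minimal Python sketch of the idea. Everything in it is invented purely for illustration: a made-up universe of 100,000 heights with a mean of 66 inches and a standard deviation of 3.

    import numpy as np

    rng = np.random.default_rng(0)

    # A made-up universe: 100,000 adult heights in inches.
    universe = rng.normal(loc=66.0, scale=3.0, size=100_000)

    # Draw 100 random samples of 1,000 people each and record each sample's mean.
    sample_means = np.array([
        rng.choice(universe, size=1_000, replace=False).mean()
        for _ in range(100)
    ])

    print("true mean of the universe:", round(universe.mean(), 2))
    print("mean of the 100 sample means:", round(sample_means.mean(), 2))
    print("spread (sd) of the sample means:", round(sample_means.std(), 3))
    # Most sample means land very close to the true mean, but a few stray
    # toward the tails of their own normal curve -- exactly the risk we run
    # when we have only one sample to work with.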
That’s why we have tests of statistical significance. They can’t tell us for sure whether the means we’re comparing are close to the true mean, but they can give us a good estimate of the probability that they are.

Scientific Knowledge and the Null Hypothesis
As you’ve probably realized by now, scientists and statisticians understand that error and uncertainty are inevitable, but they’re very uncomfortable with them. Thus, one of the basic tenets of science, which is reflected in statistics, is the requirement that nothing be admitted into the body of scientific knowledge unless we’re as sure as we can be that it’s true. In other words, there is a strong conservative bias in science and statistics. Scientists would rather be guilty of waiting until there’s more evidence to be sure than to accept a finding prematurely and be wrong.

In statistics, this takes the form of what is called the "null hypothesis." Basically, the null hypothesis says that whenever you are, for example, setting out to compare the difference between two means, you begin with the assumption—indeed, the assertion—that there is no difference between the means. And in order to conclude that there is a difference, your task is to disprove the null hypothesis.
Levels of Significance
Now this leads to a very difficult decision. And to understand the difficulty, let’s first go back to the t-test of the two means we ran in Lesson 3. We found that, for that test, t = .222. In order to find out if the difference between the means is statistically significant (i.e., how likely it is that it is due to chance), we look up the value of t in one of the statistical significance tables that are found in the appendices of all statistics texts. The t-test table we need is reproduced below.
t-Test Values Required to Reject the Null Hypothesis at the .05 and .01 Levels of Confidence (Two-Tailed Test)
Degrees of Freedom (df) .05 .01
20 2.09 2.85
21 2.08 2.83
22 2.07 2.82
23 2.07 2.81
24 2.06 2.80
25 2.06 2.79
26 2.06 2.78
27 2.05 2.77
28 2.05 2.76
29 2.05 2.76
30 2.04 2.75
35 2.03 2.72
40 2.02 2.71
45 2.01 2.70
50 2.01 2.68
55 2.00 2.67
60 2.00 2.66
65 2.00 2.66
70 2.00 2.65
75 1.99 2.64
80 1.99 2.64
85 1.99 2.64
90 1.99 2.63
95 1.99 2.63
100 1.98 2.63
Infinity 1.96 2.58
In order to use this table, we enter it with our t value (.222) and something called "degrees of freedom." The degrees of freedom are simply (n1 – 1) + (n2 – 1) or, in our case, 70. Note that there are two columns of t values, one labeled .05 and the other labeled .01. If we go down to our degrees of freedom, 70, we find that both the .05 and the .01 t values are substantially larger than our .222. So we didn’t achieve a large enough t value to reject the null hypothesis, i.e., to be able to conclude that the difference wasn’t due to chance.
Why do we have two columns, one labeled .05 and the other .01? Because those are the two levels of significance commonly used in statistical analysis. When there is really no difference between the means, t values as large as those in the .05 column will occur by chance only 5 percent of the time, whereas t values as large as those in the .01 column will occur by chance only 1 percent of the time.
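If you have Python and the scipy library handy, you can reproduce the table's critical values by computer instead of looking them up. Here is a minimal sketch, using the t of .222 and the 70 degrees of freedom from our own example:

    from scipy import stats

    t_value = 0.222
    df = 70

    # Two-tailed critical values: the t needed to reject at each level.
    crit_05 = stats.t.ppf(1 - 0.05 / 2, df)  # roughly 2.0, as in the table
    crit_01 = stats.t.ppf(1 - 0.01 / 2, df)  # about 2.65, as in the table

    print("critical t at .05:", round(crit_05, 2))
    print("critical t at .01:", round(crit_01, 2))

    # Our t of .222 falls far short of both values, so we cannot
    # reject the null hypothesis at either level.
    print("reject at .05?", abs(t_value) > crit_05)
    print("reject at .01?", abs(t_value) > crit_01)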
Type I and Type II Errors
The choice of what significance level to use (.05, .01, or lower or higher) is the difficult choice that you as the researcher must make. If you decide to accept the .05 level of confidence, which requires a smaller t value, you can more easily reject the null hypothesis and declare that there is a statistically significant difference between the means than if you select the .01 level. But when the null hypothesis is really true, you will wrongly reject it 5 percent of the time. This is the Type I error.
On the other hand, if you select the .01 level, you will make that mistake only 1 percent of the time. But since the .01 level requires a larger t value, you will less often be able to reject the null hypothesis and say that there is a statistically significant difference between the means when in fact there is one. This is the Type II error. It is crucially important to an understanding of even basic statistics that we have a clear understanding of these two errors. If you spend a little time with Table 10, it will help you achieve this understanding.
Table 10. Accepting and Rejecting Null Hypotheses and the Making of Type I and Type II Errors

When the null hypothesis is really true, i.e., there is not a real difference between the means of the two groups:

You accepted the null hypothesis, i.e., you concluded that there is not a real difference between the means of the two groups which, in fact, is the case. That was a good decision.

You rejected the null hypothesis, i.e., you concluded that there is a real difference between the means of the two groups when, in fact, there is not. That was a bad decision. You made the Type I error.

When the null hypothesis is really false, i.e., there is a real difference between the means of the two groups:

You accepted the null hypothesis, i.e., you concluded that there is not a real difference between the means of the two groups when, in fact, there is. That was a bad decision. You made the Type II error.

You rejected the null hypothesis, i.e., you concluded that there is a real difference between the means of the two groups which, in fact, is the case. That was a good decision.
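To make the trade-off in Table 10 concrete, here is a hedged Python simulation. All of the numbers in it, such as the sample sizes of 36 and the true difference of 0.3, are invented purely for illustration:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    alpha = 0.05       # the significance level we have chosen
    trials = 10_000

    # Case 1: the null hypothesis is really TRUE (both groups share a mean of 0).
    # Counting how often we wrongly reject it gives the Type I error rate.
    type1 = sum(
        stats.ttest_ind(rng.normal(0.0, 1.0, 36),
                        rng.normal(0.0, 1.0, 36)).pvalue < alpha
        for _ in range(trials)
    )
    print("Type I error rate:", type1 / trials)   # close to .05, as promised

    # Case 2: the null hypothesis is really FALSE (the means differ by 0.3).
    # Counting how often we fail to reject it gives the Type II error rate.
    type2 = sum(
        stats.ttest_ind(rng.normal(0.0, 1.0, 36),
                        rng.normal(0.3, 1.0, 36)).pvalue >= alpha
        for _ in range(trials)
    )
    print("Type II error rate:", type2 / trials)

    # Tightening alpha to .01 would shrink the Type I rate but raise the
    # Type II rate: the difficult choice the lesson describes.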
Table 10 and the work we’ve done in this lesson make the mysteries of statistical significance and Type I and Type II errors transparently clear. When you’re reading a professional journal and you encounter a discussion of the difference between the means of two groups, and the authors conclude by saying, t = 2.64, p < .01, df = 70, two-tailed test, you will immediately know that:

The authors compared the two means with a t-test, and the 70 degrees of freedom tell you that the two samples together contained 72 people.

The reported p < .01 means that a t as large as 2.64 would occur by chance less than 1 percent of the time if there were really no difference between the means, so the difference is statistically significant at the .01 level.

In rejecting the null hypothesis, the authors ran no more than a 1 percent risk of making the Type I error.

And because the test was two-tailed, the authors allowed for the possibility that the difference could have gone in either direction.
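As a final aside, if you have Python handy you don’t even need the table: scipy can turn any reported t and df into an exact two-tailed probability. A minimal sketch, checked against the t of .222 from our own Lesson 3 example:

    from scipy import stats

    def two_tailed_p(t, df):
        # Probability of a t at least this large, in either direction, by chance.
        return 2 * stats.t.sf(abs(t), df)

    print(round(two_tailed_p(0.222, 70), 2))  # about .82: nowhere near significant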
So, this knowledge is a major step forward in your journey to master basic statistics. And we’ve got a few more neat things to cover. Click onward to Lesson 5.