Lesson 2
The Standard Deviation and the Normal Curve
A Measure of Dispersion: The Standard Deviation
For reasons we'll see as we get further into this course, we often want to know not only the central tendency of a set of scores or values (i.e., the mean, the median, or the mode) but also how bunched up or spread out the scores are. The most widely used indicator of dispersion is the standard deviation, which, in a nutshell, is based on the deviation of each score from the mean.
To illustrate, compare the distribution of test scores in Figures 4 and 5. The first is flat and spread out, while the second is concentrated and bunched up closely around the mean.
Figure 4
Graphic Display of Flat or Spread-Out Score Distribution
Figure 5
Display of a Narrow or Concentrated Distribution
Note that the mean and median of these two quite different distributions are the same (mean = 150, Mdn = 150), so simply calculating and reporting those two measures of central tendency would fail to reveal how different the dispersion of scores is between the two groups. But we can reveal that difference by calculating the standard deviation.
The standard deviation provides us with a measure of just how spread out the scores are: a high standard deviation means the scores are widely spread; a low standard deviation means they're bunched up closely on either side of the mean.
We'll now calculate the standard deviation for both these distributions. The formula for the standard deviation is:
$$\sigma = \sqrt{\frac{\sum d^{2}}{N}}$$
Where:
σ (little sigma) is the standard deviation.
d^{2} is a score's deviation from the mean, squared.
N is the number of cases.
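The definition above (square each score's deviation from the mean, add the squares, divide by the number of cases, take the square root) can be sketched in a few lines of Python. This is only an illustration: the function name `standard_deviation` and the eight example scores are assumptions made up for this sketch, chosen so the arithmetic is easy to check by hand.

```python
import math

def standard_deviation(scores):
    """Square root of (sum of squared deviations from the mean / number of cases)."""
    n = len(scores)                                   # N, the number of cases
    mean = sum(scores) / n
    squared_deviations = [(x - mean) ** 2 for x in scores]   # d^2 for each score
    return math.sqrt(sum(squared_deviations) / n)

# Hypothetical scores, not taken from the lesson's figures.
example_scores = [2, 4, 4, 4, 5, 5, 7, 9]
example_sd = standard_deviation(example_scores)
print(example_sd)  # 2.0
```

Here the mean is 5, the squared deviations add up to 32, and 32 divided by 8 cases is 4, whose square root is 2.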
The numbers we need to calculate the standard deviation for Figure 4, the flat distribution, are in Table 6.
Table 6
Data for Figure 4—the Flat Distribution
A | B | C | D | E
Test Score (X) | Frequency (f) | X - Mean (d) | fd | fd^{2}
100 | 8 | -50 | -400 | 20,000
110 | 13 | -40 | -520 | 20,800
120 | 17 | -30 | -510 | 15,300
130 | 20 | -20 | -400 | 8,000
140 | 21 | -10 | -210 | 2,100
150 | 22 | 0 | 0 | 0
160 | 21 | 10 | 210 | 2,100
170 | 20 | 20 | 400 | 8,000
180 | 17 | 30 | 510 | 15,300
190 | 13 | 40 | 520 | 20,800
200 | 8 | 50 | 400 | 20,000
SUM | 180 | | | 132,400
Column A displays the test scores (X).
Column B shows how many people got each test score (f).
Column C is the test score minus the mean (X minus the mean or d).
Column D is the frequency multiplied by the deviation (fd).
Column E is the frequency multiplied by the squared deviation (fd^{2}); its total is the sum of the squared deviations for all 180 cases.
Of course, to get the deviation of each score from the mean (column C), we have to calculate the mean, and you already know how to do that. We now have what we need to calculate the standard deviation for the flat distribution in Figure 4:
$$\sigma = \sqrt{\frac{132{,}400}{180}}$$
or
$$\sigma = \sqrt{735.56} \approx 27.12$$
You can do the last part of this calculation, the square root of 132,400/180 (which is about 736), by using the square-root button on your little hand calculator.
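The same arithmetic can be checked with a short Python sketch that rebuilds the totals of Table 6 from the scores and frequencies. The variable names (`flat`, `sum_fd2`, and so on) are assumptions chosen for this illustration; only the scores and frequencies come from the table.

```python
import math

# Table 6: test score (X) -> frequency (f)
flat = {100: 8, 110: 13, 120: 17, 130: 20, 140: 21, 150: 22,
        160: 21, 170: 20, 180: 17, 190: 13, 200: 8}

n = sum(flat.values())                                 # total of column B
mean = sum(x * f for x, f in flat.items()) / n         # needed for column C
sum_fd2 = sum(f * (x - mean) ** 2 for x, f in flat.items())  # total of column E
variance = sum_fd2 / n
sigma = math.sqrt(variance)

print(n, mean, sum_fd2)                       # 180 150.0 132400.0
print(round(variance, 2), round(sigma, 2))    # 735.56 27.12
```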
Now let's compute the standard deviation for the data in Figure 5. The data are in Table 7, and you follow the same steps we've just completed.
Table 7
Example of a Narrow or Concentrated Distribution
A | B | C | D | E
Test Score (X) | Frequency (f) | X - Mean (d) | fd | fd^{2}
100 | 0 | -50 | 0 | 0
110 | 0 | -40 | 0 | 0
120 | 0 | -30 | 0 | 0
130 | 10 | -20 | -200 | 4,000
140 | 45 | -10 | -450 | 4,500
150 | 70 | 0 | 0 | 0
160 | 45 | 10 | 450 | 4,500
170 | 10 | 20 | 200 | 4,000
180 | 0 | 30 | 0 | 0
190 | 0 | 40 | 0 | 0
200 | 0 | 50 | 0 | 0
SUM | 180 | | | 17,000
$$\sigma = \sqrt{\frac{17{,}000}{180}}$$
or
$$\sigma = \sqrt{94.44} \approx 9.72$$
The two standard deviations provide a statistical indication of how different the distributions are: about 27 for the spread-out distribution and about 10 for the bunched-up distribution.
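To see the contrast side by side, the sketch below expands each frequency table into one entry per case and uses Python's standard `statistics` module, whose `pstdev` function divides by the number of cases just as the lesson's formula does. The helper `expand` and the dictionary names are assumptions introduced for this illustration; zero-frequency scores in Table 7 are simply left out.

```python
import statistics

def expand(freq_table):
    """Turn {score: frequency} into a flat list with one entry per case."""
    return [score for score, f in freq_table.items() for _ in range(f)]

# Table 6 (flat) and Table 7 (narrow): test score -> frequency
flat = {100: 8, 110: 13, 120: 17, 130: 20, 140: 21, 150: 22,
        160: 21, 170: 20, 180: 17, 190: 13, 200: 8}
narrow = {130: 10, 140: 45, 150: 70, 160: 45, 170: 10}

summary = {}
for name, table in (("flat", flat), ("narrow", narrow)):
    scores = expand(table)
    summary[name] = (
        float(statistics.mean(scores)),
        float(statistics.median(scores)),
        round(statistics.pstdev(scores), 2),   # divides by N, not N - 1
    )
    mean, median, sd = summary[name]
    print(f"{name}: mean={mean:.1f} median={median:.1f} sd={sd:.2f}")

# flat: mean=150.0 median=150.0 sd=27.12
# narrow: mean=150.0 median=150.0 sd=9.72
```

The two groups are indistinguishable by mean and median; only the standard deviation separates them.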
So once we know the mean and median, why do we need to know the standard deviation? What use is it?
The standard deviation is important because, regardless of the mean, it makes a great deal of difference whether the distribution is spread out over a broad range or bunched up closely around the mean. For example, suppose you have two classes whose mean reading scores are the same. With only that information, you would be inclined to teach the two classes in the same way. But suppose you discover that the standard deviation of one of the classes is 27 and the other is 10, as in the examples we just finished working with. That means that in the first class (the one where σ = 27), you have many students throughout the entire range of performance. You'll need to have teaching strategies for both the gifted and the challenged. But in the second class (the one where σ = 10), you don't have any gifted or challenged students. They're all close to average, and your teaching strategy will be entirely different.
The Normal Curve
Before we leave the standard deviation, it's a good time to learn a little more about the normal curve. We'll be coming back to it later.
First, why is it called the normal curve? The reason is that so many things in life are distributed in the shape of this curve: IQ, strength, height, weight, musical ability, resistance to disease, and so on. Not everything is normally distributed, but many things are, at least approximately. Thus the term normal curve.
In Figure 6, we have a set of scores which are normally distributed. The range is from 0 to 200, the mean and median are 100, and the standard deviation is 20. In a normal curve, the standard deviation indicates precisely how the scores are distributed. Note that the percentage of scores is marked off by standard deviations on either side of the mean. In the range between 80 and 120 (that’s one standard deviation on either side of the mean), there are 68.26% of the cases. In other words, in a normal distribution, roughly two thirds of the scores lie within one standard deviation of the mean. If we go out to two standard deviations on either side of the mean, we will include 95.44% of the scores; and if we go out three standard deviations, that will encompass 99.74% of the scores; and so on.
Another way to think about this is to realize that in this distribution, if you have a score that’s within one standard deviation of the mean, i.e., between 80 and 120, that’s pretty average—two thirds of the people are concentrated in that range. But if you have a score that’s two or three standard deviations away from the mean, that is clearly a deviant score, i.e., very high or very low. Only a small percent of the cases lie that far out from the mean.
This is valuable to understand in its own right, and will become useful when we take up determining the significance of difference between means—which we’re going to do next in Lesson 3.
Figure 6
Normal Curve Showing the Percent of Cases Lying Within 1, 2, and 3 Standard Deviations From the Mean
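The percentages marked on a normal curve can be reproduced with a short Python sketch using the error function from the standard `math` module. The function names `share_within` and `sds_from_mean` are assumptions made up for this illustration; the mean of 100 and standard deviation of 20 are the values described for Figure 6. The computed figures can differ in the last decimal place from those printed in statistical tables, because tables typically round each half of the area before doubling it.

```python
import math

def share_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

def sds_from_mean(score, mean, sd):
    """How many standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd

mean, sd = 100, 20   # values described for Figure 6
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    print(f"within {k} SD ({low} to {high}): {share_within(k) * 100:.2f}%")

# within 1 SD (80 to 120): 68.27%
# within 2 SD (60 to 140): 95.45%
# within 3 SD (40 to 160): 99.73%

print(sds_from_mean(150, mean, sd))  # 2.5
```

A score of 150 in this distribution sits two and a half standard deviations above the mean, well outside the range where most cases fall.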