Lesson 1

Measures of Central Tendency:
The Mean, Median, and Mode

One of the most basic purposes of statistics is simply to enable us to make sense of large numbers. For example, if you want to know how the students in your school are doing in the statewide achievement test, and somebody gives you a list of all 600 of their scores, that’s useless. This everyday problem is even more obvious and staggering when you’re dealing, let’s say, with the population data for the nation.

We’ve got to be able to consolidate and synthesize large numbers to reveal their collective characteristics and interrelationships, and transform them from an incomprehensible mass to a set of useful and enlightening indicators.

The Mean

One of the most useful and widely used techniques for doing this, one which you already know, is the average or, as it is known in statistics, the mean. And you know how to calculate the mean: you simply add up a set of scores and divide by the number of scores. Thus we have our first and perhaps the most basic statistical formula:

\[ \bar{X} = \frac{\sum X}{N} \]

Where:

\(\bar{X}\) (sometimes called X-bar) is the symbol for the mean.

\(\sum\) (the Greek letter sigma) is the symbol for summation.

X is the symbol for the scores.

N is the symbol for the number of scores.

So this formula simply says that you get the mean by summing all the scores and dividing the total by the number of scores. It is the old average we’re all familiar with, so it’s a good place to begin.

This is pretty simple when you have only a few numbers. For example, if you have just 6 numbers (3, 9, 10, 8, 6, and 5), you insert them into the formula for the mean, and do the math:

\[ \bar{X} = \frac{3 + 9 + 10 + 8 + 6 + 5}{6} = \frac{41}{6} \approx 6.83 \]
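The six-number example can be checked with a few lines of Python. This is only a minimal sketch; the variable names `scores` and `mean` are chosen here for illustration and are not part of the lesson.

```python
# The six example scores from the text.
scores = [3, 9, 10, 8, 6, 5]

# Mean = sum of the scores divided by the number of scores.
total = sum(scores)
n = len(scores)
mean = total / n

print(f"Sum = {total}, N = {n}, mean = {mean:.2f}")
```

The sum is 41 and N is 6, so the mean comes out to about 6.83.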

But we usually have many more numbers to deal with, so let’s do a couple of examples where the numbers are larger, and show how the calculations should be done. In our first example, we’re going to compute the mean salary of 36 people. Column A of Table 1 shows the salaries (ranging from $20K to $70K), and column B shows how many people earned each of the salaries.

Table 1

Example 1 of Method for Computing the Mean

A             B                C
Salary (X)    Frequency (f)    fX ($K)
$20K          1                20
$25K          2                50
$30K          3                90
$35K          4                140
$40K          5                200
$45K          6                270
$50K          5                250
$55K          4                220
$60K          3                180
$65K          2                130
$70K          1                70
Sum           36               1,620


To get the \(\sum X\) for our formula, we multiply the number of people in each salary category by the salary for that category (e.g., 1 × 20, 2 × 25, etc.), and then total those products (the ones in column C). Thus we have:

\[ \bar{X} = \frac{\sum fX}{N} = \frac{1{,}620}{36} = 45 \]

That is, the mean salary is $45K.
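The column C calculation can be reproduced in Python as a rough sketch. The list `table_1` simply restates columns A and B of Table 1 (salaries in $K); the names `products`, `sum_fx`, and `n` are illustrative choices, not notation from the lesson.

```python
# Table 1 as (salary in $K, frequency) pairs: columns A and B.
table_1 = [(20, 1), (25, 2), (30, 3), (35, 4), (40, 5), (45, 6),
           (50, 5), (55, 4), (60, 3), (65, 2), (70, 1)]

# Column C: frequency times salary for each row.
products = [freq * salary for salary, freq in table_1]

n = sum(freq for _, freq in table_1)   # total of column B
sum_fx = sum(products)                 # total of column C
mean = sum_fx / n

print("Column C:", products)
print(f"N = {n}, sum of fX = {sum_fx}, mean = ${mean:.1f}K")
```

The totals match the Sum row of Table 1 (36 and 1,620), giving a mean of $45K.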

And this is how the distribution of these salaries looks:

Figure 1
Distribution of Example 1 Salaries


The scores in this distribution are said to be approximately normally distributed, i.e., clustered symmetrically around a central value, with decreasing numbers of cases as you move toward the extreme ends of the range. Hence the term normal curve.

So, computing the mean is pretty simple. Piece of cake, right? Not so fast.

In our second example, let’s look at what happens if we change just six people’s salaries in Table 1. Let’s suppose that the three people who made $60K actually made $200K, that the two who made $65K made $205K, and that the one person who made $70K made $210K. The revised salary table is the same except for these changes.

Table 2
Example 2 of Method for Computing the Mean

A             B                C
Salary (X)    Frequency (f)    fX ($K)
$20K          1                20
$25K          2                50
$30K          3                90
$35K          4                140
$40K          5                200
$45K          6                270
$50K          5                250
$55K          4                220
$200K         3                600
$205K         2                410
$210K         1                210
Sum           36               2,460

But before we recompute the mean, let’s look at how different the distribution looks.

Figure 2
Distribution of Example-2 Salaries

Now, using the revised numbers in Table 2, we compute the mean as follows:

\[ \bar{X} = \frac{\sum fX}{N} = \frac{2{,}460}{36} \approx 68.3 \]

What this shows is that changing the salaries of just six individuals to extreme values greatly affects the mean. In this case, it raised the mean from $45K to $68.3K (an increase of 52%), even though all the other scores remained the same. In fact, the new mean is a salary that no person in the group earns, and it is higher than what 30 of the 36 people earn. That is hardly a figure we would think of as "average" for the group.
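A short Python comparison makes the size of the shift concrete. This is a sketch under the assumption that the two tables are entered exactly as printed (salaries in $K); `weighted_mean` is a helper name invented for this illustration.

```python
# Frequencies are identical in Tables 1 and 2; only the top three salaries differ.
freqs = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
salaries_1 = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]     # Table 1, $K
salaries_2 = [20, 25, 30, 35, 40, 45, 50, 55, 200, 205, 210]  # Table 2, $K

def weighted_mean(values, weights):
    """Sum of value * frequency divided by the total frequency."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

mean_1 = weighted_mean(salaries_1, freqs)
mean_2 = weighted_mean(salaries_2, freqs)
pct_increase = (mean_2 - mean_1) / mean_1 * 100

# How many people earn less than the new mean?
below_new_mean = sum(f for s, f in zip(salaries_2, freqs) if s < mean_2)

print(f"Table 1 mean: ${mean_1:.1f}K")
print(f"Table 2 mean: ${mean_2:.1f}K")
print(f"Increase: {pct_increase:.0f}%")
print(f"People earning less than the new mean: {below_new_mean} of {sum(freqs)}")
```

Only six of the 36 salaries change, yet the mean moves past the salaries of 30 people.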

The important lesson here is that the mean is intended to be a measure of central tendency, but it works usefully as such only if the data on which it is based are more or less normally distributed (like in Figure 1). The presence of extreme scores distorts the mean, and, in this case, gives us a mean salary ($68.3K) that is not a very good indication of the "average" salary of this group of 36 individuals.

So if we know or suspect that our data may have some extreme scores that would distort the mean, what measure can we use to give us a better measure of central tendency? One such measure is the median, and we move on to learn about that now.

The Median

If your data are normally distributed (like those in Figure 1), the preferred measure of central tendency is the mean. However, if your data are not normally distributed (like those in Figure 2), the median is a better measure of central tendency, for reasons we’ll see in a moment.

The median is the point in the distribution above which and below which 50% of the scores lie. In other words, if we list the scores in order from highest to lowest (or lowest to highest) and find the middle-most score, that’s the median.

For example, suppose we have the following scores: 2, 12, 4, 11, 3, 7, 10, 5, 9, 6. The first step is to arrange them in order from lowest to highest.

2
3
4
5
6
7
9
10
11
12

Since we have 10 scores, and 50% of 10 is 5, we want the point above which and below which there are five scores. Careful. If you count up from the bottom, you might think the median is 6. But that’s not right, because there are 4 scores below 6 and 5 above it. So how do we deal with that problem? We deal with it by understanding that in statistics, a measurement or a score is regarded not as a point but as an interval ranging from half a unit below to half a unit above the value. The score 6 therefore occupies the interval from 5.5 to 6.5, and its upper limit, 6.5, has five scores below it (2 through 6) and five above it (7 through 12). So the actual midpoint or median of this distribution, the point above which and below which 50% of the scores lie, is 6.5.
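The same result can be checked in Python. Note one assumption in this sketch: the standard-library `statistics.median` averages the two middle scores when N is even, rather than reasoning about score intervals. The two approaches agree here because the middle scores, 6 and 7, are adjacent whole numbers, so their midpoint is the shared interval boundary 6.5.

```python
import statistics

scores = [2, 12, 4, 11, 3, 7, 10, 5, 9, 6]

# Step 1: put the scores in order from lowest to highest.
ordered = sorted(scores)

# Step 2: with an even N, look at the two middle scores.
half = len(ordered) // 2
lower_middle = ordered[half - 1]
upper_middle = ordered[half]

# Step 3: the median lies on the boundary between them.
median = (lower_middle + upper_middle) / 2

print("Ordered:", ordered)
print(f"Middle scores: {lower_middle} and {upper_middle}, median = {median}")
print("statistics.median agrees:", statistics.median(scores) == median)
```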

As we saw with the mean, when we have only a few numbers, it’s pretty simple. But how do we find the median when we have larger numbers and more than one person with the same score? It’s not difficult. Let’s use the salary data in Table 1.

Table 3
Example 1 of Method for Computing the Median

Salary    Range             Frequency
$20K      $19.5K-20.5K      1
$25K      $24.5K-25.5K      2
$30K      $29.5K-30.5K      3
$35K      $34.5K-35.5K      4
$40K      $39.5K-40.5K      5
$45K      $44.5K-45.5K      6
$50K      $49.5K-50.5K      5
$55K      $54.5K-55.5K      4
$60K      $59.5K-60.5K      3
$65K      $64.5K-65.5K      2
$70K      $69.5K-70.5K      1
Sum                         36

The salaries are already in order from lowest to highest, so the next step in finding the median is to determine how many individuals (ratings, scores, or whatever) we have. Those are shown in the frequency column, and the total is 36. So our N = 36, and we want to find the salary point above which and below which 50%, or 18, of the individuals fall. If we count up from the bottom through the $40K level, we have 15, and we need three more. But if we include the $45K level (in which there are 6), we have 21, three more than we need. Thus, we need 3, or 50%, of the 6 cases in the $45K category. We multiply this proportion (.5) by the width of the interval ($1K) and add the result ($.5K) to the lower limit of the interval in which we know the median lies ($44.5K-$45.5K), which gives us a value of $45K.
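The counting procedure just described can be traced step by step in Python. This is a minimal sketch that assumes, as in the Range column of Table 3, that each salary stands for an interval $1K wide centered on the listed value; the variable names are illustrative.

```python
# Table 3 as (salary in $K, frequency) pairs, lowest to highest.
table_3 = [(20, 1), (25, 2), (30, 3), (35, 4), (40, 5), (45, 6),
           (50, 5), (55, 4), (60, 3), (65, 2), (70, 1)]
interval_width = 1.0  # each range in Table 3 spans $1K

n = sum(freq for _, freq in table_3)
half = n / 2  # the number of cases that must lie below the median

# Count up from the bottom until adding the next category reaches the halfway point.
below = 0
for salary, freq in table_3:
    if below + freq >= half:
        break  # this category contains the median
    below += freq

needed = half - below      # cases still needed from this category
fraction = needed / freq   # share of the category that is needed
lower_limit = salary - interval_width / 2
median = lower_limit + fraction * interval_width

print(f"N = {n}, half = {half}, cases below the ${salary}K category = {below}")
print(f"Need {needed} of {freq} cases ({fraction:.0%}); median = ${median}K")
```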

In this case, the mean and the median are the same—as they always are in normal distributions. So in situations like this, the mean is the preferred measure.

But things aren’t always so neat and tidy. Let’s now compute the median for the salary data in Table 2, which we know (from Figure 2) are not normally distributed.

Table 4
Example 2 of Method for Computing the Median

Salary    Range               Frequency
$20K      $19.5K-20.5K        1
$25K      $24.5K-25.5K        2
$30K      $29.5K-30.5K        3
$35K      $34.5K-35.5K        4
$40K      $39.5K-40.5K        5
$45K      $44.5K-45.5K        6
$50K      $49.5K-50.5K        5
$55K      $54.5K-55.5K        4
$200K     $199.5K-200.5K      3
$205K     $204.5K-205.5K      2
$210K     $209.5K-210.5K      1
Sum                           36

The N is the same (36), so we go through exactly the same calculations we did for the data in Table 3. When we do that (count up from the bottom, find that we need half the cases in the $45K category to get 50% (18) of the total, and do so by adding .5 to the lower limit of that category), incredibly we get exactly the same result ($45K) we did with the data in Table 3. In other words, those six extreme cases (the six whose salaries changed from $60K, $65K, and $70K to $200K, $205K, and $210K) don’t affect the median even though they made a big change in the mean. They are still above the midpoint, and it doesn’t matter how much above it in the calculation of the median.
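To see this insensitivity directly, one can expand each table into a list of 36 individual salaries and compare the two measures. This sketch uses Python's standard `statistics` module, whose `median` takes the middle of the sorted list rather than interpolating within an interval; for these data the two methods give the same answer. The helper `expand` is a name made up for this illustration.

```python
import statistics

freqs = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
salaries_3 = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]     # Table 3, $K
salaries_4 = [20, 25, 30, 35, 40, 45, 50, 55, 200, 205, 210]  # Table 4, $K

def expand(salaries, freqs):
    """Turn a frequency table into one salary per person."""
    return [s for s, f in zip(salaries, freqs) for _ in range(f)]

people_3 = expand(salaries_3, freqs)
people_4 = expand(salaries_4, freqs)

for label, people in (("Table 3", people_3), ("Table 4", people_4)):
    mean = statistics.mean(people)
    median = statistics.median(people)
    print(f"{label}: mean = ${mean:.1f}K, median = ${median:.1f}K")
```

The mean changes between the two tables while the median stays at $45K, because the six changed salaries remain above the midpoint in both cases.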

This example illustrates dramatically what the median is and why it’s a better measure of central tendency than the mean when we have extreme scores.

We’ve done the calculations for the median in a simple, descriptive way (arraying the scores from low to high, counting up to the mid-category, dividing it as necessary, etc.), but just so you won’t feel slighted, here is the statistical formula for doing what we’ve just done.

 

\[ Mdn = L + \left( \frac{N/2 - \sum f_b}{f_w} \right) i \]

Where:

Mdn is the median.

L is the lower limit of the interval containing the median.

N is the total number of scores.

\(\sum f_b\) is the sum of the frequencies or number of scores up to the interval containing the median.

\(f_w\) is the frequency or number of scores within the interval containing the median.

i is the size or range of the interval.
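As a rough sketch, the formula can be written as a small Python function. The function name `grouped_median` and its parameters are assumptions made for this illustration; it expects (value, frequency) pairs sorted from lowest to highest, treats each value as the midpoint of an interval of size i, and uses the terms defined above (L, N, the count of scores below the interval, fw, and i).

```python
def grouped_median(table, i=1.0):
    """Median of a frequency table via Mdn = L + ((N/2 - below) / fw) * i.

    table: (value, frequency) pairs sorted by value; each value is taken
    to be the midpoint of an interval of size i.
    """
    n = sum(freq for _, freq in table)
    half = n / 2
    below = 0  # number of scores below the current interval
    for value, fw in table:
        if fw > 0 and below + fw >= half:
            lower_limit = value - i / 2  # L, the lower limit of the interval
            return lower_limit + (half - below) / fw * i
        below += fw
    raise ValueError("table has no scores")

freqs = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
table_3 = list(zip([20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70], freqs))
table_4 = list(zip([20, 25, 30, 35, 40, 45, 50, 55, 200, 205, 210], freqs))

print(f"Table 3 median: ${grouped_median(table_3)}K")
print(f"Table 4 median: ${grouped_median(table_4)}K")
```

For both tables the terms are L = 44.5, N = 36, 15 scores below the interval, fw = 6, and i = 1, so the function returns 45.0 in each case.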

The Mode

The third and last of the measures of central tendency we’ll be dealing with in this course is the mode. It’s very simple: The mode is the most frequently occurring score or value. In our case (see Figures 1 and 2), that value is $45K. But sometimes we have odd distributions with two peaks. Even if the peaks are not exactly equal, such distributions are referred to as bi-modal.

Let’s assume we have such a bi-modal distribution of salaries as shown in Table 5 and Figure 3.

Table 5
Bi-Modal Distribution of Salaries

A             B                C
Salary (X)    Frequency (f)    fX ($K)
$20K          1                20
$25K          3                75
$30K          4                120
$35K          6                210
$40K          3                120
$45K          1                45
$50K          3                150
$55K          5                275
$60K          6                360
$65K          3                195
$70K          1                70
Sum           36               1,640


Figure 3
Example of a Bi-Modal Distribution



Before we talk about the mode, use the formulas and calculation procedures you’ve just learned to calculate the mean and median for the salaries in Table 5 (the fX values and their sum, \(\sum fX\), are in column C).

When you look at this distribution of salaries, as shown graphically in Figure 3, it’s hard to discern any central tendency. The mean (which you just calculated) is about $45.6K (1,640 / 36), a salary that no one earns, and the median is $45.5K, the point with 18 of the 36 cases below it and 18 above it. Both fall next to the $45K category, which contains only one person, so neither gives us a meaningful indication of the central tendency in this distribution, because there isn’t any.

Therefore, the most informative general statement we can make about this distribution is to say that it is bi-modal.
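Finding the modes of a frequency table is easy to sketch in Python. The dictionary `table_5` restates columns A and B of Table 5 (salaries in $K). The helper `local_peaks` is a hypothetical name introduced here; it flags any salary whose frequency is higher than both of its neighbors, which also catches peaks that are not exactly equal in height.

```python
# Table 5: salary in $K -> number of people.
table_5 = {20: 1, 25: 3, 30: 4, 35: 6, 40: 3, 45: 1,
           50: 3, 55: 5, 60: 6, 65: 3, 70: 1}

# Modes in the strict sense: the salaries with the highest frequency.
top = max(table_5.values())
modes = [salary for salary, freq in table_5.items() if freq == top]

def local_peaks(table):
    """Salaries whose frequency exceeds that of both neighboring salaries."""
    salaries = sorted(table)
    freqs = [table[s] for s in salaries]
    peaks = []
    for idx, f in enumerate(freqs):
        left = freqs[idx - 1] if idx > 0 else 0
        right = freqs[idx + 1] if idx < len(freqs) - 1 else 0
        if f > left and f > right:
            peaks.append(salaries[idx])
    return peaks

print("Most frequent salaries ($K):", modes)
print("Peaks in the distribution ($K):", local_peaks(table_5))
```

Both approaches point to $35K and $60K, the two peaks of the distribution.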

You now know the three principal measures of central tendency—the mean, the median, and the mode—when they should be used, and how to calculate them, so we now move on to the other side of the central-tendency coin: dispersion.
