Lesson 7

Chi Square

Parametric and Non-Parametric Statistics

Most of the statistics we’ve learned so far (the mean, the standard deviation, the t-test, and the product moment correlation) belong to a category called parametric statistics. That’s because it is assumed the data used to compute them have certain parameters or meet certain conditions. One of these is that the variances are similar; another is that the sample is large enough to be representative of the universe from which it is drawn. We used examples of 30 or more cases when we worked on the mean, the t-test, and the product moment correlation because there is a general consensus among statisticians that this is the minimum-size sample to use with parametric tests. You should keep this in mind when using these tests in your practicum and in your own research.

But what do we do when we can’t meet these conditions? Happily, there’s another category of statistics, and you shouldn’t be surprised to learn that it’s called non-parametric statistics. We can do many of the same things with non-parametric statistics. They’re regarded as somewhat less powerful than parametric statistics, but they’re not to be looked down on. When conditions call for them, they are the things to use.

Chi Square

One of the most useful of the non-parametric statistics is chi square. We use it when our data consist of people distributed across categories, and we want to know whether that distribution is different from what we would expect by chance (or another set of expectations). We don’t have scores, we don’t have means. We just have numbers, or frequencies. In other words, we have nominal data.

For example, suppose we have the data in Table 14 that display the number of students who elect different majors, and we want to know whether those numbers differ from chance. In other words, are some majors selected more often than others, or is the selection pattern essentially random?

Table 14
Number of Students Selecting Different Majors

Major                  Number of Students
Pre-Med                        50
Computer Sciences              85
English Literature             25
Education                      60
Engineering                    80
Total                         300

The null hypothesis here, of course, is that there is no difference between this distribution of major selections and what would be expected by chance. So what chi square does is compare these numbers (the observed frequencies) with those that would be expected by chance (the expected frequencies).

The formula for chi square is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:

χ² is the value for chi square.

Σ is the sum.

O is the observed frequency.

E is the expected frequency.
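If you would like to check the arithmetic by computer, here is a minimal Python sketch of the formula above. The function name `chi_square` is hypothetical (it is not from any library); it simply adds up (O – E)² / E over the categories. The observed counts come from Table 14, and the expected count of 60 per major is the one worked out in the next paragraph.

```python
def chi_square(observed, expected):
    """Chi square: the sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per category")
    # One (O - E)^2 / E term per category, then sum them.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed counts from Table 14, in the order Pre-Med, Computer Sciences,
# English Literature, Education, Engineering.
observed = [50, 85, 25, 60, 80]
# Expected counts under the "chance" null hypothesis: 60 per major.
expected = [60, 60, 60, 60, 60]

print(round(chi_square(observed, expected), 2))  # 39.17
```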

The first question in doing the calculation is, how do we get the expected frequencies? That’s easy. If we are testing the observed frequencies (those in Table 14) against what we would expect by chance, since we have five categories of majors, we would expect one-fifth of the individuals to fall in each of the categories. One-fifth (20%) of 300 is 60. So if the selection of majors is largely a chance pattern, we would expect to find 60 people in each category.
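The same step can be sketched in Python. The helper `expected_frequencies` below is hypothetical: with no proportions given, it splits the total evenly across the categories (the chance expectation used here). The optional `proportions` argument is an assumption added to illustrate the "other set of expectations" mentioned earlier, where each expected count is the total times that category's expected share.

```python
def expected_frequencies(observed, proportions=None):
    """Expected count for each category.

    With no proportions, every category gets an equal share of the total
    (the chance expectation). Otherwise each expected count is
    total * proportion, and the proportions must add up to 1.
    """
    total = sum(observed)
    if proportions is None:
        k = len(observed)
        return [total / k] * k
    if len(proportions) != len(observed):
        raise ValueError("need one proportion per category")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return [total * p for p in proportions]


observed = [50, 85, 25, 60, 80]  # Table 14
print(expected_frequencies(observed))  # [60.0, 60.0, 60.0, 60.0, 60.0]
```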

Table 15 displays the observed and expected frequencies for each major, computes the difference between them (O–E), squares that difference ((O–E)²), divides each square by its expected frequency ((O–E)²/E), and sums those quantities to give us our χ², which is 39.17.

Table 15
Observed and Expected Frequencies for the Selection of Majors

Major                   O      E     O–E    (O–E)²    (O–E)²/E
Pre-Med                50     60     -10      100        1.67
Computer Sciences      85     60      25      625       10.42
English Literature     25     60     -35     1225       20.42
Education              60     60       0        0        0.00
Engineering            80     60      20      400        6.67
Total                 300    300                        39.17

O = observed frequency; E = expected frequency.
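Written out as a single worked equation, the last column of Table 15 adds up as follows. Because every expected frequency is 60, the squared differences can be summed first and divided once:

```latex
\chi^2 = \frac{(-10)^2}{60} + \frac{25^2}{60} + \frac{(-35)^2}{60} + \frac{0^2}{60} + \frac{20^2}{60}
       = \frac{100 + 625 + 1225 + 0 + 400}{60}
       = \frac{2350}{60} \approx 39.17
```

If you add the rounded entries in the last column instead (1.67 + 10.42 + 20.42 + 0.00 + 6.67), you get 39.18. The difference of .01 is only rounding; carrying full precision gives 39.17.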

By now, you know the next step: determining whether we can reject the null hypothesis. We do it the same way we did for the t-test and the correlation. We enter the chi square significance table (which I have handy, but you don’t) with our chi square value (39.17) and the appropriate degrees of freedom. For a chi square on a single set of categories like this one, the degrees of freedom equal the number of categories, or rows, minus one (R–1). In our case we have five rows, so df = 4.

Entering the chi square table with our result of 39.17 and df = 4, we find that we need a chi square value of at least 13.28 to reject the null hypothesis at the .01 level of significance. We clearly exceed that, so we can say that the distribution of major selections is not simply a chance pattern: χ² = 39.17, df = 4, p < .01.
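Since you may not have a chi square table handy, here is a minimal Python sketch that computes the cutoff and the probability directly. It assumes an even number of degrees of freedom (df = 4 here), because in that case the upper-tail probability has the simple closed form e^(−x/2) · Σ (x/2)^k / k! for k = 0 to df/2 − 1. The function names are hypothetical, and the cutoff is found by bisection. For odd df, or for general use, SciPy's `scipy.stats.chi2.sf` and `scipy.stats.chi2.ppf` compute the same quantities.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(chi-square > x) for a chi-square distribution with even df.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!, which is exact
    only when df is a positive even number.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even df")
    half = x / 2.0
    series = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * series


def critical_value(alpha: float, df: int) -> float:
    """Chi-square value whose upper-tail probability equals alpha (bisection)."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid  # tail area still too big: the cutoff lies further right
        else:
            hi = mid
    return (lo + hi) / 2.0


categories = 5
df = categories - 1  # R - 1
chi_sq = 39.17       # result from Table 15

cutoff = critical_value(0.01, df)
p_value = chi2_sf_even_df(chi_sq, df)

print(f"df = {df}, cutoff at .01 = {cutoff:.2f}")
print(f"p = {p_value:.2e}, reject the null hypothesis: {chi_sq > cutoff}")
```

Run as written, it should report a cutoff of about 13.28 for df = 4, matching the table value, and a probability far below .01 for our chi square of 39.17.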
