Chapter 16. Chi-squared

1. Consider the following table:

Preferred flavor of Ice Cream
Sex Chocolate Vanilla Strawberry Coffee Total
male 25 15 15 29 84
female 35 20 10 20 85
Total 60 35 25 49 169

2. If you were going to perform a chi-square test on this table, what would your degrees of freedom be?

You calculate degrees of freedom by multiplying the number of rows minus one times the number of columns minus one. The table has 2 rows (male, female) and 4 columns (the four flavors); the totals do not count. So df = (rows - 1) × (cols - 1) = (2 - 1) × (4 - 1) = 3
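As a quick check, the rule can be sketched in a few lines of Python. The function name degrees_of_freedom is only an illustrative choice, not something from the chapter.

```python
def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for a chi-squared test on a table of counts.

    Count only the category rows and columns, not the totals.
    """
    return (n_rows - 1) * (n_cols - 1)


# Two sexes (rows) by four flavors (columns).
df = degrees_of_freedom(2, 4)
print(df)
```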

3. What would the expected number be for the male vanilla cell? (How many men would you expect to prefer vanilla?)

There are two ways to calculate the expecteds. For both, you need to add a few more numbers to the table. You need the marginal counts and percentages, often called "the marginals." The counts are already present in the "Total" row and column. The marginal percentages have been added in the table below. You can see that 20.710% of all people prefer vanilla and 49.704% of all people are male.

The first way to calculate the expecteds uses the marginal percentages. If sex is not related to flavor preference, you would expect the same percentages of males to prefer the same flavors as females. Since overall 20.71% prefer vanilla, you would expect 20.71% of males to prefer vanilla. Now, 20.71% of 84 people is 17.3964, so this is how many males you would expect to prefer vanilla. You would also expect 20.71% of the females to prefer vanilla, which is 17.6035. So you'd expect 17.3964 males and 17.6035 females to prefer vanilla.

Preferred flavor of Ice Cream
Sex     Chocolate  Vanilla    Strawberry  Coffee   Total
male    25         15         15          29       84
                   [17.3964]                       49.704%
female  35         20         10          20       85
                   [17.6035]                       50.296%
Total   60         35         25          49       169
        35.503%    20.710%    14.793%     28.994%

The second method of calculating the expecteds is a bit more straightforward and is likely to result in slightly more accurate results because it reduces rounding error. To calculate the expecteds for the male vanilla cell, multiply the number of cases in the male row by the number in the vanilla column and divide the result by the total number in the table. Thus 84 × 35 ÷ 169 = 17.3964, the number of males you would expect to prefer vanilla.
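Both methods can be compared in a short Python sketch. The variable names are illustrative, and the first method deliberately rounds the vanilla share to 20.71% as the text does, which is where the small rounding difference comes from.

```python
# Observed counts from the ice cream table.
observed = {
    "male": {"Chocolate": 25, "Vanilla": 15, "Strawberry": 15, "Coffee": 29},
    "female": {"Chocolate": 35, "Vanilla": 20, "Strawberry": 10, "Coffee": 20},
}

# Marginal counts: row totals, column totals, and the grand total.
row_totals = {sex: sum(cells.values()) for sex, cells in observed.items()}
col_totals = {
    flavor: sum(observed[sex][flavor] for sex in observed)
    for flavor in observed["male"]
}
grand_total = sum(row_totals.values())

# Method 1: marginal percentage (rounded to 20.71%) times the row total.
vanilla_share = round(col_totals["Vanilla"] / grand_total, 4)
method_1 = vanilla_share * row_totals["male"]

# Method 2: row total times column total, divided by the grand total.
method_2 = row_totals["male"] * col_totals["Vanilla"] / grand_total

print(method_1, method_2)
```

The two results agree to four decimal places; the second avoids rounding the percentage first.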

4. What is the contribution of the female strawberry cell to the value of chi-squared?

Each cell contributes an amount to chi-squared:

cell contribution = (observed - expected)² ÷ expected

The expected for the female strawberry cell is 85 × 25 ÷ 169 = 12.57396. Therefore, the female strawberry cell's contribution to chi-squared is:

(10 - 12.57396)² ÷ 12.57396 = .52691
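The contribution of a single cell, the squared difference between observed and expected divided by expected, can be checked with a small Python sketch. The helper name cell_contribution is an assumption made for illustration.

```python
def cell_contribution(observed: float, expected: float) -> float:
    """Contribution of one cell to chi-squared: (obs - exp)^2 / exp."""
    return (observed - expected) ** 2 / expected


# Female strawberry cell: 10 observed, expected from the marginals.
expected_female_strawberry = 85 * 25 / 169
contribution = cell_contribution(10, expected_female_strawberry)

print(round(expected_female_strawberry, 5), round(contribution, 5))
```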

5. What is the value of chi-squared for this table?

This table shows the observed and expected counts for each cell in the table.

obs
[exp]

Preferred flavor of Ice Cream
Sex     Chocolate   Vanilla     Strawberry  Coffee
male    25          15          15          29
        [29.82248]  [17.39645]  [12.42604]  [24.35503]
female  35          20          10          20
        [30.17752]  [17.60355]  [12.57396]  [24.64497]

This table shows the difference between observed and expected (the top number in each cell) and the cell's contribution to chi-squared (the bottom number in each cell). The cell contributions are calculated according to the formula in the answer to question 4.

obs-exp
[cell contrib]

Preferred flavor of Ice Cream
Sex     Chocolate  Vanilla   Strawberry  Coffee
male    -4.82248   -2.39645  2.57396     4.64497
        [.77983]   [.33012]  [.53318]    [.88588]
female  4.82248    2.39645   -2.57396    -4.64497
        [.77065]   [.32624]  [.52691]    [.87546]

The value of chi-squared for the table is the sum of the values in each cell:

.77983 + .33012 + .53318 + .88588 + .77065 + .32624 + .52691 + .87546 = 5.02827
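The whole calculation, from observed counts to the final sum, can be reproduced with a short Python sketch. It is one possible way to organize the arithmetic, using plain lists and illustrative variable names.

```python
# Observed counts: rows are male and female, columns are
# Chocolate, Vanilla, Strawberry, Coffee.
observed = [
    [25, 15, 15, 29],  # male
    [35, 20, 10, 20],  # female
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_squared = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count under independence of sex and flavor.
        exp = row_totals[i] * col_totals[j] / grand_total
        # Add this cell's contribution to the running total.
        chi_squared += (obs - exp) ** 2 / exp

print(round(chi_squared, 5))
```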

6. What is the null hypothesis?

HØ : "Sex is not related to preference of ice cream flavor."

HALT : "Sex is related to preference of ice cream flavor."

7. Is the row variable independent of the column variable?

The 95% critical value for chi-squared with 3 degrees of freedom is 7.815. Since chi-squared for this data, 5.02827, is less than the critical value of 7.815, you fail to reject the null hypothesis. The data shown in this table could have come from a population in which sex is not related to preferred flavor of ice cream. In other words, the data do not convince you that sex is related to preferred flavor. So the answer to the question is "Yes, as far as this test can tell: you have no convincing evidence of a relationship, so you treat the row variable as independent of the column variable."

8. Do you reject the null hypothesis?

Since chi-squared for this data, 5.02827, is less than the critical value of 7.815, you fail to reject the null hypothesis.

9. How certain are you of your answer?

Critical values of chi-squared
df Probability
.3 .2 .1 .05 .02
1 1.074 1.642 2.706 3.841 5.412
2 2.408 3.219 4.605 5.991 7.824
3 3.665 4.642 6.251 7.815 9.837
4 4.878 5.989 7.779 9.488 11.668

Because your value of chi-squared, 5.02827, is between 4.642 and 6.251 (the critical values for 3 degrees of freedom at probabilities .2 and .1), the probability of getting a value of chi-squared as large as the one you obtained would be between 10% and 20%, even if the row variable is independent of the column variable.

10. How do you know that is how certain you are?

You know because you have examined the table of critical values of chi-squared and you found that a value of chi-squared of 5.02827 with three degrees of freedom has a probability between 0.1 and 0.2.
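If you would rather compute the probability than bracket it from the table, here is a minimal Python sketch. It swaps the table lookup for a closed-form expression of the upper-tail probability that holds only for exactly 3 degrees of freedom; the function name chi2_sf_df3 is hypothetical.

```python
import math


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X >= x) for chi-squared with 3 df.

    Valid only for 3 degrees of freedom:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_df3(5.02827)
print(round(p_value, 3))

# The critical values in the df = 3 row of the table should map back
# to roughly their column probabilities.
for critical, prob in [(4.642, 0.2), (6.251, 0.1), (7.815, 0.05)]:
    print(critical, prob, round(chi2_sf_df3(critical), 3))
```

The computed probability falls between 0.1 and 0.2, in agreement with the table.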

11. Which method would be more appropriate for interpreting this table - percentage down, compare across or percentage across, compare down? Why?

In this table, the independent variable is sex. Since you want to compare men with women, you would use percentage across, compare down. In other words, you would work with row percents and you would compare the percentages of men and women that fall into a column.

count
row %

Preferred flavor of Ice Cream
Sex     Chocolate  Vanilla  Strawberry  Coffee   Total
male    25         15       15          29       84
        29.762%    17.857%  17.857%     34.524%  49.704%
female  35         20       10          20       85
        41.176%    23.529%  11.765%     23.529%  50.296%
Total   60         35       25          49       169
        35.503%    20.710%  14.793%     28.994%

You would say: 41.176% of women prefer chocolate compared to only 29.762% of men. 34.524% of men prefer coffee compared to only 23.529% of women. 17.857% of men prefer strawberry, compared to only 11.765% of women. For women, the most preferred flavor was chocolate, while for men, the most preferred flavor was coffee. Overall, the least popular flavor was strawberry.
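Row percentages ("percentage across") are each cell's count divided by its row total. A small Python sketch, with illustrative names, shows the calculation for every cell.

```python
# Observed counts from the ice cream table.
observed = {
    "male": {"Chocolate": 25, "Vanilla": 15, "Strawberry": 15, "Coffee": 29},
    "female": {"Chocolate": 35, "Vanilla": 20, "Strawberry": 10, "Coffee": 20},
}

# Percentage across: divide each count by the total for its own row.
row_percent = {}
for sex, cells in observed.items():
    row_total = sum(cells.values())
    row_percent[sex] = {
        flavor: 100 * count / row_total for flavor, count in cells.items()
    }

# Compare down: look at one flavor column at a time.
for flavor in observed["male"]:
    print(
        flavor,
        f"male {row_percent['male'][flavor]:.3f}%",
        f"female {row_percent['female'][flavor]:.3f}%",
    )
```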

12. In a large survey done in the USA on a regular basis (the General Social Survey), people were asked "Taken all together, how would you say things are these days -- would you say that you are very happy, pretty happy, or not too happy?" The results of this question are shown here:

count Reported Happiness vs. Marital Status
Marital Status
Happiness Married Divorced Single Total
Very 347 97 58 502
Less than Very 467 256 220 943
Total 814 353 278 1445

13. What is the null hypothesis?

HØ : "People's marital status is not related to how happy they say they are."

14. If you were going to perform a chi-square test on this table, what would your degrees of freedom be?

The table has 2 rows (the happiness levels) and 3 columns (the marital statuses), so the degrees of freedom would be (2 - 1) × (3 - 1) = 2.

15. What is the alternate hypothesis?

HALT : "People's marital status is related to how happy they say they are."

16. What is the value of chi-square for this table?

obs
exp
cell contrib

Reported Happiness vs. Marital Status
Marital Status
Happiness       Married   Divorced  Single
Very            347       97        58
                282.7875  122.6339  96.57854
                14.5807   5.3582    15.4103
Less than Very  467       256       220
                531.2125  230.3661  181.4214
                7.76194   2.8524    8.20358

chi-squared = 14.5807 + 5.3582 + 15.4103 + 7.76194 + 2.8524 + 8.20358 = 54.16713
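The same steps can be wrapped in a reusable function and applied to the happiness table. This is a sketch with a hypothetical function name, chi_squared_statistic, not a reference implementation.

```python
def chi_squared_statistic(table):
    """Sum of (obs - exp)^2 / exp over every cell of a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if happiness and marital status are unrelated.
            exp = row_totals[i] * col_totals[j] / grand_total
            total += (obs - exp) ** 2 / exp
    return total


# Rows: Very, Less than Very. Columns: Married, Divorced, Single.
happiness = [
    [347, 97, 58],
    [467, 256, 220],
]

chi_squared = chi_squared_statistic(happiness)
print(round(chi_squared, 3))
```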

17. Do you reject the null hypothesis?

Yes, you do. The 95% critical value of chi-squared for 2 degrees of freedom is 5.991. The value you obtained, 54.16713, is much larger than this.

18. How certain are you of your answer?

The probability of getting a value of chi-squared as large as the one you see for this table is less than .001 when the null hypothesis is true. The chance of making a Type I error if you reject the null hypothesis is less than one in a thousand (0.1%).
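To see how far below .001 the probability is, you can use the fact that for exactly 2 degrees of freedom the upper-tail probability of chi-squared has the simple form exp(-x/2). The sketch below relies on that special case only; the function name chi2_sf_df2 is hypothetical.

```python
import math


def chi2_sf_df2(x: float) -> float:
    """Upper-tail probability P(X >= x) for chi-squared with 2 df.

    Valid only for 2 degrees of freedom, where it equals exp(-x/2).
    """
    return math.exp(-x / 2)


p_value = chi2_sf_df2(54.16713)
print(p_value)

# Sanity check: the 95% critical value for 2 df should give about .05.
print(round(chi2_sf_df2(5.991), 3))
```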



Feb 25, 2005