The FREQ Procedure
For one-way frequency tables, PROC FREQ performs a chi-square goodness-of-fit test when you specify the CHISQ option. The other chi-square tests and statistics described in this section are defined only for two-way tables and so are not computed for one-way frequency tables.
All the two-way test statistics described in this section test the null hypothesis of no association between the row variable and the column variable. When the sample size n is large, these test statistics are distributed approximately as chi-square when the null hypothesis is true. When the sample size is not large, exact tests may be useful. PROC FREQ computes exact tests for the following chi-square statistics when you specify the corresponding option in the EXACT statement: Pearson chi-square, likelihood-ratio chi-square, and Mantel-Haenszel chi-square. See the section "Exact Statistics" for more information.
Note that the Mantel-Haenszel chi-square statistic is appropriate only when both variables lie on an ordinal scale. The other chi-square tests and statistics in this section are appropriate for either nominal or ordinal variables. The following sections give the formulas that PROC FREQ uses to compute the chi-square tests and statistics. For further information on the formulas and on the applicability of each statistic, refer to Agresti (1996), Stokes, Davis, and Koch (1995), and the other references cited for each statistic.
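In the test for equal proportions, which is the default for the CHISQ option, the null hypothesis specifies equal proportions of the total sample size for each class. Under this null hypothesis, the expected frequency for each class equals the total sample size divided by the number of classes,
e_{i} = n / C for i = 1, 2, ..., C
The one-way chi-square statistic compares the observed class frequencies n_{i} with these expected frequencies:
Q_{P} = \sum_{i=1}^{C} (n_{i} - e_{i})^{2} / e_{i}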
Under the null hypothesis (of equal proportions, specified frequencies, or specified proportions), this test statistic has an asymptotic chi-square distribution, with C - 1 degrees of freedom. In addition to the asymptotic test, PROC FREQ computes the exact one-way chi-square test when you specify the CHISQ option in the EXACT statement.
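As a rough illustration of the equal-proportions test (a sketch, not PROC FREQ's own code), the following Python snippet computes the one-way statistic, assuming the usual form in which each squared difference between an observed and an expected frequency is divided by the expected frequency. The function name and the example counts are hypothetical.

```python
def one_way_chi_square(counts):
    """Chi-square goodness-of-fit statistic for the equal-proportions null.

    counts: observed frequency of each class (hypothetical input format).
    Returns the statistic and its degrees of freedom, C - 1.
    """
    n = sum(counts)            # total sample size
    c = len(counts)            # number of classes
    expected = n / c           # same expected frequency for every class
    q_p = sum((n_i - expected) ** 2 / expected for n_i in counts)
    return q_p, c - 1


if __name__ == "__main__":
    q_p, df = one_way_chi_square([20, 30, 50])
    print(f"Q_P = {q_p:.4f} with {df} degrees of freedom")
```

The p-value would then come from the upper tail of a chi-square distribution with C - 1 degrees of freedom, which this sketch leaves out to stay within the standard library.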
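The Pearson chi-square statistic for two-way tables is based on the differences between the observed frequencies n_{ij} and the frequencies e_{ij} expected under independence:
Q_{P} = \sum_{i} \sum_{j} (n_{ij} - e_{ij})^{2} / e_{ij}
where e_{ij} = n_{i.} n_{.j} / n.
When the row and column variables are independent, Q_{P} has an asymptotic chi-square distribution with (R-1)(C-1) degrees of freedom. For large values of Q_{P}, this test rejects the null hypothesis in favor of the alternative hypothesis of general association. In addition to the asymptotic test, PROC FREQ computes the exact chi-square test when you specify the PCHI or CHISQ option in the EXACT statement.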
For a 2 × 2 table, the Pearson chi-square is also appropriate for testing the equality of two binomial proportions or, for R × 2 and 2 × C tables, the homogeneity of proportions. Refer to Fienberg (1980).
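The Pearson statistic for a two-way table can be sketched in a few lines of Python. This is only an illustration under the assumption that the expected frequency of a cell is (row total × column total) / n; the function names and the example table are hypothetical, and nothing here reproduces PROC FREQ's implementation.

```python
def expected_frequencies(table):
    """Expected cell frequencies under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Return (Q_P, degrees of freedom) for an R x C table of counts."""
    expected = expected_frequencies(table)
    q_p = sum(
        (n_ij - e_ij) ** 2 / e_ij
        for obs_row, exp_row in zip(table, expected)
        for n_ij, e_ij in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return q_p, df


if __name__ == "__main__":
    q_p, df = pearson_chi_square([[10, 20], [30, 40]])
    print(f"Q_P = {q_p:.4f} with {df} degree(s) of freedom")
```

The sketch assumes every row and column total is positive, so no expected frequency is zero.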
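The likelihood-ratio chi-square statistic involves the ratios between the observed and expected frequencies:
G^{2} = 2 \sum_{i} \sum_{j} n_{ij} \ln(n_{ij} / e_{ij})
When the row and column variables are independent, G^{2} has an asymptotic chi-square distribution with (R-1)(C-1) degrees of freedom. In addition to the asymptotic test, PROC FREQ computes the exact test when you specify the LRCHI or CHISQ option in the EXACT statement.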
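A minimal Python sketch of the likelihood-ratio statistic follows, assuming the standard form G^{2} = 2 Σ n_{ij} ln(n_{ij} / e_{ij}) with expected frequencies computed from the margins. Treating empty cells as contributing zero is an assumption of this sketch (the usual 0 · ln 0 = 0 convention), and the function name is hypothetical.

```python
import math


def likelihood_ratio_chi_square(table):
    """Return (G^2, degrees of freedom) for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij == 0:
                continue  # assumption: an empty cell adds nothing to the sum
            e_ij = row_totals[i] * col_totals[j] / n
            g2 += n_ij * math.log(n_ij / e_ij)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return 2.0 * g2, df


if __name__ == "__main__":
    g2, df = likelihood_ratio_chi_square([[10, 20], [30, 40]])
    print(f"G^2 = {g2:.4f} with {df} degree(s) of freedom")
```

For tables that are close to independence, G^{2} and the Pearson statistic tend to be close to each other, which makes a quick sanity check on either computation.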
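The continuity-adjusted chi-square statistic is computed as
Q_{C} = \sum_{i} \sum_{j} [max(0, |n_{ij} - e_{ij}| - 0.5)]^{2} / e_{ij}
Under the null hypothesis of independence, Q_{C} has an asymptotic chi-square distribution with (R-1)(C-1) degrees of freedom.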
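The continuity adjustment can be sketched in Python as follows. The sketch assumes the usual continuity-adjusted form, in which each absolute difference |n_{ij} - e_{ij}| is reduced by 0.5 (and floored at zero) before squaring; the function name and the example table are hypothetical.

```python
def continuity_adjusted_chi_square(table):
    """Continuity-adjusted chi-square Q_C for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    q_c = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n
            # Shrink the deviation by 0.5, never letting it go negative.
            adjusted = max(0.0, abs(n_ij - e_ij) - 0.5)
            q_c += adjusted ** 2 / e_ij
    return q_c


if __name__ == "__main__":
    print(f"Q_C = {continuity_adjusted_chi_square([[10, 20], [30, 40]]):.4f}")
```

Because every deviation is shrunk toward zero, Q_{C} is never larger than the unadjusted Pearson statistic for the same table.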
The Mantel-Haenszel chi-square statistic is computed as
Q_{MH} = (n - 1) r^{2}
where r is the Pearson correlation between the row variable and the column variable, and n is the total sample size. For a description of the Pearson correlation, see the "Pearson Correlation Coefficient" section. The Pearson correlation and, thus, the Mantel-Haenszel chi-square statistic use the scores that you specify in the SCORES= option in the TABLES statement.
Under the null hypothesis of no association, Q_{MH} has an asymptotic chi-square distribution with 1 degree of freedom. In addition to the asymptotic test, PROC FREQ computes the exact test when you specify the MHCHI or CHISQ option in the EXACT statement.
Refer to Mantel and Haenszel (1959) and Landis, Heyman, and Koch (1978).
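The following Python sketch illustrates the Mantel-Haenszel chi-square, assuming the form Q_{MH} = (n - 1) r^{2} with r the Pearson correlation computed from row and column scores weighted by the cell counts. The default scores 1, 2, ... are an assumption of this sketch and merely stand in for whatever the SCORES= option would supply; the function name is hypothetical.

```python
import math


def mantel_haenszel_chi_square(table, row_scores=None, col_scores=None):
    """Q_MH = (n - 1) * r**2 for an R x C table of counts."""
    n_rows, n_cols = len(table), len(table[0])
    if row_scores is None:
        row_scores = list(range(1, n_rows + 1))   # assumed default scores
    if col_scores is None:
        col_scores = list(range(1, n_cols + 1))   # assumed default scores
    n = sum(sum(row) for row in table)
    # One (row score, column score, frequency) triple per cell.
    cells = [
        (row_scores[i], col_scores[j], table[i][j])
        for i in range(n_rows)
        for j in range(n_cols)
    ]
    row_mean = sum(u * f for u, _, f in cells) / n
    col_mean = sum(v * f for _, v, f in cells) / n
    ss_row = sum(f * (u - row_mean) ** 2 for u, _, f in cells)
    ss_col = sum(f * (v - col_mean) ** 2 for _, v, f in cells)
    ss_rc = sum(f * (u - row_mean) * (v - col_mean) for u, v, f in cells)
    r = ss_rc / math.sqrt(ss_row * ss_col)
    return (n - 1) * r ** 2


if __name__ == "__main__":
    table = [[10, 20], [30, 40]]
    print(f"Q_MH = {mantel_haenszel_chi_square(table):.4f}")
```

The choice of scores matters only up to a linear transformation, since the Pearson correlation is unchanged (apart from sign) when scores are shifted or rescaled.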
For 2 × 2 tables, Fisher's exact test is the probability of observing a table that gives at least as much evidence of association as the one actually observed, given that the null hypothesis is true. The row and column margins are assumed to be fixed. The hypergeometric probability, p, of every possible table is computed, and the p-value is defined as
PROB = \sum_{A} p
where the sum runs over a set of tables A that depends on the alternative hypothesis.
For a two-sided alternative hypothesis, A is the set of tables with p less than or equal to the probability of the observed table. A small two-sided p-value supports the alternative hypothesis of association between the row and column variables.
One-sided tests are defined in terms of the frequency of the cell in the first row and first column (the (1,1) cell). For a left-sided alternative hypothesis, A is the set of tables where the frequency in the (1,1) cell is less than or equal to that of the observed table. A small left-sided p-value supports the alternative hypothesis that the probability of an observation being in the first cell is less than expected under the null hypothesis of independent row and column variables.
Similarly, for a right-sided alternative hypothesis, A is the set of tables where the frequency in the (1,1) cell is greater than or equal to that of the observed table. A small right-sided p-value supports the alternative that the probability of an observation being in the first cell is greater than expected under the null hypothesis.
Because the (1,1) cell frequency completely determines the 2 × 2 table when the marginal row and column sums are fixed, these one-sided alternatives can be equivalently stated in terms of other cell probabilities or ratios of cell probabilities. The left-sided alternative is equivalent to an odds ratio less than 1, where the odds ratio equals (n_{11} n_{22}) / (n_{12} n_{21}). Additionally, the left-sided alternative is equivalent to the column 1 risk for row 1 being less than the column 1 risk for row 2, p_{1|1} < p_{1|2}. Similarly, the right-sided alternative is equivalent to an odds ratio greater than 1 and to the column 1 risk for row 1 being greater than the column 1 risk for row 2, p_{1|1} > p_{1|2}. Refer to Agresti (1996).
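The left-sided, right-sided, and two-sided p-values described above can be sketched for a 2 × 2 table as follows. This is a minimal illustration rather than PROC FREQ's algorithm: it enumerates every possible (1,1) cell frequency for the fixed margins and compares unnormalized hypergeometric weights as exact integers, so that the "less than or equal to the observed probability" rule is not affected by floating-point rounding. The function name is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def fisher_exact_2x2(table):
    """Return (left, right, two_sided) p-values for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c       # fixed margins
    total = comb(r1 + r2, c1)              # normalizing constant
    # Hypergeometric weight of each table, indexed by its (1,1) frequency x.
    weights = {
        x: comb(r1, x) * comb(r2, c1 - x)
        for x in range(max(0, c1 - r2), min(r1, c1) + 1)
    }
    w_obs = weights[a]
    left = sum(w for x, w in weights.items() if x <= a) / total
    right = sum(w for x, w in weights.items() if x >= a) / total
    two_sided = sum(w for w in weights.values() if w <= w_obs) / total
    return left, right, two_sided


if __name__ == "__main__":
    left, right, two_sided = fisher_exact_2x2([[3, 1], [1, 3]])
    print(f"left={left:.4f} right={right:.4f} two-sided={two_sided:.4f}")
```

Note that the left-sided and right-sided p-values both include the observed table, so they sum to more than 1.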
R × C Tables
Fisher's exact test was extended to general R × C tables by Freeman and Halton (1951), and this test is also known as the Freeman-Halton test. For R × C tables, the two-sided p-value is defined the same as it is for 2 × 2 tables. A is the set of all tables with p less than or equal to the probability of the observed table. A small p-value supports the alternative hypothesis of association between the row and column variables. For R × C tables, Fisher's exact test is inherently two-sided. The alternative hypothesis is defined only in terms of general, and not linear, association. Therefore, PROC FREQ does not compute right-sided or left-sided p-values for general R × C tables.
For R × C tables, PROC FREQ computes Fisher's exact test using the network algorithm of Mehta and Patel (1983), which provides a faster and more efficient solution than direct enumeration. See the section "Exact Statistics" for more details.
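To make the definition of the R × C p-value concrete, the sketch below uses plain direct enumeration, which is the slow approach that the network algorithm is designed to avoid. It is not the Mehta and Patel algorithm and is practical only for very small tables. The sketch assumes the standard result that, with fixed margins, the probability of a table is proportional to 1 / Π n_{ij}!; all function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def bounded_rows(total, caps):
    """Yield lists x with sum(x) == total and 0 <= x[j] <= caps[j]."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield [total]
        return
    for x in range(min(total, caps[0]) + 1):
        for rest in bounded_rows(total - x, caps[1:]):
            yield [x] + rest


def tables_with_margins(row_totals, col_totals):
    """Yield every table of nonnegative integers with the given margins."""
    if len(row_totals) == 1:
        yield [list(col_totals)]   # the last row is forced by the margins
        return
    for row in bounded_rows(row_totals[0], col_totals):
        remaining = [c - x for c, x in zip(col_totals, row)]
        for rest in tables_with_margins(row_totals[1:], remaining):
            yield [row] + rest


def table_weight(table):
    """Exact weight proportional to the table's probability."""
    denom = 1
    for row in table:
        for n_ij in row:
            denom *= factorial(n_ij)
    return Fraction(1, denom)


def freeman_halton_p_value(table):
    """Two-sided exact p-value by direct enumeration (small tables only)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    w_obs = table_weight(table)
    total = Fraction(0)
    extreme = Fraction(0)
    for candidate in tables_with_margins(row_totals, col_totals):
        w = table_weight(candidate)
        total += w
        if w <= w_obs:             # at most as probable as the observed table
            extreme += w
    return float(extreme / total)


if __name__ == "__main__":
    print(f"p = {freeman_halton_p_value([[2, 0, 0], [0, 1, 1]]):.4f}")
```

Exact rational arithmetic keeps the comparison with the observed table's probability free of rounding ties, at the cost of speed. The number of candidate tables grows very quickly with the sample size and table dimensions, which is the motivation for a more efficient algorithm.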
Refer to Fleiss (1981, pp. 59-60).
Refer to Kendall and Stuart (1979, p. 588).
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.