Introduction to Categorical Data Analysis Procedures
Suppose you take two simple random samples, fifty men and fifty women, and ask the same question as before. You are now sampling two different populations that may have different response probabilities. The data can be tabulated as shown in Table 5.2.
Table 5.2: Two-Way Contingency Table: Sex by Color

|        | Favorite Color |      |       |       |
| Sex    | Red            | Blue | Green | Total |
| Male   | 30             | 10   | 10    | 50    |
| Female | 20             | 10   | 20    | 50    |
| Total  | 50             | 20   | 30    | 100   |
Note that the row marginal totals (50, 50) of the contingency table are fixed by the sampling design, but the column marginal totals (50, 20, 30) are random. There are six probabilities of interest for this table, and they are estimated by the sample proportions
|        | Favorite Color |      |       |       |
| Sex    | Red            | Blue | Green | Total |
| Male   | 0.60           | 0.20 | 0.20  | 1.00  |
| Female | 0.40           | 0.20 | 0.40  | 1.00  |
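As a minimal sketch of how these estimates arise, each sample proportion is the cell frequency divided by its fixed row total. The Python names below (`counts`, `row_proportions`) are illustrative assumptions and are not part of any SAS procedure.

```python
# Cell frequencies from Table 5.2, one row per sampled population.
# Column order: Red, Blue, Green.
counts = {
    "Male": [30, 10, 10],
    "Female": [20, 10, 20],
}


def row_proportions(table):
    """Divide each cell by its row total (the fixed sample size)."""
    result = {}
    for population, row in table.items():
        n = sum(row)  # row marginal total, fixed by the sampling design
        result[population] = [cell / n for cell in row]
    return result


proportions = row_proportions(counts)
for population, row in proportions.items():
    print(population, [f"{p:.2f}" for p in row])
```

Because the row totals are fixed at 50, each row of proportions sums to 1, which is why only two of the three proportions in a row are free to vary.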
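The probability distribution of the six frequencies is the product multinomial distribution

$$
\Pr(n_{11}, n_{12}, n_{13}, n_{21}, n_{22}, n_{23}) =
\frac{n_1!\, n_2!\; \pi_{11}^{n_{11}} \pi_{12}^{n_{12}} \pi_{13}^{n_{13}} \pi_{21}^{n_{21}} \pi_{22}^{n_{22}} \pi_{23}^{n_{23}}}
     {n_{11}!\, n_{12}!\, n_{13}!\, n_{21}!\, n_{22}!\, n_{23}!}
$$

where $n_{jk}$ is the frequency of the $k$th response level in the $j$th population, $n_j$ is the fixed sample size of the $j$th population, and $\pi_{jk}$ is the true probability of observing the $k$th response level in the $j$th population.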
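To make the product multinomial distribution concrete, the following sketch evaluates it as the product of one ordinary multinomial probability per population. The function names and the choice to plug in the sample proportions as the probabilities are assumptions made for illustration only.

```python
import math


def multinomial_pmf(counts, probs):
    """Multinomial probability of `counts` given category probabilities `probs`."""
    n = sum(counts)
    # Multinomial coefficient n! / (c1! c2! ... ck!), computed in exact integers.
    coefficient = math.factorial(n)
    for c in counts:
        coefficient //= math.factorial(c)
    value = float(coefficient)
    for c, p in zip(counts, probs):
        value *= p ** c
    return value


def product_multinomial_pmf(table, probs):
    """Product of independent multinomial probabilities, one per population (row)."""
    value = 1.0
    for row_counts, row_probs in zip(table, probs):
        value *= multinomial_pmf(row_counts, row_probs)
    return value


# Table 5.2 frequencies, evaluated at the sample proportions (an assumption:
# the true probabilities are unknown, so the estimates are used here).
table = [[30, 10, 10], [20, 10, 20]]
estimates = [[0.6, 0.2, 0.2], [0.4, 0.2, 0.4]]
print(product_multinomial_pmf(table, estimates))
```

The rows multiply because the two samples are drawn independently, which is the defining feature of stratified simple random sampling.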

Stratified simple random sampling is the type of sampling required by PROC CATMOD when there is more than one population. PROC CATMOD uses the product multinomial distribution to estimate a probability vector and its covariance matrix. If the sample sizes are sufficiently large, then the estimated probability vector is approximately normally distributed by the central limit theorem, and PROC CATMOD uses this result to compute appropriate test statistics for the specified statistical model. The statistics are known as Wald statistics, and they are approximately distributed as chi-square when the null hypothesis is true.
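The idea of a Wald statistic can be sketched by hand for the data in Table 5.2. The example below tests the hypothesis that men and women have the same response probabilities, using the estimated proportions and their multinomial covariance matrices. It is an illustration under stated assumptions, not a reproduction of PROC CATMOD's computations: the hypothesis, the variable names, and the choice to drop the last (redundant) category are all assumptions.

```python
import math

# Table 5.2 frequencies; column order: Red, Blue, Green.
counts = [[30, 10, 10], [20, 10, 20]]


def proportions_and_covariance(row):
    """Estimated proportions and their multinomial covariance matrix.

    The last category is dropped because the proportions in a row sum to 1,
    so it carries no extra information.
    """
    n = sum(row)
    p = [c / n for c in row]
    k = len(row) - 1
    cov = [
        [(p[a] * (1 - p[a]) if a == b else -p[a] * p[b]) / n for b in range(k)]
        for a in range(k)
    ]
    return p[:k], cov


p_male, v_male = proportions_and_covariance(counts[0])
p_female, v_female = proportions_and_covariance(counts[1])

# Difference between populations and its covariance (independent samples add).
d = [a - b for a, b in zip(p_male, p_female)]
v = [[v_male[i][j] + v_female[i][j] for j in range(2)] for i in range(2)]

# Invert the 2x2 covariance matrix directly.
det = v[0][0] * v[1][1] - v[0][1] * v[1][0]
v_inv = [
    [v[1][1] / det, -v[0][1] / det],
    [-v[1][0] / det, v[0][0] / det],
]

# Wald statistic: d' V^{-1} d, referred to chi-square with 2 degrees of freedom.
wald = sum(d[i] * v_inv[i][j] * d[j] for i in range(2) for j in range(2))

# For 2 degrees of freedom the chi-square upper tail probability is exp(-x / 2).
p_value = math.exp(-wald / 2)
print(f"Wald chi-square = {wald:.4f}, df = 2, p = {p_value:.4f}")
```

The quadratic form is approximately chi-square only because the estimated proportions are approximately normal for large samples, which is the central limit argument described above.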
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.