Multinomial Models

The GENMOD Procedure

This type of model applies to cases where an observation can fall into one of $k$ categories. Binary data are the special case $k = 2$. If there are $m_i$ observations in subpopulation $i$, then the probability distribution of the counts falling into the $k$ categories, $y_i = (y_{i1}, y_{i2}, \ldots, y_{ik})$, can be modeled by the multinomial distribution, defined in the "Response Probability Distributions" section, with $\sum_j y_{ij} = m_i$. The multinomial model is an ordinal model if the categories have a natural order.

By default, the GENMOD procedure orders the response categories for ordinal multinomial models from lowest to highest. This differs from the binomial distribution, where the response probability for the higher of the two categories is modeled. You can change the way GENMOD orders the response levels with the RORDER= option in the PROC GENMOD statement. The order that GENMOD uses is shown in the "Response Profiles" output table described in the section "Response Profile".

The GENMOD procedure supports only the ordinal multinomial model. If $(p_{i1}, p_{i2}, \ldots, p_{ik})$ are the category probabilities, the cumulative category probabilities are modeled with the same link functions used for binomial data. Let $P_{ir} = \sum_{j=1}^r p_{ij}$, $r = 1, 2, \ldots, k-1$, be the cumulative category probabilities (note that $P_{ik} = 1$). The ordinal model is

$g(P_{ir}) = \mu_r + x_i'\beta \quad \text{for } r = 1, 2, \ldots, k-1$

where $\mu_1, \mu_2, \ldots, \mu_{k-1}$ are intercept terms that depend only on the categories and $x_i$ is a vector of covariates that does not include an intercept term. The logit, probit, and complementary log-log link functions $g$ are available. These are obtained by specifying the MODEL statement options DIST=MULTINOMIAL and LINK=CUMLOGIT (cumulative logit), LINK=CUMPROBIT (cumulative probit), or LINK=CUMCLL (cumulative complementary log-log). Alternatively,

$P_{ir} = F(\mu_r + x_i'\beta) \quad \text{for } r = 1, 2, \ldots, k-1$

where $F = g^{-1}$ is a cumulative distribution function for the logistic, normal, or extreme value distribution.
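The relationship between the intercepts, the linear predictor, and the category probabilities can be illustrated numerically. The following Python sketch is not part of PROC GENMOD; the function names and the example parameter values are hypothetical. It evaluates the cumulative probabilities with the inverse of each of the three links and recovers the individual category probabilities by differencing, using the fact that the last cumulative probability is 1. It assumes the intercepts are supplied in increasing order, so that the differences are positive.

```python
import math

def logistic_cdf(z):
    # Inverse of the logit link.
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    # Inverse of the probit link (standard normal CDF).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def extreme_value_cdf(z):
    # Inverse of the complementary log-log link g(p) = log(-log(1 - p)).
    return 1.0 - math.exp(-math.exp(z))

# Keys mirror the LINK= keywords; the dictionary itself is illustrative only.
INVERSE_LINKS = {
    "CUMLOGIT": logistic_cdf,
    "CUMPROBIT": normal_cdf,
    "CUMCLL": extreme_value_cdf,
}

def category_probabilities(intercepts, x, beta, link="CUMLOGIT"):
    """Return the k category probabilities for one covariate vector x.

    intercepts: the k-1 intercepts, assumed increasing.
    x, beta:    covariates (no intercept term) and regression parameters.
    """
    inverse_link = INVERSE_LINKS[link]
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    # Cumulative probabilities for r = 1..k-1, followed by the final value 1.
    cumulative = [inverse_link(mu + eta) for mu in intercepts] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities

# Hypothetical values: k = 3 categories, one covariate.
for name in INVERSE_LINKS:
    print(name, category_probabilities([-1.0, 0.5], [1.0], [0.3], link=name))
```

Because the same linear predictor is added to every intercept, the covariates shift all of the cumulative probabilities together, and the intercepts alone determine how the categories are separated.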

PROC GENMOD estimates the intercept parameters $\mu_1, \mu_2, \ldots, \mu_{k-1}$ and regression parameters $\beta$ by maximum likelihood.
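As a rough illustration of what is being maximized, the following Python sketch evaluates the kernel of the multinomial log likelihood, $\sum_i \sum_j y_{ij} \log p_{ij}$, for the cumulative logit link. It is a minimal sketch under stated assumptions, not the procedure's algorithm: the function name and the data are hypothetical, the multinomial coefficient (which does not depend on the parameters) is omitted, and no optimization is performed.

```python
import math

def cumulative_logit_loglik(intercepts, beta, rows):
    """Log-likelihood kernel for the cumulative logit model.

    intercepts: the k-1 intercepts, assumed increasing.
    beta:       regression parameters.
    rows:       list of (x, counts) pairs, one per subpopulation, where
                counts holds the k category counts in response order.
    """
    total = 0.0
    for x, counts in rows:
        eta = sum(xj * bj for xj, bj in zip(x, beta))
        cumulative = [1.0 / (1.0 + math.exp(-(mu + eta))) for mu in intercepts]
        cumulative.append(1.0)  # the last cumulative probability is 1
        previous = 0.0
        for y, value in zip(counts, cumulative):
            p = value - previous  # category probability by differencing
            previous = value
            if y > 0:
                total += y * math.log(p)
    return total

# Hypothetical data: one subpopulation, k = 2, counts (3, 2), covariate with beta = 0.
rows = [([1.0], [3, 2])]
at_zero = cumulative_logit_loglik([0.0], [0.0], rows)
at_mle = cumulative_logit_loglik([math.log(1.5)], [0.0], rows)
print(at_zero, at_mle)
```

In this small example the intercept $\log(1.5)$ gives a first-category probability of 0.6, which matches the observed proportion 3/5, so the log likelihood there is higher than at an intercept of 0.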

The subpopulations $i$ are defined by constant values of the AGGREGATE= variable. This has no effect on the parameter estimates, but it does affect the deviance and Pearson chi-square statistics; it also affects the standard errors of the parameter estimates if you specify the SCALE=DEVIANCE or SCALE=PEARSON option.
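The following Python sketch illustrates why the grouping matters for the deviance but not for the estimates. It uses the standard multinomial deviance, $2 \sum_i \sum_j y_{ij} \log\bigl(y_{ij} / (m_i p_{ij})\bigr)$, which is assumed here to correspond to the statistic the procedure reports; the function names, fitted probabilities, and counts are hypothetical. The same four observations are treated once as a single subpopulation and once as four subpopulations of size 1.

```python
import math

def multinomial_deviance(groups):
    """Deviance for a list of (counts, fitted_probabilities) subpopulations."""
    deviance = 0.0
    for counts, probabilities in groups:
        m = sum(counts)
        for y, p in zip(counts, probabilities):
            if y > 0:  # terms with y = 0 contribute nothing
                deviance += 2.0 * y * math.log(y / (m * p))
    return deviance

def loglik_kernel(groups):
    """Sum of y * log(p), the part of the log likelihood that depends on p."""
    return sum(
        y * math.log(p)
        for counts, probabilities in groups
        for y, p in zip(counts, probabilities)
        if y > 0
    )

fitted = [0.5, 0.3, 0.2]  # hypothetical fitted category probabilities

# Four observations with identical covariates, treated as one subpopulation.
aggregated = [([2, 1, 1], fitted)]

# The same observations, each treated as its own subpopulation.
ungrouped = [
    ([1, 0, 0], fitted),
    ([1, 0, 0], fitted),
    ([0, 1, 0], fitted),
    ([0, 0, 1], fitted),
]

print(loglik_kernel(aggregated), loglik_kernel(ungrouped))
print(multinomial_deviance(aggregated), multinomial_deviance(ungrouped))
```

The likelihood kernel is identical under both groupings, which is why the maximum likelihood estimates do not change, while the deviance compares the fit against a different saturated model in each case and therefore differs.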
