Example 46.3: Probit-Normal Model with Ordinal Data

The NLMIXED Procedure

Example 46.3: Probit-Normal Model with Ordinal Data

The data for this example are from Ezzet and Whitehead (1991), who describe a crossover experiment on two groups of patients using two different inhaler devices (A and B). Patients from group 1 used device A for one week and then device B for another week. Patients from group 2 used the devices in reverse order. The data entered as a SAS data set are as follows.

   data inhaler;
      input clarity group time freq;
      gt = group*time;
      sub = floor((_n_+1)/2);
      datalines;
   1 0 0 59
   1 0 1 59
   1 0 0 35
   2 0 1 35
   1 0 0  3
   3 0 1  3
   1 0 0  2
   4 0 1  2
   2 0 0 11
   1 0 1 11
   2 0 0 27
   2 0 1 27
   2 0 0  2
   3 0 1  2
   2 0 0  1
   4 0 1  1
   4 0 0  1
   1 0 1  1
   4 0 0  1
   2 0 1  1
   1 1 0 63
   1 1 1 63
   1 1 0 13
   2 1 1 13
   2 1 0 40
   1 1 1 40
   2 1 0 15
   2 1 1 15
   3 1 0  7
   1 1 1  7
   3 1 0  2
   2 1 1  2
   3 1 0  1
   3 1 1  1
   4 1 0  2
   1 1 1  2
   4 1 0  1
   3 1 1  1
   run;

The response measurement, CLARITY, is the patients' assessment on the clarity of the leaflet instructions for the devices. The CLARITY variable is on an ordinal scale, with 1=easy, 2=only clear after rereading, 3=not very clear, and 4=confusing. The GROUP variable indicates the treatment group and the TIME variable indicates the time of measurement. The FREQ variable indicates the number of patients with exactly the same responses. A variable GT is created to indicate a group by time interaction, and a variable SUB is created to indicate patients.

As in the previous example and in Hedeker and Gibbons (1994), assume an underlying latent continuous variable, here with the form

$y_{ij} = \beta_0 + \beta_1 g_i + \beta_2 t_j + \beta_3 g_i t_j + u_i + e_{ij}$

where i indexes patient and j indexes the time period, g_i indicates groups, t_j indicates time, u_i is a patient-level normal random effect, and e_ij are iid normal errors. The $\beta$ s are unknown coefficients to be estimated.

Instead of observing y_ij, though, you observe only whether it falls in one of the four intervals: $(-\infty,0)$ , (0,I1), (I1,I1+I2), or $(I1+I2,\infty)$ , where I1 and I2 are both positive. The resulting category is the value assigned to the CLARITY variable.

The following code sets up and fits this ordinal probit model:

   proc nlmixed data=inhaler corr ecorr;
      parms b0=0 b1=0 b2=0 b3=0 sd=1 i1=1 i2=1;
      bounds i1 > 0, i2 > 0;
      eta = b0 + b1*group + b2*time + b3*gt + u;
      if (clarity=1) then z = probnorm(-eta);
      else if (clarity=2) then 
         z = probnorm(i1-eta) - probnorm(-eta);
      else if (clarity=3) then 
         z = probnorm(i1+i2-eta) - probnorm(i1-eta);
      else z = 1 - probnorm(i1+i2-eta);
      if (z > 1e-8) then ll = log(z);
      else ll = -1e100;
      model clarity ~ general(ll);
      random u ~ normal(0,sd*sd) subject=sub;
      replicate freq;
      estimate 'thresh2' i1;
      estimate 'thresh3' i1 + i2;
      estimate 'icc' sd*sd/(1+sd*sd);
   run;

The PROC statement specifies the input data set and requests correlations both for the parameter estimates (CORR option) and the additional estimates specified with ESTIMATE statements (ECORR option).

The parameters as defined in the PARMS statement are as follows. B0 (overall intercept), B1 (group main effect), B2 (time main effect), B3 (group by time interaction), SD (standard deviation of the random effect), I1 (increment between first and second thresholds), and I2 (increment between second and third thresholds). The BOUNDS statement restricts I1 and I2 to be positive.

The SAS programming statements begin by defining the linear predictor ETA, which is a linear combination of the B parameters and a single random effect U. The next statements define the ordinal likelihood according to the CLARITY variable, ETA, and the increment variables. An error trap is included in case the likelihood becomes too small.

A general log likelihood specification is used in the MODEL statement, and the RANDOM statement defines the random effect U to have standard deviation SD and subject variable SUB. The REPLICATE statement indicates that data for each subject should be replicated according to the FREQ variable.

The ESTIMATE statements specify the second and third thresholds in terms of the increment variables (the first threshold is assumed to equal zero for model identifiability). Also computed is the intraclass correlation.

The output is as follows.

The NLMIXED Procedure

Specifications
Data Set	WORK.INHALER
Dependent Variable	clarity
Distribution for Dependent Variable	General
Random Effects	u
Distribution for Random Effects	Normal
Subject Variable	sub
Replicate Variable	freq
Optimization Technique	Dual Quasi-Newton
Integration Method	Adaptive Gaussian Quadrature

The "Specifications" table echoes some primary information specified for this nonlinear mixed model.

The NLMIXED Procedure

Dimensions
Observations Used	38
Observations Not Used	0
Total Observations	38
Subjects	286
Max Obs Per Subject	2
Parameters	7
Quadrature Points	5

The "Dimensions" table reveals a total of 286 subjects, which is the sum of the values of the FREQ variable. Five quadrature points are selected for log likelihood evaluation.

The NLMIXED Procedure

Parameters
b0	b1	b2	b3	sd	i1	i2	NegLogLike
0	0	0	0	1	1	1	538.484276

The "Parameters" table lists the simple starting values for this problem.

The NLMIXED Procedure

Iteration History
Iter	Calls	NegLogLike	Diff	MaxGrad	Slope
1	2	476.382511	62.10176	43.75062	-1431.4
2	4	463.228197	13.15431	14.24648	-106.753
3	5	458.528118	4.70008	48.31316	-33.0389
4	6	450.975735	7.552383	22.60098	-40.9954
5	8	448.012701	2.963033	14.86877	-16.7453
6	10	447.245153	0.767549	7.774189	-2.26743
7	11	446.72767	0.517483	3.793533	-1.59278
8	13	446.518273	0.209396	0.868638	-0.37801
9	16	446.514528	0.003745	0.328568	-0.02356
10	18	446.513341	0.001187	0.056778	-0.00183
11	20	446.513314	0.000027	0.010785	-0.00004
12	22	446.51331	3.956E-6	0.004922	-5.41E-6
13	24	446.51331	1.989E-7	0.00047	-4E-7

NOTE: GCONV convergence criterion satisfied.

The "Iterations" table indicates successful convergence in 13 iterations.

The NLMIXED Procedure

Fit Statistics
-2 Log Likelihood	893.0
AIC (smaller is better)	907.0
BIC (smaller is better)	932.6
Log Likelihood	-446.5
AIC (larger is better)	-453.5
BIC (larger is better)	-466.3

The "Fitting Information" table lists the log likelihood and information criteria.

The NLMIXED Procedure

Parameter Estimates
Parameter	Estimate	Standard Error	DF	t Value	Pr > \|t\|	Alpha	Lower	Upper	Gradient
b0	-0.6364	0.1342	285	-4.74	<.0001	0.05	-0.9006	-0.3722	0.00047
b1	0.6007	0.1770	285	3.39	0.0008	0.05	0.2523	0.9491	0.000265
b2	0.6015	0.1582	285	3.80	0.0002	0.05	0.2900	0.9129	0.00008
b3	-1.4817	0.2385	285	-6.21	<.0001	0.05	-1.9512	-1.0122	0.000102
sd	0.6599	0.1312	285	5.03	<.0001	0.05	0.4017	0.9181	-0.00009
i1	1.7450	0.1474	285	11.84	<.0001	0.05	1.4548	2.0352	0.000202
i2	0.5985	0.1427	285	4.19	<.0001	0.05	0.3177	0.8794	0.000087

The "Parameter Estimates" table indicates significance of all of the parameters.

The NLMIXED Procedure

Additional Estimates
Label	Estimate	Standard Error	DF	t Value	Pr > \|t\|	Alpha	Lower	Upper
thresh2	1.7450	0.1474	285	11.84	<.0001	0.05	1.4548	2.0352
thresh3	2.3435	0.2073	285	11.31	<.0001	0.05	1.9355	2.7515
icc	0.3034	0.08402	285	3.61	0.0004	0.05	0.1380	0.4687

The "Additional Estimates" table displays results from the ESTIMATE statements.

Chapter Contents
Previous
Next
Top