Example 29.5: GEE for Binary Data with Logit Link Function

The GENMOD Procedure

Table 29.4 displays a partial listing of a SAS data set of clinical trial data comparing two treatments for a respiratory disorder. See "Gee Model for Binary Data" in the SAS/STAT Sample Program Library for the complete data set. These data are from Stokes, Davis, and Koch (1995), where a SAS macro is used to fit a GEE model. In this example, the GEE model is fit by using the REPEATED statement in the GENMOD procedure.

Table 29.4: Respiratory Disorder Data

Obs	center	id	age	baseline	active	center2	female	visit	outcome
1	1	1	46	0	0	0	0	1	0
2	1	1	46	0	0	0	0	2	0
3	1	1	46	0	0	0	0	3	0
4	1	1	46	0	0	0	0	4	0
5	1	2	28	0	0	0	0	1	0
6	1	2	28	0	0	0	0	2	0
7	1	2	28	0	0	0	0	3	0
8	1	2	28	0	0	0	0	4	0

Patients in each of two centers are randomly assigned to groups receiving the active treatment or a placebo. During treatment, respiratory status (coded here as 0=poor, 1=good) is determined for each of four visits. The variables center, treatment, sex, and baseline (baseline respiratory status) are classification variables with two levels. The variable age (age at time of entry into the study) is a continuous variable.

Explanatory variables in the model are Intercept (x_ij1), treatment (x_ij2), center (x_ij3), sex (x_ij4), age (x_ij5), and baseline (x_ij6), so that $x_{ij}' = [x_{ij1}, x_{ij2}, \ldots, x_{ij6}]$ is the vector of explanatory variables. Indicator variables for the classification explanatory variables can be automatically generated by listing them in the CLASS statement in PROC GENMOD. However, in order to be consistent with the analysis in Stokes, Davis, and Koch (1995), the four classification explanatory variables are coded as follows:

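$x_{ij2} = \begin{cases} 0 & \text{placebo} \\ 1 & \text{active} \end{cases} \qquad x_{ij3} = \begin{cases} 0 & \text{center 1} \\ 1 & \text{center 2} \end{cases}$

$x_{ij4} = \begin{cases} 0 & \text{male} \\ 1 & \text{female} \end{cases} \qquad x_{ij6} = \begin{cases} 0 & \text{poor} \\ 1 & \text{good} \end{cases}$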

Suppose y_ij represents the respiratory status of patient i at the jth visit, j = 1, ... ,4, and $\mu_{ij}={\rm E}(y_{ij})$ represents the mean of the respiratory status. Since the response data are binary, you can use the variance function for the binomial distribution $v(\mu_{ij})=\mu_{ij}(1-\mu_{ij})$ and the logit link function $g(\mu_{ij}) = \log(\mu_{ij}/(1-\mu_{ij}))$. The model for the mean is $g(\mu_{ij})=x_{ij}'\beta$, where ${\beta}$ is a vector of regression parameters to be estimated.
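
As a worked restatement of the link function (a sketch that follows from inverting the logit, not additional output from the procedure), the model for the mean can also be written on the probability scale, where $x_{ij}'\beta$ denotes the linear predictor:

$\mu_{ij} = \frac{\exp(x_{ij}'\beta)}{1+\exp(x_{ij}'\beta)}$

Each regression parameter is therefore a log odds ratio: a one-unit increase in an explanatory variable, with the others held fixed, multiplies the odds $\mu_{ij}/(1-\mu_{ij})$ of a good respiratory status by the exponential of that variable's parameter.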

Further manipulation of the data set creates an observation for each visit with the respiratory status at each visit represented by the binary variable outcome and indicator variables for treatment (active), center (center2), and sex (female).

   data resp;
      keep id active center center2 female age baseline visit outcome;
      input center id treatmnt $ sex $ age baseline visit1-visit4;
      active=(treatmnt='A');
      center2=(center=2);
      female=(sex='F');
      visit=1;  outcome=visit1;  output;
      visit=2;  outcome=visit2;  output;
      visit=3;  outcome=visit3;  output;
      visit=4;  outcome=visit4;  output;
      datalines;
   1  1 P M 46 0 0 0 0 0
   1  2 P M 28 0 0 0 0 0
   1  3 A M 23 1 1 1 1 1
   1  4 P M 44 1 1 1 1 0
   1  5 P F 13 1 1 1 1 1
   .
   .
   .
   1 52 P M 43 0 0 0 1 0
   1 53 A F 32 0 0 0 1 0
   1 54 A M 11 1 1 1 1 0
   1 55 P M 24 1 1 1 1 1
   1 56 A M 25 0 1 1 0 1
   2  1 P F 39 0 0 0 0 0
   2  2 A M 25 0 0 1 1 1
   2  3 A M 58 1 1 1 1 1
   2  4 P F 51 1 1 0 1 1
   2  5 P F 32 1 0 0 1 1
   .
   .
   .
   2 51 A M 43 1 1 1 1 0
   2 52 A F 39 0 1 1 1 1
   2 53 A M 68 0 1 1 1 1
   2 54 A F 63 1 1 1 1 1
   2 55 A M 31 1 1 1 1 1
   ;
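
To check the restructured data, a listing like Table 29.4 can be requested. The following statements are a minimal sketch, assuming the DATA step above has been run with the complete data; the OBS=8 data set option is chosen here only to match the eight rows shown in the table.

   proc print data=resp(obs=8);
   run;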

The GEE solution is requested with the REPEATED statement in the GENMOD procedure. The option SUBJECT=ID(CENTER) specifies that the observations in a single cluster are uniquely identified by center and id within center. The option TYPE=UNSTR specifies the unstructured working correlation structure, and the CORRW option displays the estimated working correlation matrix. The MODEL statement specifies the regression model for the mean with the binomial distribution variance function. Because no LINK= option is specified, the logit link, which is the default for the binomial distribution, is used.

   proc genmod data=resp;
      class id center;
      model outcome=center2 active female age baseline / d=bin;
      repeated  subject=id(center) / type=unstr corrw;
   run;

These statements first produce the usual output (not shown) for fitting the generalized linear model (GLM) specified in the MODEL statement. The parameter estimates from the GLM are used as initial values for the GEE solution.

Information about the GEE model is displayed in Output 29.5.1. The results of GEE model fitting are displayed in Output 29.5.2. If you specify no other options, the standard errors, confidence intervals, Z scores, and p-values are based on empirical standard error estimates. You can specify the MODELSE option in the REPEATED statement to create a table based on model-based standard error estimates.
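
As a sketch of the MODELSE option, the following statements repeat the analysis with MODELSE added to the REPEATED statement. Everything else is unchanged from the statements shown earlier in this example; the resulting table of model-based standard error estimates is not reproduced here.

   proc genmod data=resp;
      class id center;
      model outcome=center2 active female age baseline / d=bin;
      repeated  subject=id(center) / type=unstr corrw modelse;
   run;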

Output 29.5.1: Model Fitting Information

The GENMOD Procedure

GEE Model Information
Correlation Structure	Unstructured
Subject Effect	id(center) (111 levels)
Number of Clusters	111
Correlation Matrix Dimension	4
Maximum Cluster Size	4
Minimum Cluster Size	4

Output 29.5.2: Results of Model Fitting

The GENMOD Procedure

Working Correlation Matrix
	Col1	Col2	Col3	Col4
Row1	1.0000	0.3351	0.2140	0.2953
Row2	0.3351	1.0000	0.4429	0.3581
Row3	0.2140	0.4429	1.0000	0.3964
Row4	0.2953	0.3581	0.3964	1.0000

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter	Estimate	Standard Error	95% Confidence Limits		Z	Pr > |Z|
Intercept	-0.8882	0.4568	-1.7835	0.0071	-1.94	0.0519
center2	0.6558	0.3512	-0.0326	1.3442	1.87	0.0619
active	1.2442	0.3455	0.5669	1.9214	3.60	0.0003
female	0.1128	0.4408	-0.7512	0.9768	0.26	0.7981
age	-0.0175	0.0129	-0.0427	0.0077	-1.36	0.1728
baseline	1.8981	0.3441	1.2237	2.5725	5.52	<.0001
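
Because the model uses the logit link, exponentiating an estimate gives an odds ratio. As a worked illustration (arithmetic on the values in the table above, rounded to two decimals, not output from PROC GENMOD), the estimated odds ratio for the active treatment versus placebo and its 95% confidence limits are

$\exp(1.2442) \approx 3.47, \qquad (\exp(0.5669),\ \exp(1.9214)) \approx (1.76,\ 6.83)$

The Z statistic is the estimate divided by its empirical standard error, $1.2442/0.3455 \approx 3.60$.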

The nonsignificance of age (p = 0.1728) and female (p = 0.7981) makes them candidates for omission from the model.
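
A reduced model could be fit as follows. This is only a sketch of one possible follow-up and is not part of the original analysis; it drops age and female from the MODEL statement and keeps the same working correlation structure. Its estimates are not shown here.

   proc genmod data=resp;
      class id center;
      model outcome=center2 active baseline / d=bin;
      repeated  subject=id(center) / type=unstr corrw;
   run;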
