Input Data Sets
Data to be analyzed by PROC CATMOD must be
in a SAS data set containing one of the following:
- raw data values (variable values for every subject)
- frequency counts and the corresponding variable values
- response function values and their covariance matrix
If you specify a WEIGHT statement, then PROC CATMOD uses the
values of the WEIGHT variable as the frequency counts.
If the READ function is specified in the RESPONSE statement,
then the procedure expects the input data set to contain the values
of response functions and their covariance matrix.
Otherwise, PROC CATMOD assumes that the SAS
data set contains raw data values.
Raw Data Values
If you use raw data, PROC CATMOD first counts the number of
observations having each combination of values for all
variables specified in the MODEL or POPULATION statements.
For example, suppose the variables A and B
each take on the values 1 and 2, and their
frequencies can be represented as follows.
| B=1 | B=2 |
A=1 | 2 | 3 |
A=2 | 1 | 1 |
The SAS data set Raw containing the raw data might be as follows.
Observation | A | B |
1 | 1 | 1 |
2 | 1 | 1 |
3 | 1 | 2 |
4 | 1 | 2 |
5 | 1 | 2 |
6 | 2 | 1 |
7 | 2 | 2 |
The corresponding PROC CATMOD statements would be
proc catmod data=Raw;
   model A=B;
run;
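The Raw data set listed above could be created with a DATA step such as the following. This is a minimal sketch that is not part of the original example; it simply enters the seven observations from the table, one subject per line, and it must run before the PROC CATMOD statements.

```sas
/* Sketch: one line per subject, matching observations 1-7 in the table */
data Raw;
   input A B;
   datalines;
1 1
1 1
1 2
1 2
1 2
2 1
2 2
;
run;
```

Because no WEIGHT statement is used with this data set, PROC CATMOD treats each observation as a single subject and builds the frequency counts itself.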
For a discussion of how to handle structural and random zeros
when raw data are the input, see the "Zero Frequencies" section and
Example 22.5.
Frequency Counts
If your data set contains frequency counts,
then use the WEIGHT statement in PROC CATMOD to
specify the variable containing the frequencies.
For example, you could create the Summary data set as follows.
data Summary;
   input A B Count;
   datalines;
1 1 2
1 2 3
2 1 1
2 2 1
;
In this case, the corresponding statements would be
proc catmod data=Summary;
   weight Count;
   model A=B;
run;
The data set Summary can also be created from data set
Raw by using the FREQ procedure:
proc freq data=Raw;
   tables A*B / out=Summary;
run;
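The OUT= data set written by PROC FREQ stores the cell frequencies in a variable named COUNT (alongside a PERCENT variable), so the same WEIGHT statement applies to it. The following sketch chains the two procedures; the NOPRINT option and the PROC PRINT step are additions for illustration, and the sketch assumes the Raw data set already exists.

```sas
/* Sketch: summarize Raw into cell counts without printing the crosstabulation */
proc freq data=Raw noprint;
   tables A*B / out=Summary;
run;

/* Inspect the summarized data: expect the variables A, B, COUNT, and PERCENT */
proc print data=Summary;
run;

/* Fit the same model as before, using COUNT as the frequency variable */
proc catmod data=Summary;
   weight Count;
   model A=B;
run;
```

SAS variable names are not case sensitive, so Count in the WEIGHT statement refers to the COUNT variable created by PROC FREQ.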
Inputting Response Functions and Covariances Directly
If you want to read in the response functions and their
covariance matrix, rather than have PROC CATMOD compute
them, create a TYPE=EST data set. In addition to having one
variable name for each function, the data set should have
two additional variables: _TYPE_ and _NAME_,
both character variables of length 8. The variable
_TYPE_ should have the value 'PARMS' when the observation
contains the response functions; it should have the value
'COV' when the observation contains elements of the
covariance matrix of the response functions. The variable
_NAME_ is used only when _TYPE_='COV', in
which case it should contain the name of the variable that
has its covariance elements stored in that observation. In
the following data set, for example, the covariance between
the second and fourth response functions is 0.000102.
data direct(type=est);
   /* b1-b4: response functions; _type_ is PARMS or COV; _name_ labels each COV row */
   input b1-b4 _type_ $ _name_ $8.;
   datalines;
0.590463 0.384720 0.273269 0.136458 PARMS .
0.001690 0.000911 0.000474 0.000432 COV B1
0.000911 0.001823 0.000031 0.000102 COV B2
0.000474 0.000031 0.001056 0.000477 COV B3
0.000432 0.000102 0.000477 0.000396 COV B4
;
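Before fitting a model, it can be worth confirming that the data set was stored with the intended type and layout. The following is a minimal sketch that is not part of the original example; it assumes the direct data set above has already been created.

```sas
/* Sketch: the PROC CONTENTS report includes the data set type (EST is expected here) */
proc contents data=direct;
run;

/* List the PARMS row and the four COV rows to check the values and _NAME_ labels */
proc print data=direct;
run;
```

In the listing, the value in row B2 under column b4 should match the value in row B4 under column b2 (0.000102), since a covariance matrix is symmetric.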
To tell PROC CATMOD that the input data set contains
the values of response functions and their covariance matrix,
do both of the following:
- specify the READ function in the RESPONSE statement
- specify _F_ as the dependent variable in the MODEL statement
For example,
suppose the response functions correspond to four
populations that represent the cross-classification
of two age groups by two race groups.
You can use the FACTORS statement to identify
these two factors and to name the effects in the model.
The statements required to fit a
main-effects model to these data are
proc catmod data=direct;
   response read b1-b4;
   model _f_=_response_;
   factors age 2, race 2 / _response_=age race;
run;
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.