The TRANSREG Procedure

OUTPUT Statement

OUTPUT OUT=SAS-data-set < o-options > ;
The OUTPUT statement creates a new SAS data set that contains coefficients, marginal means, and information on original and transformed variables. The information on original and transformed variables composes the score partition of the data set; observations have _TYPE_='SCORE'. The coefficients and marginal means compose the coefficient partition of the data set; observations have _TYPE_='M COEFFI' or _TYPE_='MEAN'. Other values of _TYPE_ are possible; for details, see "_TYPE_ and _NAME_ Variables" later in this chapter. For details on data set structure, see the "Output Data Set" section.

To specify the data set, use the OUT= specification.

OUT=SAS-data-set
specifies the output data set for the data, transformed data, predicted values, residuals, scores, coefficients, and so on. When you use an OUTPUT statement but do not use the OUT= specification, PROC TRANSREG creates a data set and uses the DATAn convention. If you want to create a permanent SAS data set, you must specify a two-level name (refer to "SAS Files" in SAS Language Reference: Concepts and "Introduction to DATA Step Processing" in the SAS Procedures Guide for details).
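
As a minimal sketch of the OUT= specification, the following statements write the output to a permanent data set by using a two-level name. The libref Results, its path, the input data set Work.Study, and the variables Y and X are hypothetical names chosen only for illustration.

   /* Hypothetical libref and path; replace with a directory that exists */
   libname results '/tmp/results';

   proc transreg data=work.study;
      model identity(y) = spline(x);
      /* Two-level name: the data set persists after the session ends */
      output out=results.scores;
   run;

If you omit the libref and specify a one-level name such as OUT=Scores, the data set is normally created in the temporary Work library.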

To control the contents of the data set and variable names, use one or more of the o-options. You can also specify these options in the PROC TRANSREG statement.

Output Options (o-options)

The following table provides a summary of options in the OUTPUT statement. These options include the OUT= option and all of the o-options.



Table 65.4: Options Available in the OUTPUT Statement

Task                                           Option
Identify output data set
   output data set                             OUT=
Predicted values, residuals, scores
   outputs canonical scores                    CANONICAL
   outputs individual confidence limits        CLI
   outputs mean confidence limits              CLM
   specifies design matrix coding              DESIGN=
   outputs leverage                            LEVERAGE
   does not restore missing values             NORESTOREMISSING
   suppresses output of scores                 NOSCORES
   outputs predicted values                    PREDICTED
   outputs redundancy variables                REDUNDANCY=
   outputs residuals                           RESIDUALS
Output data set replacement
   replaces dependent variables                DREPLACE
   replaces independent variables              IREPLACE
   replaces all variables                      REPLACE
Output data set coefficients
   outputs coefficients                        COEFFICIENTS
   outputs ideal point coordinates             COORDINATES
   outputs marginal means                      MEANS
   outputs redundancy analysis coefficients    MREDUNDANCY
Output data set variable name prefixes
   dependent variable approximations           ADPREFIX=
   independent variable approximations         AIPREFIX=
   canonical dependent variables               CDPREFIX=
   conservative-individual-lower CL            CILPREFIX=
   canonical independent variables             CIPREFIX=
   conservative-individual-upper CL            CIUPREFIX=
   conservative-mean-lower CL                  CMLPREFIX=
   conservative-mean-upper CL                  CMUPREFIX=
   METHOD=MORALS untransformed dependent       DEPENDENT=
   liberal-individual-lower CL                 LILPREFIX=
   liberal-individual-upper CL                 LIUPREFIX=
   liberal-mean-lower CL                       LMLPREFIX=
   liberal-mean-upper CL                       LMUPREFIX=
   residuals                                   RDPREFIX=
   predicted values                            PPREFIX=
   redundancy variables                        RPREFIX=
   transformed dependents                      TDPREFIX=
   transformed independents                    TIPREFIX=
Output data set macros
   creates macro variables                     MACRO
Control CLASS variables
   controls output of reference levels         REFERENCE=
Output data set details
   dependent and independent approximations    APPROXIMATIONS
   canonical correlation coefficients          CCC
   canonical elliptical point coordinates      CEC
   canonical point coordinates                 CPC
   canonical quadratic point coordinates       CQC
   approximations to transformed dependents    DAPPROXIMATIONS
   approximations to transformed independents  IAPPROXIMATIONS
   elliptical point coordinates                MEC
   point coordinates                           MPC
   quadratic point coordinates                 MQC
   multiple regression coefficients            MRC


For the coefficient partition, the COEFFICIENTS, COORDINATES, and MEANS o-options provide the coefficients that are appropriate for your model. For more explicit control of the coefficient partition, use the options that control details and prefixes.

The following list provides details on these options.

ADPREFIX=name
ADP=name
specifies a prefix for naming the dependent variable predicted values. The default is ADPREFIX=P when you specify the PREDICTED o-option; otherwise, it is ADPREFIX=A. Specifying the ADPREFIX= o-option also implies the PREDICTED o-option, and the ADPREFIX= o-option is the same as the PPREFIX= o-option.

AIPREFIX=name
AIP=name
specifies a prefix for naming the independent variable approximations. The default is AIPREFIX=A. Specifying the AIPREFIX= o-option also implies the IAPPROXIMATIONS o-option.

APPROXIMATIONS
APPROX
APP
is equivalent to specifying both the DAPPROXIMATIONS and the IAPPROXIMATIONS o-options. If METHOD=UNIVARIATE, then the APPROXIMATIONS o-option implies only the DAPPROXIMATIONS o-option.

CANONICAL
CAN
outputs canonical variables to the OUT= data set. When METHOD=CANALS, the CANONICAL o-option is implied. The CDPREFIX= o-option specifies a prefix for naming the dependent canonical variables (default Cand), and the CIPREFIX= o-option specifies a prefix for naming the independent canonical variables (default Cani).
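
The following statements are a hedged sketch of requesting canonical variables with nondefault prefixes. The data set Work.Study, the variables Y1, Y2, and X1-X3, and the prefixes DC and IC are hypothetical; METHOD=CANALS and NCAN= are a-options and are placed in the PROC TRANSREG statement here.

   /* Hypothetical data set and variables, for illustration only */
   proc transreg data=work.study method=canals ncan=2;
      model monotone(y1 y2) = monotone(x1-x3);
      /* CANONICAL is implied by METHOD=CANALS; it is listed for clarity */
      output out=work.canon canonical cdprefix=DC ciprefix=IC;
   run;

Under these assumptions, the canonical variables in Work.Canon would have names that begin with DC and IC instead of the default Cand and Cani.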

CCC
outputs canonical correlation coefficients to the OUT= data set.

CDPREFIX=name
CDP=name
provides a prefix for naming the canonical dependent variables. The default is CDPREFIX=Cand. Specifying the CDPREFIX= o-option also implies the CANONICAL o-option.

CEC
outputs canonical elliptical point model coordinates to the OUT= data set.

CILPREFIX=name
CIL=name
specifies a prefix for naming the conservative-individual-lower confidence limits. The default prefix is CIL. Specifying the CILPREFIX= o-option also implies the CLI o-option.

CIPREFIX=name
CIP=name
provides a prefix for naming the canonical independent variables. The default is CIPREFIX=Cani. Specifying the CIPREFIX= o-option also implies the CANONICAL o-option.

CIUPREFIX=name
CIU=name
specifies a prefix for naming the conservative-individual-upper confidence limits. The default prefix is CIU. Specifying the CIUPREFIX= o-option also implies the CLI o-option.

CLI
outputs individual confidence limits to the OUT= data set. The names of the confidence limits variables are constructed from the original dependent variable names and the prefixes specified in the following o-options: LILPREFIX= (default LIL for liberal individual lower), CILPREFIX= (default CIL for conservative individual lower), LIUPREFIX= (default LIU for liberal individual upper), and CIUPREFIX= (default CIU for conservative individual upper). When there are no monotonicity constraints, the liberal and conservative limits are the same.

CLM
outputs mean confidence limits to the OUT= data set. The names of the confidence limits variables are constructed from the original dependent variable names and the prefixes specified in the following o-options: LMLPREFIX= (default LML for liberal mean lower), CMLPREFIX= (default CML for conservative mean lower), LMUPREFIX= (default LMU for liberal mean upper), and CMUPREFIX= (default CMU for conservative mean upper). When there are no monotonicity constraints, the liberal and conservative limits are the same.
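
As an illustration of the CLI and CLM o-options, the following sketch requests predicted values, residuals, and both sets of confidence limits. The data set Work.Study and the variables Y and X are hypothetical.

   /* Hypothetical data set and variables, for illustration only */
   proc transreg data=work.study;
      model identity(y) = spline(x / nknots=3);
      /* CLI adds individual limits; CLM adds mean limits */
      output out=work.limits predicted residuals cli clm;
   run;

With the default prefixes, the limit variables in Work.Limits would have names that begin with LIL, CIL, LIU, CIU, LML, CML, LMU, and CMU, followed by the dependent variable name. Because this model imposes no monotonicity constraints, the liberal and conservative limits should coincide.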

CMLPREFIX=name
CML=name
specifies a prefix for naming the conservative-mean-lower confidence limits. The default prefix is CML. Specifying the CMLPREFIX= o-option also implies the CLM o-option.

CMUPREFIX=name
CMU=name
specifies a prefix for naming the conservative-mean-upper confidence limits. The default prefix is CMU. Specifying the CMUPREFIX= o-option also implies the CLM o-option.

COEFFICIENTS
COE
outputs either multiple regression coefficients or raw canonical coefficients to the OUT= data set. If you specify METHOD=CANALS (in the MODEL or PROC TRANSREG statement), then the COEFFICIENTS o-option outputs the raw canonical coefficients for the first n canonical variables, where n is the value of the NCAN= a-option (specified in the MODEL or PROC TRANSREG statement). Otherwise, the COEFFICIENTS o-option includes multiple regression coefficients in the OUT= data set. In addition, when you specify the CLASS expansion for any independent variable, the COEFFICIENTS o-option also outputs marginal means.

COORDINATES
COO
outputs either ideal point or vector model coordinates for preference mapping to the OUT= data set. When METHOD=CANALS, these coordinates are computed from canonical coefficients; otherwise, the coordinates are computed from multiple regression coefficients. For details, see the "Point Models" section.
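
The following statements sketch how the COORDINATES o-option might be used for preference mapping. The data set Work.Prefs, the preference ratings Judge1-Judge10, the stimulus coordinates Dim1 and Dim2, and the ID variable Product are all hypothetical.

   /* Hypothetical data set and variables, for illustration only */
   proc transreg data=work.prefs;
      /* POINT requests an ideal point model in the two dimensions */
      model identity(judge1-judge10) = point(dim1 dim2);
      output out=work.ideal coordinates replace;
      id product;
   run;

Because METHOD=CANALS is not specified in this sketch, the coordinates would be computed from multiple regression coefficients.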

CPC
outputs canonical point model coordinates to the OUT= data set.

CQC
outputs canonical quadratic point model coordinates to the OUT= data set.

DAPPROXIMATIONS
DAP
outputs the approximations of the transformed dependent variables to the OUT= data set. These are the target values for the optimal transformations. With METHOD=UNIVARIATE and METHOD=MORALS, the dependent variable approximations are the ordinary predicted values from the linear model. The names of the approximation variables are constructed from the ADPREFIX= o-option (default A) and the original dependent variable names. For ordinary predicted values, use the PREDICTED o-option instead of the DAPPROXIMATIONS o-option, since the PREDICTED o-option uses a more relevant prefix ("P" instead of "A") and a more relevant variable label suffix ("Predicted Values" instead of "Approximations").

DESIGN<=n>
DES<=n>
specifies that your primary goal is design matrix coding, not analysis. Specifying the DESIGN o-option makes the procedure run faster. The DESIGN o-option sets the default method to UNIVARIATE and the default MAXITER= value to zero. It suppresses computing the regression coefficients, unless they are needed for some other option. Furthermore, when the DESIGN o-option is specified, the MODEL statement is not required to have an equal sign. When no MODEL statement equal sign is specified, all variables are considered independent variables, all options that require dependent variables are ignored, and the IREPLACE o-option is implied.

You can use DESIGN=n for coding very large data sets, where n is the number of observations to code at one time. For example, to code a data set with a large number of observations, you can specify DESIGN=100 or DESIGN=1000 to process the data set in blocks of 100 or 1000 observations. If you specify the DESIGN o-option rather than DESIGN=n, PROC TRANSREG tries to process all observations at once, which will not work with very large data sets. Specify the NOZEROCONSTANT a-option with DESIGN=n to ensure that constant variables within blocks are not zeroed. See the section "Using the DESIGN Output Option" and the section "Choice Experiments: DESIGN, NORESTOREMISSING, NOZEROCONSTANT Usage".
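
The following statements are a minimal sketch of coding a large data set in blocks. The data set Work.Choices and the variables Brand, Price, Subject, Set, and Choice are hypothetical. Both options are placed in the PROC TRANSREG statement here.

   /* Hypothetical data set and variables, for illustration only */
   proc transreg data=work.choices design=1000 nozeroconstant;
      /* No equal sign: every variable is treated as independent */
      model class(brand price / zero=none);
      /* ID variables are copied to the OUT= data set unchanged */
      id subject set choice;
      output out=work.coded;
   run;

In this sketch, the observations are coded 1000 at a time, and NOZEROCONSTANT keeps a coded variable that happens to be constant within one block from being set to zero.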

DEPENDENT=name
DEP=name
specifies the untransformed dependent variable for OUT= data sets with METHOD=MORALS when there is more than one dependent variable. The default is DEPENDENT=_DEPEND_.

DREPLACE
DRE
replaces the original dependent variables with the transformed dependent variables in the OUT= data set. The names of the transformed variables in the OUT= data set correspond to the names of the original dependent variables in the input data set. By default, both the original dependent variables and transformed dependent variables (with names constructed from the TDPREFIX= o-option (default T) and the original dependent variable names) are included in the OUT= data set.

IAPPROXIMATIONS
IAP
outputs the approximations of the transformed independent variables to the OUT= data set. These are the target values for the optimal transformations. The names of the approximation variables are constructed from the AIPREFIX= o-option (default A) and the original independent variable names. Specifying the AIPREFIX= o-option also implies the IAPPROXIMATIONS o-option. The IAPPROXIMATIONS o-option is not valid when METHOD=UNIVARIATE.

IREPLACE
IRE
replaces the original independent variables with the transformed independent variables in the OUT= data set. The names of the transformed variables in the OUT= data set correspond to the names of the original independent variables in the input data set. By default, both the original independent variables and transformed independent variables (with names constructed from the TIPREFIX= o-option (default T) and the original independent variable names) are included in the OUT= data set.

LEVERAGE<=name>
LEV<=name>
creates a variable with the specified name in the OUT= data set that contains leverages. Specifying the LEVERAGE o-option is equivalent to specifying LEVERAGE=Leverage.
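
For example, the following sketch stores leverages in a variable named Lev, alongside predicted values and residuals. The data set Work.Study, the variables Y, X1, and X2, and the name Lev are hypothetical.

   /* Hypothetical data set and variables, for illustration only */
   proc transreg data=work.study;
      model identity(y) = identity(x1 x2);
      /* LEVERAGE=Lev overrides the default variable name, Leverage */
      output out=work.diag predicted residuals leverage=Lev;
   run;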

LILPREFIX=name
LIL=name
specifies a prefix for naming the liberal-individual-lower confidence limits. The default prefix is LIL. Specifying the LILPREFIX= o-option also implies the CLI o-option.

LIUPREFIX=name
LIU=name
specifies a prefix for naming the liberal-individual-upper confidence limits. The default prefix is LIU. Specifying the LIUPREFIX= o-option also implies the CLI o-option.

LMLPREFIX=name
LML=name
specifies a prefix for naming the liberal-mean-lower confidence limits. The default prefix is LML. Specifying the LMLPREFIX= o-option also implies the CLM o-option.

LMUPREFIX=name
LMU=name
specifies a prefix for naming the liberal-mean-upper confidence limits. The default prefix is LMU. Specifying the LMUPREFIX= o-option also implies the CLM o-option.

MACRO(keyword=name...)
MAC(keyword=name...)
creates macro variables. Most of the options available within the MACRO o-option are rarely needed. By default, the TRANSREG procedure creates a macro variable named _TRGIND with a complete list of independent variables created by the procedure. When the TRANSREG procedure is being used for design matrix creation prior to running a procedure without a CLASS statement, this macro variable provides a convenient way to use the results from PROC TRANSREG. For example, a PROC LOGISTIC step that uses a design matrix coded by PROC TRANSREG could use the following MODEL statement:

   model y=&_trgind;


The TRANSREG procedure, also by default, creates a macro variable named _TRGINDN, which contains the number of variables in the _TRGIND list. This macro variable could be used in an ARRAY statement as follows:

   array indvars[&_trgindn] &_trgind;


See the section "Using the DESIGN Output Option" and the section "Choice Experiments: DESIGN, NORESTOREMISSING, NOZEROCONSTANT Usage" for examples of using the default macro variables.
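
The two steps can be put together as in the following sketch, which codes a design matrix and then passes the coded variables to PROC LOGISTIC. The data set Work.Trial and the variables Treatment, Dose, and Response are hypothetical, and Response is assumed to be a binary outcome.

   /* Hypothetical data set and variables, for illustration only */
   proc transreg data=work.trial design;
      /* No equal sign is needed when DESIGN is specified */
      model class(treatment dose);
      /* Carry the response through to the coded data set */
      id response;
      output out=work.coded;
   run;

   proc logistic data=work.coded;
      /* _TRGIND resolves to the list of coded independent variables */
      model response = &_trgind;
   run;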

The available keywords are as follows.
DN=name
specifies the name of a macro variable that contains the number of dependent variables. By default, a macro variable named _TRGDEPN is created. This is the number of variables in the DL= list and the number of macro variables created by the DV= and DE= specifications.

IN=name
specifies the name of a macro variable that contains the number of independent variables. By default, a macro variable named _TRGINDN is created. This is the number of variables in the IL= list and the number of macro variables created by the IV= and IE= specifications.

DL=name
specifies the name of a macro variable that contains the list of the dependent variables. By default, a macro variable named _TRGDEP is created. These are the variable names of the final transformed variables in the OUT= data set. For example, if there are three dependent variables, Y1-Y3, then _TRGDEP contains, by default, TY1 TY2 TY3 (or Y1 Y2 Y3 if you specify the REPLACE o-option).

IL=name
specifies the name of a macro variable that contains the list of the independent variables. By default, a macro variable named _TRGIND is created. These are the variable names of the final transformed variables in the OUT= data set. For example, if there are three independent variables, X1-X3, then _TRGIND contains, by default, TX1 TX2 TX3 (or X1 X2 X3 if you specify the REPLACE o-option).

DV=prefix
specifies a prefix for creating a list of macro variables, each of which contains one dependent variable name. For example, if there are three dependent variables, Y1-Y3, and you specify MACRO(DV=DEP), then three macro variables, DEP1, DEP2, and DEP3, are created, containing TY1, TY2, and TY3, respectively (or Y1, Y2, Y3 if you specify the REPLACE o-option). By default, no list is created.

IV=prefix
specifies a prefix for creating a list of macro variables, each of which contains one independent variable name. For example, if there are three independent variables, X1-X3, and you specify MACRO(IV=IND), then three macro variables, IND1, IND2, and IND3, are created, containing TX1, TX2, and TX3, respectively (or X1, X2, X3 if you specify the REPLACE o-option). By default, no list is created.

DE=prefix
specifies a prefix for creating a list of macro variables, each of which contains one dependent variable effect. This list shows the origin of each model term. Each effect consists of two or more parts, and each part consists of a value in 32 columns followed by a blank. For example, if you specify MACRO(DE=D), then a macro variable D1 is created for IDENTITY(Y). The D1 macro variable is shown below, wrapped onto two lines.

   4                                TY
   IDENTITY                         Y


The first part is the number of parts (4), the second part is the transformed variable name, the third part is the transformation, and the last part is the input variable name. By default, no list is created.

IE=prefix
specifies a prefix for creating a list of macro variables, each of which contains one independent variable effect. This list shows the origin of each model term. Each effect consists of two or more parts, and each part consists of a value in 32 columns followed by a blank. For example, if you specify MACRO(IE=I), then three macro variables, I1, I2, and I3, are created for CLASS(X1 | X2) when both X1 and X2 have values of 1 and 2. These macro variables are shown below, but with extra white space removed.

   5     Tx11     CLASS    x1   1
   5     Tx21     CLASS    x2   1
   8     Tx11x21  CLASS    x1   1      CLASS    x2   1


For CLASS variables, the formatted level appears after the variable name. The first two effects are the main effects, and the last is the interaction term. By default, no list is created.
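
As a sketch of overriding the default macro variable names, the following statements use the IL= and IN= keywords and then write the results to the log. The data set Work.Trial, the variables Treatment, Dose, and Response, and the macro variable names XLIST and NX are hypothetical.

   /* Hypothetical data set, variables, and macro names */
   proc transreg data=work.trial design;
      model class(treatment dose);
      id response;
      /* IL= names the variable list; IN= names the variable count */
      output out=work.coded macro(il=xlist in=nx);
   run;

   /* Display the count and the list in the SAS log */
   %put NOTE: &nx coded variables: &xlist;

Under these assumptions, XLIST and NX take the place of the default _TRGIND and _TRGINDN macro variables.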

MEANS
MEA
outputs marginal means for CLASS variable expansions to the OUT= data set.

MEC
outputs multiple regression elliptical point model coordinates to the OUT= data set.

MPC
outputs multiple regression point model coordinates to the OUT= data set.

MQC
outputs multiple regression quadratic point model coordinates to the OUT= data set.

MRC
outputs multiple regression coefficients to the OUT= data set.

MREDUNDANCY
MRE
outputs multiple redundancy analysis coefficients to the OUT= data set.

NORESTOREMISSING
NORESTORE
NOR
specifies that missing values should not be restored when the OUT= data set is created. By default, the coded CLASS variable contains a row of missing values for observations in which the CLASS variable is missing. When you specify the NORESTOREMISSING o-option, these observations contain a row of zeros instead. This is useful when the TRANSREG procedure is used to code designs for choice models and there is a constant alternative indicated by a missing value.

NOSCORES
NOS
excludes original variables, transformed variables, predicted values, residuals, and scores from the OUT= data set. You can use the NOSCORES o-option with various other options to create an OUT= data set that contains only a coefficient partition (for example, a data set consisting entirely of coefficients and coordinates).
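
For example, the following sketch creates an OUT= data set that contains only a coefficient partition. The data set Work.Study and the variables Y and Group are hypothetical.

   /* Hypothetical data set and variables, for illustration only */
   proc transreg data=work.study;
      model identity(y) = class(group);
      /* NOSCORES drops the score partition; COEFFICIENTS keeps the rest */
      output out=work.coefs coefficients noscores;
   run;

   proc print data=work.coefs;
   run;

Because Group is specified in a CLASS expansion, the COEFFICIENTS o-option is expected to output marginal means as well as the regression coefficients.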

PREDICTED
PRE
P
outputs predicted values, which for METHOD=UNIVARIATE and METHOD=MORALS are the ordinary predicted values from the linear model, to the OUT= data set. The names of the predicted value variables are constructed from the PPREFIX= o-option (default P) and the original dependent variable names. Specifying the PPREFIX= o-option also implies the PREDICTED o-option.

PPREFIX=name
PDPREFIX=name
PDP=name
specifies a prefix for naming the dependent variable predicted values. The default is PPREFIX=P when you specify the PREDICTED o-option; otherwise, it is PPREFIX=A. Specifying the PPREFIX= o-option also implies the PREDICTED o-option, and the PPREFIX= o-option is the same as the ADPREFIX= o-option.

RDPREFIX=name
RDP=name
specifies a prefix for naming the residual (dependent) variables in the OUT= data set. The default is RDPREFIX=R. Specifying the RDPREFIX= o-option also implies the RESIDUALS o-option.

REDUNDANCY<=STANDARDIZE | UNSTANDARDIZE>
RED<=STA | UNS>
outputs redundancy variables to the OUT= data set, either standardized or unstandardized. Specifying the REDUNDANCY o-option is the same as specifying REDUNDANCY=STANDARDIZE. The results of the REDUNDANCY o-option depend on the TSTANDARD= option. You must specify TSTANDARD=Z to get results based on standardized data. The TSTANDARD= option controls how the data that go into the redundancy analysis are scaled, and REDUNDANCY=STANDARDIZE|UNSTANDARDIZE controls how the redundancy variables are scaled. The REDUNDANCY o-option is implied by METHOD=REDUNDANCY. The RPREFIX= o-option specifies a prefix (default Red) for naming the redundancy variables.

REFERENCE=NONE | MISSING | ZERO
REF=NON | MIS | ZER
specifies how reference levels of CLASS variables are to be treated. The options are REFERENCE=NONE, the default, in which reference levels are suppressed; REFERENCE=MISSING, in which reference levels are displayed and output with missing values; and REFERENCE=ZERO, in which reference levels are displayed and output with zeros. The REFERENCE= option can be specified in the PROC TRANSREG, MODEL, or OUTPUT statement, and it can be independently specified for the OUT= data set and the displayed output. When you specify it in only one statement, it sets the option for both the displayed output and the OUT= data set.

REPLACE
REP
is equivalent to specifying both the DREPLACE and the IREPLACE o-options.
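
The following statements are a minimal sketch of the REPLACE o-option. The data set Work.Study and the variables Y, X1, and X2 are hypothetical.

   /* Hypothetical data set and variables, for illustration only */
   proc transreg data=work.study;
      model monotone(y) = spline(x1 x2);
      /* Transformed values are stored under the original names */
      output out=work.transformed replace;
   run;

In this sketch, Work.Transformed would contain Y, X1, and X2 holding the transformed values, and no separate variables with the TDPREFIX= or TIPREFIX= prefixes would be created.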

RESIDUALS
RES
R
outputs the differences between the transformed dependent variables and their predicted values. The names of the residual variables are constructed from the RDPREFIX= o-option (default R) and the original dependent variable names.

RPREFIX=name
RPR=name
provides a prefix for naming the redundancy variables. The default is RPREFIX=Red. Specifying the RPREFIX= o-option also implies the REDUNDANCY o-option.

TDPREFIX=name
TDP=name
specifies a prefix for naming the transformed dependent variables. By default, TDPREFIX=T. The TDPREFIX= o-option is ignored when you specify the DREPLACE o-option.

TIPREFIX=name
TIP=name
specifies a prefix for naming the transformed independent variables. By default, TIPREFIX=T. The TIPREFIX= o-option is ignored when you specify the IREPLACE o-option.


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.