Output Data Sets
When an output data set includes variables containing
the posterior probabilities of group membership (OUT=,
OUTCROSS=, or TESTOUT= data sets) or group-specific density
estimates (OUTD= or TESTOUTD= data sets), the names
of these variables are constructed from the formatted
values of the class levels converted to valid SAS variable names.
OUT= Data Set
The OUT= data set contains all the variables in the DATA=
data set, plus new variables containing the posterior
probabilities and the resubstitution classification results.
The names of the new variables containing the posterior
probabilities are constructed from the formatted
values of the class levels converted to SAS names.
A new variable, _INTO_, with the same attributes as the CLASS
variable, specifies the class to which each observation is assigned.
If an observation is classified into group
OTHER, the variable _INTO_ has a missing value.
When you specify the CANONICAL option, the data set also
contains new variables with canonical variable scores.
The NCAN= option determines the number of canonical variables.
The names of the canonical variables are
constructed as described in the CANPREFIX= option.
The canonical variables have means equal to zero
and pooled within-class variances equal to one.
An OUT= data set cannot be created if the DATA= data set is not
an ordinary SAS data set (for example, if it is a specially
structured data set such as TYPE=CORR or TYPE=LINEAR).
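As an illustration of the OUT= contents described above, the following is a minimal sketch, assuming the SASHELP.IRIS sample data set is available. The output data set name WORK.IRIS_OUT is hypothetical. With the default CANPREFIX= value, the canonical variables would be expected to be named Can1 and Can2, and the posterior probability variables would be expected to take their names from the Species levels; PROC CONTENTS is used here so that you can check the actual names rather than rely on that assumption.

```sas
/* Sketch only: SASHELP.IRIS and WORK.IRIS_OUT are illustrative choices. */
proc discrim data=sashelp.iris out=work.iris_out
             canonical ncan=2 noprint;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* List the variables that were added to the DATA= variables:      */
/* posterior probabilities, _INTO_, and the canonical variables.   */
proc contents data=work.iris_out varnum;
run;

/* Inspect the first few observations. */
proc print data=work.iris_out(obs=5);
run;
```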
OUTD= Data Set
The OUTD= data set contains all the variables
in the DATA= data set, plus new variables
containing the group-specific density estimates.
The names of the new variables containing the density estimates
are constructed from the formatted values of the class levels.
An OUTD= data set cannot be created if the
DATA= data set is not an ordinary SAS data set.
OUTCROSS= Data Set
The OUTCROSS= data set contains all the variables in the
DATA= data set, plus new variables containing the posterior
probabilities and the classification results of cross validation.
The names of the new variables containing the
posterior probabilities are constructed from
the formatted values of the class levels.
A new variable, _INTO_, with the same attributes as the CLASS
variable, specifies the class to which each observation is assigned.
When an observation is classified into group OTHER,
the variable _INTO_ has a missing value.
When you specify the CANONICAL option, the data set also
contains new variables with canonical variable scores.
The NCAN= option determines the number of canonical variables.
The names of the canonical variables are constructed
as described in the CANPREFIX= option.
The canonical variables have means equal to zero and
pooled within-class variances equal to one.
An OUTCROSS= data set cannot be created if the
DATA= data set is not an ordinary SAS data set.
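One common use of the OUTCROSS= data set is to cross-tabulate the actual class against the cross validation assignment in _INTO_. The following is a minimal sketch, assuming the SASHELP.IRIS sample data set; the data set name WORK.IRIS_CV is hypothetical. The MISSING option in the TABLES statement keeps any observations classified into group OTHER (missing _INTO_) in the table.

```sas
/* Sketch only: SASHELP.IRIS and WORK.IRIS_CV are illustrative choices. */
proc discrim data=sashelp.iris outcross=work.iris_cv noprint;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Actual class by cross validation assignment. */
proc freq data=work.iris_cv;
   tables Species*_INTO_ / missing norow nocol nopercent;
run;
```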
TESTOUT= Data Set
The TESTOUT= data set contains all the variables in the
TESTDATA= data set, plus new variables containing the
posterior probabilities and the classification results.
The names of the new variables containing
the posterior probabilities are formed from
the formatted values of the class levels.
A new variable, _INTO_, with the same attributes as the CLASS
variable, gives the class to which each observation is assigned.
If an observation is classified into group
OTHER, the variable _INTO_ has a missing value.
When you specify the CANONICAL option, the data set also
contains new variables with canonical variable scores.
The NCAN= option determines the number of canonical variables.
The names of the canonical variables are formed
as described in the CANPREFIX= option.
TESTOUTD= Data Set
The TESTOUTD= data set contains all the variables
in the TESTDATA= data set, plus new variables
containing the group-specific density estimates.
The names of the new variables containing the density estimates
are formed from the formatted values of the class levels.
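The following is a minimal sketch of how the TESTOUT= and TESTOUTD= data sets might be requested together. It assumes the SASHELP.IRIS sample data set and splits it arbitrarily into calibration and test parts; the data set names WORK.TRAIN, WORK.TEST, WORK.TEST_POST, and WORK.TEST_DENS are hypothetical.

```sas
/* Sketch only: hold out every fifth observation as test data. */
data work.train work.test;
   set sashelp.iris;
   if mod(_n_, 5) = 0 then output work.test;
   else output work.train;
run;

proc discrim data=work.train testdata=work.test
             testout=work.test_post    /* posterior probabilities and _INTO_ */
             testoutd=work.test_dens   /* group-specific density estimates   */
             noprint;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

proc print data=work.test_post(obs=5);
run;

proc print data=work.test_dens(obs=5);
run;
```

Both output data sets carry all the variables from the TESTDATA= data set, so they can be merged or compared observation by observation.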
OUTSTAT= Data Set
The OUTSTAT= data set is similar to the TYPE=CORR
data set produced by the CORR procedure.
The data set contains various statistics such
as means, standard deviations, and correlations.
For an example of an OUTSTAT= data set,
see Example 25.3.
When you specify the CANONICAL option, canonical correlations,
canonical structures, canonical coefficients, and means of
canonical variables for each class are included in the data set.
If you specify METHOD=NORMAL, the output data set also
includes coefficients of the discriminant functions,
and the data set is TYPE=LINEAR (POOL=YES),
TYPE=QUAD (POOL=NO), or TYPE=MIXED (POOL=TEST).
If you specify METHOD=NPAR, this output data set is TYPE=CORR.
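Because the OUTSTAT= data set from METHOD=NORMAL carries the discriminant function coefficients, it can later serve as the DATA= input to classify new observations without refitting. The following is a minimal sketch, assuming the SASHELP.IRIS sample data set; WORK.CALIB and WORK.SCORED are hypothetical names, and SASHELP.IRIS is reused as the test data only to keep the example short.

```sas
/* Sketch only: save the calibration information. */
proc discrim data=sashelp.iris outstat=work.calib
             method=normal pool=yes noprint;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* With POOL=YES, the data set type is expected to be LINEAR. */
proc contents data=work.calib;
run;

/* Reuse the saved statistics to classify test observations. */
proc discrim data=work.calib testdata=sashelp.iris
             testout=work.scored noprint;
   class Species;
run;
```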
The OUTSTAT= data set contains the following variables:
- the BY variables, if any
- the CLASS variable
- _TYPE_, a character variable of length
8 that identifies the type of statistic
- _NAME_, a character variable of length 32 that identifies
the row of the matrix, the name of the canonical variable,
or the type of the discriminant function coefficients
- the quantitative variables, that is, those in the VAR
statement, or, if there is no VAR statement, all
numeric variables not listed in any other statement
The observations, as identified by the variable
_TYPE_, have the following _TYPE_ values:
| _TYPE_ | Contents |
| --- | --- |
| N | number of observations both for the total sample (CLASS variable missing) and within each class (CLASS variable present) |
| SUMWGT | sum of weights both for the total sample (CLASS variable missing) and within each class (CLASS variable present), if a WEIGHT statement is specified |
| MEAN | means both for the total sample (CLASS variable missing) and within each class (CLASS variable present) |
| PRIOR | prior probability for each class |
| STDMEAN | total-standardized class means |
| PSTDMEAN | pooled within-class standardized class means |
| STD | standard deviations both for the total sample (CLASS variable missing) and within each class (CLASS variable present) |
| PSTD | pooled within-class standard deviations |
| BSTD | between-class standard deviations |
| RSQUARED | univariate R² values |
| LNDETERM | the natural log of the determinant or the natural log of the quasi-determinant of the within-class covariance matrix either pooled (CLASS variable missing) or not pooled (CLASS variable present) |
The following kinds of observations are identified by the
combination of the variables _TYPE_ and _NAME_.
When the _TYPE_ variable has one of the following values,
the _NAME_ variable identifies the row of the matrix.
| _TYPE_ | Contents |
| --- | --- |
| CSSCP | corrected SSCP matrix both for the total sample (CLASS variable missing) and within each class (CLASS variable present) |
| PSSCP | pooled within-class corrected SSCP matrix |
| BSSCP | between-class SSCP matrix |
| COV | covariance matrix both for the total sample (CLASS variable missing) and within each class (CLASS variable present) |
| PCOV | pooled within-class covariance matrix |
| BCOV | between-class covariance matrix |
| CORR | correlation matrix both for the total sample (CLASS variable missing) and within each class (CLASS variable present) |
| PCORR | pooled within-class correlation matrix |
| BCORR | between-class correlation matrix |
When you request canonical discriminant analysis, the
_TYPE_ variable can have one of the following values.
The _NAME_ variable identifies a canonical variable.
| _TYPE_ | Contents |
| --- | --- |
| CANCORR | canonical correlations |
| STRUCTUR | canonical structure |
| BSTRUCT | between canonical structure |
| PSTRUCT | pooled within-class canonical structure |
| SCORE | standardized canonical coefficients |
| RAWSCORE | raw canonical coefficients |
| CANMEAN | means of the canonical variables for each class |
When you specify METHOD=NORMAL, the _TYPE_
variable can have one of the following values.
The _NAME_ variable identifies different types
of coefficients in the discriminant function.
| _TYPE_ | Contents |
| --- | --- |
| LINEAR | coefficients of the linear discriminant functions |
| QUAD | coefficients of the quadratic discriminant functions |
The values of the _NAME_ variable are as follows:
| _NAME_ | Contents |
| --- | --- |
| variable names | quadratic coefficients of the quadratic discriminant functions (a symmetric matrix for each class) |
| _LINEAR_ | linear coefficients of the discriminant functions |
| _CONST_ | constant coefficients of the discriminant functions |
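To see how the _TYPE_ and _NAME_ values identify observations in practice, you can subset an OUTSTAT= data set with a WHERE statement. The following is a minimal sketch, assuming the SASHELP.IRIS sample data set; WORK.IRIS_STAT is a hypothetical name. With METHOD=NORMAL and POOL=YES, the LINEAR observations would be expected to have _NAME_ values of _LINEAR_ and _CONST_ for each class.

```sas
/* Sketch only: SASHELP.IRIS and WORK.IRIS_STAT are illustrative choices. */
proc discrim data=sashelp.iris outstat=work.iris_stat
             method=normal pool=yes noprint;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Linear discriminant function coefficients, one set per class. */
proc print data=work.iris_stat;
   where _TYPE_ = 'LINEAR';
run;

/* Class means and prior probabilities. */
proc print data=work.iris_stat;
   where _TYPE_ in ('MEAN', 'PRIOR');
run;
```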
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.