Output Data Sets
OUT= Data Set
The OUT= data set contains all the variables in the original data
set plus new variables containing the principal component scores.
The N= option determines the number of new variables.
The names of the new variables are formed by concatenating
the value given by the PREFIX= option (or Prin if
PREFIX= is omitted) and the numbers 1, 2, 3, and so on.
The new variables have mean 0 and variance equal to the
corresponding eigenvalue, unless you specify the STANDARD option
to standardize the scores to unit variance.
If you specify the COV option, the procedure calculates scores using
the centered variables rather than the standardized variables.
If you use a PARTIAL statement, the OUT= data
set also contains the residuals from predicting
the VAR variables from the PARTIAL variables.
The names of the residual variables are formed by
prefixing R_ to the names of the VAR variables.
An OUT= data set cannot be created if the DATA= data
set is TYPE=ACE, TYPE=CORR, TYPE=COV, TYPE=EST, TYPE=FACTOR,
TYPE=SSCP, TYPE=UCORR, or TYPE=UCOV.
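
The options described above can be sketched as follows. This is a minimal illustration, not a prescribed workflow: it assumes the SASHELP.CLASS sample data set is available, and the data set names Scores and Resid and the prefix PC are illustrative choices only.

```sas
/* Assumption: SASHELP.CLASS (Age, Height, Weight) is available. */
/* Scores, Resid, and PC are illustrative names.                 */
proc princomp data=sashelp.class out=Scores n=2 prefix=PC std;
   var Age Height Weight;
run;

/* OUT= keeps the original variables and adds PC1 and PC2.       */
/* With STD, each score should have mean 0 and variance 1.       */
proc means data=Scores mean var;
   var PC1 PC2;
run;

/* With a PARTIAL statement, OUT= should also hold the residuals */
/* R_Height and R_Weight from predicting the VAR variables       */
/* from Age.                                                     */
proc princomp data=sashelp.class out=Resid;
   var Height Weight;
   partial Age;
run;
```

Dropping the STD option from the first step should leave the scores with variances equal to the corresponding eigenvalues instead of 1.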
OUTSTAT= Data Set
The OUTSTAT= data set is similar to the TYPE=CORR
data set produced by the CORR procedure.
The following table relates the TYPE= value for the OUTSTAT=
data set to the options specified in the PROC PRINCOMP statement.
| Options | TYPE= |
|---|---|
| (default) | CORR |
| COV | COV |
| NOINT | UCORR |
| COV NOINT | UCOV |
Notice that the default (neither the COV nor the
NOINT option) produces a TYPE=CORR data set.
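
One way to confirm the resulting data set type is to inspect the OUTSTAT= data set with PROC CONTENTS. The sketch below assumes the SASHELP.CLASS sample data set; the name CovStats is illustrative.

```sas
/* Assumption: SASHELP.CLASS is available; CovStats is an        */
/* illustrative name. The COV option should yield TYPE=COV.      */
proc princomp data=sashelp.class cov outstat=CovStats noprint;
   var Age Height Weight;
run;

/* The data set type appears in the PROC CONTENTS attributes.    */
proc contents data=CovStats;
run;
```

Replacing COV with NOINT, or specifying both options, should change the reported type to UCORR or UCOV, respectively.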
The new data set contains the following variables:
- the BY variables, if any
- two new variables, _TYPE_ and _NAME_,
both character variables
- the variables analyzed, that is, those in the VAR statement;
or, if there is no VAR statement, all numeric variables not
listed in any other statement; or, if there is a PARTIAL
statement, the residual variables as described under the
OUT= data set
Each observation in the new data set contains some type
of statistic as indicated by the _TYPE_ variable.
The values of the _TYPE_ variable are as follows:
- _TYPE_
- Contents
- MEAN
- mean of each variable.
If you specify the PARTIAL statement, this observation is omitted.
- STD
- standard deviations.
If you specify the COV option, this observation
is omitted, so the SCORE procedure does not
standardize the variables before computing scores.
If you use the PARTIAL statement, the standard
deviation of a variable is computed as its root mean
squared error as predicted from the PARTIAL variables.
- USTD
- uncorrected standard deviations.
When you specify the NOINT option in the PROC
PRINCOMP statement, the OUTSTAT= data set contains
standard deviations not corrected for the mean.
However, if you also specify the COV option in the
PROC PRINCOMP statement, this observation is omitted.
- N
- number of observations on which the analysis is based.
This value is the same for each variable.
If you specify the PARTIAL statement and the
value of the VARDEF= option is DF
or unspecified, then the number of observations is decremented
by the degrees of freedom for the PARTIAL variables.
- SUMWGT
- the sum of the weights of the observations.
This value is the same for each variable.
If you specify the PARTIAL statement and VARDEF=WDF,
then the sum of the weights is decremented by the
degrees of freedom for the PARTIAL variables.
This observation is output only if the value is
different from that in the observation with _TYPE_='N'.
- CORR
- correlations between each variable and the
variable specified by the _NAME_ variable.
The number of observations with _TYPE_='CORR' is
equal to the number of variables being analyzed.
If you specify the COV option, no
_TYPE_='CORR' observations are produced.
If you use the PARTIAL statement, the partial
correlations, not the raw correlations, are output.
- UCORR
- uncorrected correlation matrix.
When you specify the NOINT option without the COV option in
the PROC PRINCOMP statement, the OUTSTAT= data set contains
a matrix of correlations not corrected for the means.
However, if you also specify the COV option in the
PROC PRINCOMP statement, this observation is omitted.
- COV
- covariances between each variable and the
variable specified by the _NAME_ variable.
_TYPE_='COV' observations are produced
only if you specify the COV option.
If you use the PARTIAL statement, the partial
covariances, not the raw covariances, are output.
- UCOV
- uncorrected covariance matrix.
When you specify the NOINT and COV options in the PROC
PRINCOMP statement, the OUTSTAT= data set contains
a matrix of covariances not corrected for the means.
- EIGENVAL
- eigenvalues.
If the N= option requested fewer than the maximum number of
principal components, only the specified number of eigenvalues
are produced, with missing values filling out the observation.
- SCORE
- eigenvectors.
The _NAME_ variable contains
the name of the corresponding principal
component as constructed from the PREFIX= option.
The number of observations with _TYPE_='SCORE'
equals the number of principal components computed.
The eigenvectors have unit length unless you specify the STANDARD (STD) option,
in which case the unit-length eigenvectors
are divided by the square roots of the eigenvalues
to produce scores with unit standard deviations.
- USCORE
- scoring coefficients to be applied without
subtracting the mean from the raw variables.
_TYPE_='USCORE' observations are produced when you
specify the NOINT option in the PROC PRINCOMP statement.
- RSQUARED
- R-square values for each VAR variable as predicted by the PARTIAL variables.
- B
- regression coefficients for each VAR variable as predicted by the PARTIAL
variables. This observation is produced only if you specify the COV option.
- STB
- standardized regression coefficients for each VAR variable as
predicted by the PARTIAL variables.
If you specify the COV option, this observation is omitted.
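
Because each observation is labeled by _TYPE_, individual statistics can be pulled out of the OUTSTAT= data set with a WHERE statement. The following sketch assumes the SASHELP.CLASS sample data set; the name Stats is illustrative.

```sas
/* Assumption: SASHELP.CLASS is available; Stats is an           */
/* illustrative name.                                            */
proc princomp data=sashelp.class outstat=Stats noprint;
   var Age Height Weight;
run;

/* Print only the eigenvalues and the eigenvectors. For the      */
/* SCORE observations, _NAME_ holds the component name, which    */
/* is Prin1, Prin2, ... when PREFIX= is omitted.                 */
proc print data=Stats;
   where _TYPE_ in ('EIGENVAL', 'SCORE');
run;
```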
The data set can be used with the SCORE procedure to compute
principal component scores, or it can be used as input to the
FACTOR procedure specifying METHOD=SCORE to rotate the components.
If you use the PARTIAL statement, the scoring coefficients
should be applied to the residuals, not the original variables.
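
A minimal sketch of both uses follows. It assumes the SASHELP.CLASS sample data set; the names Stats and Scored and the choice of ROTATE=VARIMAX are illustrative, not taken from this documentation.

```sas
/* Assumption: SASHELP.CLASS is available; Stats and Scored are  */
/* illustrative names.                                           */
proc princomp data=sashelp.class outstat=Stats noprint;
   var Age Height Weight;
run;

/* Compute principal component scores from the stored            */
/* _TYPE_='SCORE' coefficients. Because Stats is TYPE=CORR,      */
/* PROC SCORE standardizes the variables using the MEAN and      */
/* STD observations before applying the coefficients.            */
proc score data=sashelp.class score=Stats out=Scored;
   var Age Height Weight;
run;

/* Rotate the components, reading the scoring coefficients       */
/* from the OUTSTAT= data set. VARIMAX is just one possible      */
/* rotation.                                                     */
proc factor data=Stats method=score rotate=varimax;
run;
```

No PARTIAL statement is used here, so the coefficients apply directly to the original variables.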
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.