The FASTCLUS Procedure

Output Data Sets

OUT= Data Set

The OUT= data set contains

If you specify the IMPUTE option, the OUT= data set also contains a new variable, _IMPUTE_, giving the number of imputed values in each observation.

OUTSEED= Data Set

The OUTSEED= data set contains one observation for each cluster. The variables are as follows:

If you specify the LEAST=p option with a value other than 2, the _RMSSTD_ variable is replaced by the _SCALE_ variable. _SCALE_ contains the pooled scale estimate, which is analogous to the root mean square standard deviation but is based on pth-power deviations instead of squared deviations:

LEAST=1
mean absolute deviation

LEAST=p
root mean pth-power absolute deviation

LEAST=MAX
maximum absolute deviation
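
The three scale estimates listed above can be illustrated with a small Python sketch. It is an illustration only, not the procedure's implementation: the function name `scale_estimate` is hypothetical, the deviations are assumed to be measured from a single cluster center, and a plain mean over the deviations is assumed, with none of the pooling or degrees-of-freedom adjustments that FASTCLUS may apply.

```python
import math

def scale_estimate(deviations, least):
    """Scale of a list of deviations from a cluster center.

    least is 1, another number p >= 1, or the string "MAX".
    Assumption: a plain mean is used, with no adjustment for
    degrees of freedom.
    """
    abs_dev = [abs(d) for d in deviations]
    if least == "MAX":
        # LEAST=MAX: maximum absolute deviation
        return max(abs_dev)
    p = float(least)
    # LEAST=1 gives the mean absolute deviation; a general p gives
    # the root mean pth-power absolute deviation.
    return (sum(a ** p for a in abs_dev) / len(abs_dev)) ** (1.0 / p)

devs = [1.0, -2.0, 3.0, -4.0]
mean_abs = scale_estimate(devs, 1)       # mean absolute deviation
root_mean_sq = scale_estimate(devs, 2)   # p = 2, the least-squares case
max_abs = scale_estimate(devs, "MAX")    # maximum absolute deviation
```

With p = 2 the estimate reduces to a root mean square, which is why _SCALE_ is described as the analogue of _RMSSTD_.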

If you specify the OUTITER option, there is one set of observations in the OUTSEED= data set for each pass through the data set (that is, one set for initial seeds, one for each iteration, and one for the final clusters). Also, several additional variables appear:

_ITER_
is the iteration number. For the initial seeds, the value is 0. For the final cluster means or centers, the _ITER_ variable is one greater than the last iteration reported in the iteration history.

_CRIT_
is the clustering criterion as described under the LEAST= option.

_CHANGE_
is the maximum over clusters of the relative change in the cluster seed from the previous iteration. The relative change in a cluster seed is the distance between the old seed and the new seed divided by a scaling factor. If you do not specify the LEAST= option, the scaling factor is the minimum distance between the initial seeds. If you specify the LEAST= option, the scaling factor is an L1 scale estimate and is recomputed on each iteration.

_HOMPAR_
is the value of the homotopy parameter. This variable appears only for LEAST=p with 1<p<2.

_BINSIZ_
is the maximum bin size used for estimating medians. This variable appears only for LEAST=1.
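
As a rough illustration of the _CHANGE_ definition above, the following Python sketch computes the maximum relative seed change for the case where the LEAST= option is not specified. The function names and the sample seeds are hypothetical, and Euclidean distance is an assumption; the sketch does not cover the L1 scale estimate used when LEAST= is specified.

```python
import math
from itertools import combinations

def euclid(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_seed_distance(seeds):
    """Scaling factor without LEAST=: the minimum distance between initial seeds."""
    return min(euclid(a, b) for a, b in combinations(seeds, 2))

def max_relative_change(old_seeds, new_seeds, scale):
    """Maximum over clusters of (distance from old seed to new seed) / scale."""
    return max(euclid(o, n) / scale for o, n in zip(old_seeds, new_seeds))

# Hypothetical seeds for three clusters in two dimensions.
initial = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
updated = [(0.0, 1.5), (4.0, 0.0), (0.0, 3.0)]

scale = min_seed_distance(initial)                      # pairwise distances are 4, 3, 5
change = max_relative_change(initial, updated, scale)   # only the first seed moved
```

Here only the first seed moves, by 1.5 units, so the relative change is 1.5 divided by the minimum initial seed distance of 3.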

If you specify the OUTITER option, the variables _SCALE_ or _RMSSTD_, _RADIUS_, _NEAR_, and _GAP_ have missing values except for the last pass.

You can use the OUTSEED= data set as a SEED= input data set for a subsequent analysis.

OUTSTAT= Data Set

The variables in the OUTSTAT= data set are as follows:

The values of _TYPE_ for all LEAST= options are given in the following table.

Table 27.2: _TYPE_ Values for all LEAST= Options

| _TYPE_ | Contents of VAR variables | Contents of OVER_ALL |
|---|---|---|
| INITIAL | Initial seeds | Missing |
| CRITERION | Missing | Optimization criterion; see the LEAST= option; this value is displayed just before the "Cluster Summary" table |
| CENTER | Cluster centers; see the LEAST= option | Missing |
| SEED | Cluster seeds: additional information used for imputation | |
| DISPERSION | Dispersion estimates for each cluster; see the LEAST= option; these values are displayed in a separate row with title depending on the LEAST= option | Dispersion estimates pooled over variables; see the LEAST= option; these values are displayed in the "Cluster Summary" table with label depending on the LEAST= option |
| FREQ | Frequency of each cluster omitting observations with missing values for the VAR variable; these values are not displayed | Frequency of each cluster based on all observations with any nonmissing value; these values are displayed in the "Cluster Summary" table |
| WEIGHT | Sum of weights for each cluster omitting observations with missing values for the VAR variable; these values are not displayed | Sum of weights for each cluster based on all observations with any nonmissing value; these values are displayed in the "Cluster Summary" table |
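
The difference between the two FREQ columns can be shown with a short Python sketch. This is only an illustration of the counting rule stated in the table: the function name `cluster_frequencies` and the sample data are hypothetical, and missing values are represented by `None`.

```python
def cluster_frequencies(rows, clusters, n_vars):
    """Count observations per cluster in the two ways the FREQ row describes.

    rows     : list of observations, each a list of n_vars values (None = missing)
    clusters : cluster number assigned to each observation
    Returns (per_var, overall):
      per_var[c][j] counts observations in cluster c with variable j nonmissing
      overall[c]    counts observations in cluster c with any nonmissing value
    """
    per_var = {}
    overall = {}
    for row, c in zip(rows, clusters):
        counts = per_var.setdefault(c, [0] * n_vars)
        overall.setdefault(c, 0)
        for j, value in enumerate(row):
            if value is not None:
                counts[j] += 1
        if any(value is not None for value in row):
            overall[c] += 1
    return per_var, overall

# Hypothetical data: four observations on two variables, two clusters.
rows = [[1.0, 2.0], [None, 3.0], [4.0, None], [5.0, 6.0]]
clusters = [1, 1, 2, 2]
per_var, overall = cluster_frequencies(rows, clusters, n_vars=2)
```

In this example each cluster has an overall frequency of 2, while the per-variable frequencies drop to 1 wherever a value is missing. The WEIGHT row follows the same pattern with sums of weights in place of counts.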
   

Observations with _TYPE_='WEIGHT' are included only if you specify the WEIGHT statement.

The _TYPE_ values included only for least-squares clustering are given in the following table. Least-squares clustering is obtained by omitting the LEAST= option or by specifying LEAST=2.

Table 27.3: _TYPE_ Values for Least-Squares Clustering

| _TYPE_ | Contents of VAR variables | Contents of OVER_ALL |
|---|---|---|
| MEAN | Mean for the total sample; this is not displayed | Missing |
| STD | Standard deviation for the total sample; this is labeled "Total STD" in the output | Standard deviation pooled over all the VAR variables; this is labeled "Total STD" in the output |
| WITHIN_STD | Pooled within-cluster standard deviation | Within-cluster standard deviation pooled over clusters and all the VAR variables |
| RSQ | R² for predicting the variable from the clusters; this is labeled "R-Squared" in the output | R² pooled over all the VAR variables; this is labeled "R-Squared" in the output |
| RSQ_RATIO | R²/(1 − R²); this is labeled "RSQ/(1-RSQ)" in the output | R²/(1 − R²); this is labeled "RSQ/(1-RSQ)" in the output |
| PSEUDO_F | Missing | Pseudo F statistic |
| ERSQ | Missing | Approximate expected value of R² under the null hypothesis of a single uniform cluster |
| CCC | Missing | The cubic clustering criterion |
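
To make the RSQ, RSQ_RATIO, and PSEUDO_F rows more concrete, here is a minimal Python sketch. It rests on assumptions that this section does not state: R² is taken as 1 minus the ratio of within-cluster to total sum of squares, and the pseudo F statistic is taken in its usual form, [R²/(c − 1)] / [(1 − R²)/(n − c)], for n observations and c clusters. The function names and the numbers are hypothetical.

```python
def r_squared(total_ss, within_ss):
    """Assumed definition: proportion of total variation explained by the clusters."""
    return 1.0 - within_ss / total_ss

def rsq_ratio(r2):
    """The RSQ_RATIO quantity: R^2 / (1 - R^2)."""
    return r2 / (1.0 - r2)

def pseudo_f(r2, n_obs, n_clusters):
    """Assumed usual form of the pseudo F statistic for n_obs observations."""
    return (r2 / (n_clusters - 1)) / ((1.0 - r2) / (n_obs - n_clusters))

# Hypothetical sums of squares for one variable.
r2 = r_squared(total_ss=100.0, within_ss=20.0)    # about 0.8
ratio = rsq_ratio(r2)                             # about 4.0
f_stat = pseudo_f(r2, n_obs=23, n_clusters=3)     # about 40.0
```

The ratio grows without bound as R² approaches 1, so it separates near-perfect clusterings more sharply than R² itself does.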
   

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.