Displayed Output

The FASTCLUS Procedure

Unless the SHORT or SUMMARY option is specified, PROC FASTCLUS displays

Initial Seeds, cluster seeds selected after one pass through the data
Change in Cluster Seeds for each iteration, if you specify MAXITER=n with n > 1

If you specify the LEAST=p option with 1 < p < 2 and you omit the IRLS option, an additional column is displayed in the Iteration History table. The column contains a character that identifies the method used in each iteration. At each iterative step, PROC FASTCLUS chooses the most efficient method to cluster the data, given the condition of the data, so the method chosen is data-dependent. The possible values are as follows:

Value		Method
N		Newton's Method
I or L		iteratively weighted least squares (IRLS)
1		IRLS step, halved once
2		IRLS step, halved twice
3		IRLS step, halved three times

PROC FASTCLUS displays a Cluster Summary, giving the following for each cluster:

Cluster number
Frequency, the number of observations in the cluster
Weight, the sum of the weights of the observations in the cluster, if you specify the WEIGHT statement
RMS Std Deviation, the root mean square across variables of the cluster standard deviations, which is equal to the root mean square distance between observations in the cluster
Maximum Distance from Seed to Observation, the maximum distance from the cluster seed to any observation in the cluster
Nearest Cluster, the number of the cluster with mean closest to the mean of the current cluster
Centroid Distance, the distance between the centroids (means) of the current cluster and the nearest other cluster
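
The quantities in the Cluster Summary can be illustrated with a small Python sketch. It is not PROC FASTCLUS code: the function name `cluster_summary` and the sample data are hypothetical, cluster means stand in for the final seeds, distances are assumed to be Euclidean, and the standard deviations are assumed to use an n - 1 divisor. Weights are omitted.

```python
import math
from statistics import mean, stdev

def cluster_summary(points, labels):
    """Per-cluster summary from points (equal-length tuples) and cluster labels."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    # Assumption: cluster means stand in for the final cluster seeds.
    centroids = {k: tuple(mean(col) for col in zip(*pts)) for k, pts in clusters.items()}

    summary = {}
    for k, pts in clusters.items():
        # Per-variable standard deviations (n - 1 divisor, an assumption).
        sds = [stdev(col) if len(pts) > 1 else 0.0 for col in zip(*pts)]
        # Root mean square across variables of those standard deviations.
        rms_std = math.sqrt(mean([s * s for s in sds]))
        max_distance = max(math.dist(p, centroids[k]) for p in pts)
        # Euclidean distance from this centroid to every other centroid.
        others = {j: math.dist(centroids[k], c) for j, c in centroids.items() if j != k}
        nearest = min(others, key=others.get) if others else None
        summary[k] = {
            "frequency": len(pts),
            "rms_std": rms_std,
            "max_distance": max_distance,
            "nearest_cluster": nearest,
            "centroid_distance": others[nearest] if others else None,
        }
    return summary

# Hypothetical data: four points in cluster 1, two points in cluster 2.
points = [(0, 0), (2, 0), (0, 2), (2, 2), (10, 1), (12, 1)]
labels = [1, 1, 1, 1, 2, 2]
summary = cluster_summary(points, labels)
```

In this example the centroids are (1, 1) and (11, 1), so each cluster reports the other as its nearest cluster, with a centroid distance of 10.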

A table of statistics for each variable is displayed unless you specify the SUMMARY option. The table contains

Total STD, the total standard deviation
Within STD, the pooled within-cluster standard deviation
R-Squared, the R² for predicting the variable from the cluster
RSQ/(1 - RSQ), the ratio of between-cluster variance to within-cluster variance (R²/(1 - R²))
OVER-ALL, all of the previous quantities pooled across variables
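
As a rough illustration of the per-variable statistics listed above, the following Python sketch computes them for a single variable. The function name `variable_statistics` and the data are hypothetical, and the divisors (n - 1 for the total standard deviation, n - c for the pooled within-cluster standard deviation) are assumptions of this sketch rather than a statement of the procedure's exact computation.

```python
import math

def variable_statistics(values, labels):
    """Total STD, Within STD, R-squared, and RSQ/(1 - RSQ) for one variable."""
    n = len(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    c = len(groups)

    grand_mean = sum(values) / n
    # Total sum of squares around the grand mean.
    sst = sum((v - grand_mean) ** 2 for v in values)
    # Within-cluster sum of squares around each cluster mean.
    ssw = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)

    r_squared = 1 - ssw / sst
    return {
        "total_std": math.sqrt(sst / (n - 1)),    # assumed divisor: n - 1
        "within_std": math.sqrt(ssw / (n - c)),   # assumed divisor: n - c
        "r_squared": r_squared,
        # Undefined when R-squared is exactly 1; reported here as infinity.
        "rsq_ratio": float("inf") if ssw == 0 else r_squared / (1 - r_squared),
    }

# Hypothetical data: two well-separated clusters on one variable.
stats = variable_statistics([1, 2, 3, 7, 8, 9], [1, 1, 1, 2, 2, 2])
```

For this data the total sum of squares is 58 and the within-cluster sum of squares is 4, so R² is 54/58 (about 0.93) and the ratio R²/(1 - R²) is 13.5.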

PROC FASTCLUS also displays

Pseudo F Statistic,
\[ \frac{R^2 / (c - 1)}{(1 - R^2) / (n - c)} \]
where R² is the observed overall R², c is the number of clusters, and n is the number of observations. The pseudo F statistic was suggested by Calinski and Harabasz (1974). Refer to Milligan and Cooper (1985) and Cooper and Milligan (1988) regarding the use of the pseudo F statistic in estimating the number of clusters. See Example 23.2 in Chapter 23, "The CLUSTER Procedure," for a comparison of pseudo F statistics.
Observed Overall R-Squared, if you specify the SUMMARY option
Approximate Expected Overall R-Squared, the approximate expected value of the overall R² under the uniform null hypothesis assuming that the variables are uncorrelated. The value is missing if the number of clusters is greater than one-fifth the number of observations.
Cubic Clustering Criterion, computed under the assumption that the variables are uncorrelated. The value is missing if the number of clusters is greater than one-fifth the number of observations.
If you are interested in the approximate expected R² or the cubic clustering criterion but your variables are correlated, you should cluster principal component scores from the PRINCOMP procedure. Both of these statistics are described by Sarle (1983). The performance of the cubic clustering criterion in estimating the number of clusters is examined by Milligan and Cooper (1985) and Cooper and Milligan (1988).
Distances Between Cluster Means, if you specify the DISTANCE option
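
The pseudo F formula and the one-fifth rule described above can be sketched in Python as follows. The function names `pseudo_f` and `ccc_is_reported` are hypothetical and only restate the definitions in the text; they do not compute the cubic clustering criterion itself.

```python
def pseudo_f(r_squared, n_clusters, n_obs):
    """Pseudo F statistic: [R^2 / (c - 1)] / [(1 - R^2) / (n - c)]."""
    between = r_squared / (n_clusters - 1)
    within = (1 - r_squared) / (n_obs - n_clusters)
    return between / within

def ccc_is_reported(n_clusters, n_obs):
    """False when the number of clusters exceeds one-fifth of the observations,
    the condition under which the approximate expected R-squared and the
    cubic clustering criterion are missing."""
    return n_clusters <= n_obs / 5

# Hypothetical values: overall R-squared of 0.8, 3 clusters, 103 observations.
f_value = pseudo_f(0.8, 3, 103)
```

With these values the numerator is 0.8/2 = 0.4 and the denominator is 0.2/100 = 0.002, so the pseudo F statistic is approximately 200.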

Unless you specify the SHORT or SUMMARY option, PROC FASTCLUS displays

Cluster Means for each variable
Cluster Standard Deviations for each variable
