The VARCLUS Procedure

PROC VARCLUS Statement

PROC VARCLUS < options >;
The PROC VARCLUS statement starts the VARCLUS procedure and optionally identifies a data set or requests particular cluster analyses. By default, the procedure uses the most recently created SAS data set and omits observations with missing values from the analysis. Table 68.1 summarizes some of the options available in the PROC VARCLUS statement.



Table 68.1: Options Available in the PROC VARCLUS Statement
Task                               Options
Specify data sets                  DATA= OUTSTAT= OUTTREE=
Determine the number of clusters   MAXCLUSTERS= MINCLUSTERS= MAXEIGEN= PROPORTION=
Specify cluster formation          CENTROID COVARIANCE HIERARCHY INITIAL= MAXITER= MAXSEARCH= MULTIPLEGROUP RANDOM=
Control output                     CORR NOPRINT SHORT SIMPLE SUMMARY TRACE
Omit intercept                     NOINT
Specify divisor for variances      VARDEF=
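
As an illustration of how these options combine on the PROC VARCLUS statement, the following is a minimal sketch rather than a recommended analysis. The data set WORK.DEMO and the variables x1-x6 are hypothetical; the DATA step only generates synthetic values so that the step can run on its own.

```sas
/* Hypothetical synthetic data: x1-x3 share one factor, x4-x6 another */
data work.demo;
   array x{6} x1-x6;
   do obs = 1 to 100;
      f1 = rannor(1);
      f2 = rannor(1);
      do j = 1 to 6;
         if j <= 3 then x{j} = f1 + 0.5*rannor(1);
         else x{j} = f2 + 0.5*rannor(1);
      end;
      output;
   end;
   keep x1-x6;
run;

/* DATA= names the input, MAXCLUSTERS= caps the number of clusters, */
/* and SHORT suppresses the structure and coefficient matrices      */
proc varclus data=work.demo maxclusters=3 short;
   var x1-x6;
run;
```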


The following list gives details on these options. The list is in alphabetical order.

CENTROID
uses centroid components rather than principal components. You should specify centroid components if you want the cluster components to be unweighted averages of the standardized variables (the default) or the unstandardized variables (if you specify the COV option). It is possible to obtain locally optimal clusterings in which a variable is not assigned to the cluster component with which it has the highest squared correlation. You cannot specify the CENTROID option with the MAXEIGEN= option.

CORR
C
displays the correlation matrix.

COVARIANCE
COV
analyzes the covariance matrix rather than the correlation matrix.

DATA=SAS-data-set
specifies the input data set to be analyzed. The data set can be an ordinary SAS data set or TYPE=CORR, UCORR, COV, UCOV, FACTOR, or SSCP. If you do not specify the DATA= option, the most recently created SAS data set is used. See Appendix A, "Special SAS Data Sets," for more information on types of SAS data sets.

HIERARCHY
HI
requires the clusters at different levels to maintain a hierarchical structure.

INITIAL=GROUP
INITIAL=INPUT
INITIAL=RANDOM
INITIAL=SEED
specifies the method for initializing the clusters. If the INITIAL= option is omitted and the MINCLUSTERS= option is greater than 1, the initial cluster components are obtained by extracting the required number of principal components and performing an orthoblique rotation. The following list describes the values for the INITIAL= option:

GROUP
specifies that clusters be initialized by group. You can use this option if the input data set is a TYPE=CORR, UCORR, COV, UCOV, or FACTOR data set. The cluster membership of each variable is obtained from an observation with _TYPE_='GROUP', which contains an integer for each variable ranging from one to the number of clusters. You can use a data set created either by a previous run of PROC VARCLUS or in a DATA step.

INPUT
specifies that the input data set is a TYPE=CORR, UCORR, COV, UCOV, or FACTOR data set, in which case scoring coefficients are read from observations where _TYPE_='SCORE'. You can use scoring coefficients from the FACTOR procedure or a previous run of PROC VARCLUS, or you can enter other coefficients in a DATA step.

RANDOM
assigns variables randomly to clusters. If you specify INITIAL=RANDOM without the CENTROID option, it is recommended that you specify MAXSEARCH=5, although the CPU time required is substantially increased.

SEED
initializes clusters according to the variables named in the SEED statement. Each variable listed in the SEED statement becomes the sole member of a cluster, and the other variables remain unassigned. If you do not specify the SEED statement, the first MINCLUSTERS= variables in the VAR statement are used as seeds.
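
The INITIAL=SEED behavior can be sketched as follows. This is an illustrative example only: the data set WORK.DEMO, the variables x1-x6, and the choice of x1 and x4 as seeds are all assumptions made for the sketch.

```sas
/* Hypothetical synthetic data: x1-x3 share one factor, x4-x6 another */
data work.demo;
   array x{6} x1-x6;
   do obs = 1 to 100;
      f1 = rannor(1);
      f2 = rannor(1);
      do j = 1 to 6;
         if j <= 3 then x{j} = f1 + 0.5*rannor(1);
         else x{j} = f2 + 0.5*rannor(1);
      end;
      output;
   end;
   keep x1-x6;
run;

/* x1 and x4 each start as the sole member of a cluster; */
/* the remaining variables begin unassigned              */
proc varclus data=work.demo initial=seed minclusters=2 maxclusters=2;
   var x1-x6;
   seed x1 x4;
run;
```

If the SEED statement were omitted here, the first two variables in the VAR statement (x1 and x2) would be used as seeds, as described above.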

MAXCLUSTERS=n
MAXC=n
specifies the largest number of clusters desired. The default value is the number of variables.

MAXEIGEN=n
specifies the largest permissible value of the second eigenvalue in each cluster. If you do not specify either the PROPORTION= or the MAXCLUSTERS= option, the default value is the average of the diagonal elements of the matrix being analyzed. This value is either the average variance if a covariance matrix is analyzed, or 1 if the correlation matrix is analyzed (unless some of the variables are constant, in which case the value is the number of nonconstant variables divided by the number of variables). Otherwise, the default is 0. The MAXEIGEN= option cannot be used with the CENTROID option.

MAXSEARCH=n
specifies the maximum number of iterations during the search phase. The default is 10 if you specify the CENTROID option; the default is 0 otherwise.

MINCLUSTERS=n
MINC=n
specifies the smallest number of clusters desired. The default value is 2 if INITIAL=RANDOM or INITIAL=SEED; otherwise, the procedure begins with one cluster and tries to split it in accordance with the PROPORTION= or MAXEIGEN= option.

MULTIPLEGROUP
MG
performs a multiple group component analysis (refer to Harman 1976). The input data set must be TYPE=CORR, UCORR, COV, UCOV, FACTOR, or SSCP and must contain an observation with _TYPE_='GROUP' defining the variable groups. Specifying the MULTIPLEGROUP option is equivalent to specifying all of the following options: MINC=1, MAXITER=0, MAXSEARCH=0, MAXEIGEN=0, PROPORTION=0, and INITIAL=GROUP.

NOINT
requests that no intercept be used; covariances or correlations are not corrected for the mean. If you specify the NOINT option, the OUTSTAT= data set is TYPE=UCORR.

NOPRINT
suppresses the output. Note that this option temporarily disables the Output Delivery System (ODS). For more information, see Chapter 15, "Using the Output Delivery System."

OUTSTAT=SAS-data-set
creates an output data set to contain statistics including means, standard deviations, correlations, cluster scoring coefficients, and the cluster structure. If you want to create a permanent SAS data set, you must specify a two-level name. The OUTSTAT= data set is TYPE=UCORR if the NOINT option is specified. For more information on permanent SAS data sets, refer to "SAS Files" and "DATA Step Concepts" in SAS Language Reference: Concepts. For information on types of SAS data sets, see Appendix A.
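
A minimal sketch of writing and inspecting an OUTSTAT= data set follows. The names WORK.DEMO, WORK.VCSTAT, and x1-x6 are hypothetical, and the example writes to the temporary WORK library; a permanent data set would instead use a two-level name with a libref that you have assigned.

```sas
/* Hypothetical synthetic data: x1-x3 share one factor, x4-x6 another */
data work.demo;
   array x{6} x1-x6;
   do obs = 1 to 100;
      f1 = rannor(1);
      f2 = rannor(1);
      do j = 1 to 6;
         if j <= 3 then x{j} = f1 + 0.5*rannor(1);
         else x{j} = f2 + 0.5*rannor(1);
      end;
      output;
   end;
   keep x1-x6;
run;

/* Save the statistics and suppress the displayed output */
proc varclus data=work.demo outstat=work.vcstat maxclusters=2 noprint;
   var x1-x6;
run;

/* List the cluster membership and scoring coefficient observations */
proc print data=work.vcstat;
   where _type_ in ('GROUP', 'SCORE');
run;
```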

OUTTREE=SAS-data-set
creates an output data set to contain information on the tree structure that can be used by the TREE procedure to print a tree diagram. The OUTTREE= option implies the HIERARCHY option. See Example 68.1 for use of the OUTTREE= option. If you want to create a permanent SAS data set, you must specify a two-level name. For more information on permanent SAS data sets, refer to "SAS Files" and "DATA Step Concepts" in SAS Language Reference: Concepts.
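
The following sketch shows one way the OUTTREE= data set might be passed to the TREE procedure. It is not a reproduction of Example 68.1: the data set names WORK.DEMO and WORK.TREE and the variables x1-x6 are hypothetical, and no HEIGHT statement is given, so PROC TREE falls back on its default node heights.

```sas
/* Hypothetical synthetic data: x1-x3 share one factor, x4-x6 another */
data work.demo;
   array x{6} x1-x6;
   do obs = 1 to 100;
      f1 = rannor(1);
      f2 = rannor(1);
      do j = 1 to 6;
         if j <= 3 then x{j} = f1 + 0.5*rannor(1);
         else x{j} = f2 + 0.5*rannor(1);
      end;
      output;
   end;
   keep x1-x6;
run;

/* OUTTREE= implies HIERARCHY, so the splits form a tree */
proc varclus data=work.demo outtree=work.tree maxclusters=3 short;
   var x1-x6;
run;

/* Print the tree diagram with the root at the left */
proc tree data=work.tree horizontal;
run;
```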

PROPORTION=n
PERCENT=n
gives the proportion or percentage of variation that must be explained by the cluster component. Values greater than 1.0 are considered to be percentages, so PROPORTION=0.75 and PERCENT=75 are equivalent. If you specify the CENTROID option, the default value is 0.75; otherwise, the default value is 0.
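
The equivalence of the two forms can be sketched as follows, assuming a hypothetical data set WORK.DEMO with variables x1-x6. Both steps request that each cluster component explain at least 75 percent of the variation in its cluster.

```sas
/* Hypothetical synthetic data: x1-x3 share one factor, x4-x6 another */
data work.demo;
   array x{6} x1-x6;
   do obs = 1 to 100;
      f1 = rannor(1);
      f2 = rannor(1);
      do j = 1 to 6;
         if j <= 3 then x{j} = f1 + 0.5*rannor(1);
         else x{j} = f2 + 0.5*rannor(1);
      end;
      output;
   end;
   keep x1-x6;
run;

/* Value at or below 1.0: read as a proportion */
proc varclus data=work.demo proportion=0.75 short;
   var x1-x6;
run;

/* Value above 1.0: read as a percentage, so this is the same request */
proc varclus data=work.demo percent=75 short;
   var x1-x6;
run;
```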

MAXITER=n
specifies the maximum number of iterations during the alternating least-squares phase. The default value is 1 if you specify the CENTROID option; the default is 10 otherwise.

RANDOM=n
specifies a positive integer as a starting value for the pseudo-random number sequence used with INITIAL=RANDOM. If you do not specify the RANDOM= option, the time of day is used to initialize the pseudo-random number sequence.

SHORT
suppresses printing of the cluster structure, scoring coefficient, and intercluster correlation matrices.

SIMPLE
S
displays means and standard deviations.

SUMMARY
suppresses all default output except the final summary table.

TRACE
lists the cluster to which each variable is assigned during the iterations.

VARDEF=DF
VARDEF=N
VARDEF=WDF
VARDEF=WEIGHT | WGT
specifies the divisor to be used in the calculation of variances and covariances. The default value is VARDEF=DF. The values and associated divisors are displayed in the following table.

Value          Divisor                    Formula
DF             degrees of freedom         n - i
N              number of observations     n
WDF            sum of weights minus one   (\sum_j w_j) - 1
WEIGHT | WGT   sum of weights             \sum_j w_j


In the preceding table, i=0 if the NOINT option is specified, and i=1 otherwise.


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.