Example 23.3: Cluster Analysis of Fisher Iris Data
The iris data published by Fisher (1936) have been widely used
for examples in discriminant analysis and cluster analysis.
The sepal length, sepal width, petal length, and
petal width are measured in millimeters on fifty
iris specimens from each of three species,
Iris setosa, I. versicolor, and I. virginica.
Mezzich and Solomon (1980) discuss a variety
of cluster analyses of the iris data.
This example analyzes the iris data by Ward's method
and two-stage density linkage and then illustrates
how the FASTCLUS procedure can be used in combination
with PROC CLUSTER to analyze large data sets.
title 'Cluster Analysis of Fisher (1936) Iris Data';

proc format;
   value specname
      1='Setosa '
      2='Versicolor'
      3='Virginica ';
run;

data iris;
   input SepalLength SepalWidth PetalLength PetalWidth Species @@;
   format Species specname.;
   label SepalLength='Sepal Length in mm.'
         SepalWidth ='Sepal Width in mm.'
         PetalLength='Petal Length in mm.'
         PetalWidth ='Petal Width in mm.';
   symbol = put(species, specname10.);
   datalines;
50 33 14 02 1 64 28 56 22 3 65 28 46 15 2 67 31 56 24 3
63 28 51 15 3 46 34 14 03 1 69 31 51 23 3 62 22 45 15 2
59 32 48 18 2 46 36 10 02 1 61 30 46 14 2 60 27 51 16 2
65 30 52 20 3 56 25 39 11 2 65 30 55 18 3 58 27 51 19 3
68 32 59 23 3 51 33 17 05 1 57 28 45 13 2 62 34 54 23 3
77 38 67 22 3 63 33 47 16 2 67 33 57 25 3 76 30 66 21 3
49 25 45 17 3 55 35 13 02 1 67 30 52 23 3 70 32 47 14 2
64 32 45 15 2 61 28 40 13 2 48 31 16 02 1 59 30 51 18 3
55 24 38 11 2 63 25 50 19 3 64 32 53 23 3 52 34 14 02 1
49 36 14 01 1 54 30 45 15 2 79 38 64 20 3 44 32 13 02 1
67 33 57 21 3 50 35 16 06 1 58 26 40 12 2 44 30 13 02 1
77 28 67 20 3 63 27 49 18 3 47 32 16 02 1 55 26 44 12 2
50 23 33 10 2 72 32 60 18 3 48 30 14 03 1 51 38 16 02 1
61 30 49 18 3 48 34 19 02 1 50 30 16 02 1 50 32 12 02 1
61 26 56 14 3 64 28 56 21 3 43 30 11 01 1 58 40 12 02 1
51 38 19 04 1 67 31 44 14 2 62 28 48 18 3 49 30 14 02 1
51 35 14 02 1 56 30 45 15 2 58 27 41 10 2 50 34 16 04 1
46 32 14 02 1 60 29 45 15 2 57 26 35 10 2 57 44 15 04 1
50 36 14 02 1 77 30 61 23 3 63 34 56 24 3 58 27 51 19 3
57 29 42 13 2 72 30 58 16 3 54 34 15 04 1 52 41 15 01 1
71 30 59 21 3 64 31 55 18 3 60 30 48 18 3 63 29 56 18 3
49 24 33 10 2 56 27 42 13 2 57 30 42 12 2 55 42 14 02 1
49 31 15 02 1 77 26 69 23 3 60 22 50 15 3 54 39 17 04 1
66 29 46 13 2 52 27 39 14 2 60 34 45 16 2 50 34 15 02 1
44 29 14 02 1 50 20 35 10 2 55 24 37 10 2 58 27 39 12 2
47 32 13 02 1 46 31 15 02 1 69 32 57 23 3 62 29 43 13 2
74 28 61 19 3 59 30 42 15 2 51 34 15 02 1 50 35 13 03 1
56 28 49 20 3 60 22 40 10 2 73 29 63 18 3 67 25 58 18 3
49 31 15 01 1 67 31 47 15 2 63 23 44 13 2 54 37 15 02 1
56 30 41 13 2 63 25 49 15 2 61 28 47 12 2 64 29 43 13 2
51 25 30 11 2 57 28 41 13 2 65 30 58 22 3 69 31 54 21 3
54 39 13 04 1 51 35 14 03 1 72 36 61 25 3 65 32 51 20 3
61 29 47 14 2 56 29 36 13 2 69 31 49 15 2 64 27 53 19 3
68 30 55 21 3 55 25 40 13 2 48 34 16 02 1 48 30 14 01 1
45 23 13 03 1 57 25 50 20 3 57 38 17 03 1 51 38 15 03 1
55 23 40 13 2 66 30 44 14 2 68 28 48 14 2 54 34 17 02 1
51 37 15 04 1 52 35 15 02 1 58 28 51 24 3 67 30 50 17 2
63 33 60 25 3 53 37 15 02 1
;
The following macro, SHOW, is used in the
subsequent analyses to display cluster results.
It invokes the FREQ procedure to
crosstabulate clusters and species.
The CANDISC procedure computes canonical variables for
discriminating among the clusters, and the first two
canonical variables are plotted to show cluster membership.
See Chapter 21, "The CANDISC Procedure," for a
canonical discriminant analysis of the iris species.
%macro show;
   /* Crosstabulate clusters and species */
   proc freq;
      tables cluster*species;
   run;
   /* Compute canonical variables for discriminating among the clusters */
   proc candisc noprint out=can;
      class cluster;
      var petal: sepal:;
   run;
   /* Plot the first two canonical variables by cluster */
   legend1 frame cframe=ligr cborder=black
           position=center value=(justify=center);
   axis1 label=(angle=90 rotate=0) minor=none;
   axis2 minor=none;
   proc gplot;
      plot can2*can1=cluster /
           frame cframe=ligr legend=legend1 vaxis=axis1 haxis=axis2;
   run;
%mend;
The first analysis clusters the iris data by Ward's method
and plots the CCC, pseudo F, and pseudo t² statistics.
The CCC has a local peak at 3 clusters
but a higher peak at 5 clusters.
The pseudo F statistic indicates 3 clusters, while
the pseudo t² statistic suggests 3 or 6 clusters.
For large numbers of clusters, Version 6 of
the SAS System produces somewhat different
results than previous versions of PROC CLUSTER.
This is due to changes in the treatment of ties.
Results are identical for 5 or fewer clusters.
The TREE procedure creates an output data set containing
the 3-cluster partition for use by the SHOW macro.
The FREQ procedure reveals 16 misclassifications.
The results are shown in Output 23.3.1.
title2 'By Ward''s Method';
proc cluster data=iris method=ward print=15 ccc pseudo;
   var petal: sepal:;
   copy species;
run;

legend1 frame cframe=ligr cborder=black
        position=center value=(justify=center);
axis1 label=(angle=90 rotate=0) minor=none order=(0 to 600 by 100);
axis2 minor=none order=(1 to 30 by 1);
axis3 label=(angle=90 rotate=0) minor=none order=(0 to 7 by 1);
proc gplot;
   plot _ccc_*_ncl_ /
        frame cframe=ligr legend=legend1 vaxis=axis3 haxis=axis2;
   plot _psf_*_ncl_ _pst2_*_ncl_ /overlay
        frame cframe=ligr legend=legend1 vaxis=axis1 haxis=axis2;
run;

proc tree noprint ncl=3 out=out;
   copy petal: sepal: species;
run;

%show;
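The count of 16 misclassifications can also be computed from the OUT= data set that PROC TREE creates, rather than read off the crosstabulation. The following is a minimal sketch and not part of the original example: it assumes that each cluster is identified with its most frequent species, and the data set name COUNTS and the variable MISCLASS are hypothetical names introduced here.

```sas
/* Sketch: count the observations that fall outside the majority */
/* species of their cluster. Assumes the OUT data set created by */
/* PROC TREE above is still available.                           */
proc freq data=out noprint;
   tables cluster*species / out=counts;   /* COUNTS is a hypothetical name */
run;

proc sort data=counts;
   by cluster descending count;           /* majority species comes first */
run;

data _null_;
   set counts end=last;
   by cluster;
   /* every row after the first within a cluster is a minority species */
   if not first.cluster then misclass + count;
   if last then put 'Misclassified observations: ' misclass;
run;
```

For the Ward's method partition, this rule counts 15 + 1 = 16 observations, in agreement with the text. The majority rule is meaningful only when each cluster has a different majority species, as is the case here.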
Output 23.3.1: Cluster Analysis of Fisher Iris Data:
CLUSTER with METHOD=WARD
Cluster Analysis of Fisher (1936) Iris Data
By Ward's Method

The CLUSTER Procedure
Ward's Minimum Variance Cluster Analysis

Eigenvalues of the Covariance Matrix

       Eigenvalue    Difference    Proportion    Cumulative
  1    422.824171    398.557096        0.9246        0.9246
  2     24.267075     16.446125        0.0531        0.9777
  3      7.820950      5.437441        0.0171        0.9948
  4      2.383509                      0.0052        1.0000

Root-Mean-Square Total-Sample Standard Deviation = 10.69224
Root-Mean-Square Distance Between Observations   = 30.24221
Cluster History

NCL   Clusters Joined  FREQ    SPRSQ    RSQ   ERSQ    CCC    PSF   PST2   Tie
 15   CL24    CL28       15   0.0016   .971   .958   5.93    324    9.8
 14   CL21    CL53        7   0.0019   .969   .955   5.85    329    5.1
 13   CL18    CL48       15   0.0023   .967   .953   5.69    334    8.9
 12   CL16    CL23       24   0.0023   .965   .950   4.63    342    9.6
 11   CL14    CL43       12   0.0025   .962   .946   4.67    353    5.8
 10   CL26    CL20       22   0.0027   .959   .942   4.81    368   12.9
  9   CL27    CL17       31   0.0031   .956   .936   5.02    387   17.8
  8   CL35    CL15       23   0.0031   .953   .930   5.44    414   13.8
  7   CL10    CL47       26   0.0058   .947   .921   5.43    430   19.1
  6   CL8     CL13       38   0.0060   .941   .911   5.81    463   16.3
  5   CL9     CL19       50   0.0105   .931   .895   5.82    488   43.2
  4   CL12    CL11       36   0.0172   .914   .872   3.99    515   41.0
  3   CL6     CL7        64   0.0301   .884   .827   4.33    558   57.2
  2   CL4     CL3       100   0.1110   .773   .697   3.83    503    116
  1   CL5     CL2       150   0.7726   .000   .000   0.00      .    503

Cluster Analysis of Fisher (1936) Iris Data

Table of CLUSTER by Species
Cell contents, top to bottom: Frequency, Percent, Row Pct, Col Pct

CLUSTER       Setosa   Versicolor   Virginica     Total
1                  0           49          15        64
                0.00        32.67       10.00     42.67
                0.00        76.56       23.44
                0.00        98.00       30.00

2                  0            1          35        36
                0.00         0.67       23.33     24.00
                0.00         2.78       97.22
                0.00         2.00       70.00

3                 50            0           0        50
               33.33         0.00        0.00     33.33
              100.00         0.00        0.00
              100.00         0.00        0.00

Total             50           50          50       150
               33.33        33.33       33.33    100.00
|
|
The second analysis uses two-stage density linkage.
The raw data suggest 2 or 6 modes instead of 3:
k         modes
3         12
4-6       6
7         4
8         3
9-50      2
51+       1
However, the ACECLUS procedure can be used to reveal 3 modes.
This analysis uses K=8 to produce 3
clusters for comparison with other analyses.
There are only 6 misclassifications.
The results are shown in Output 23.3.2.
title2 'By Two-Stage Density Linkage';
proc cluster data=iris method=twostage k=8 print=15 ccc pseudo;
   var petal: sepal:;
   copy species;
run;

proc tree noprint ncl=3 out=out;
   copy petal: sepal: species;
run;

%show;
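The text notes that the ACECLUS procedure can be used to reveal 3 modes but does not show that step. A hedged sketch of one way to set it up follows. The P=.02 setting, the data set name ACE, and the reuse of K=8 are assumptions chosen for illustration, not values given in this example, and the number of modal clusters that results has not been verified here.

```sas
/* Sketch: transform the variables with PROC ACECLUS, then apply  */
/* two-stage density linkage to the resulting canonical variables. */
title2 'By Two-Stage Density Linkage after ACECLUS (sketch)';
proc aceclus data=iris out=ace p=.02 noprint;   /* P=.02 is an assumption */
   var petal: sepal:;
run;

proc cluster data=ace method=twostage k=8 print=15 ccc pseudo;
   var can:;                     /* Can1-Can4 written by PROC ACECLUS      */
   copy species petal: sepal:;   /* carry the original variables for SHOW  */
run;

proc tree noprint ncl=3 out=out;
   copy petal: sepal: species;
run;

%show;
```

Because the VAR statement now names the canonical variables, the original measurements are carried along with the COPY statement so that the SHOW macro can still use them.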
Output 23.3.2: Cluster Analysis of Fisher Iris Data:
CLUSTER with METHOD=TWOSTAGE
Cluster Analysis of Fisher (1936) Iris Data
By Two-Stage Density Linkage

The CLUSTER Procedure
Two-Stage Density Linkage Clustering

Eigenvalues of the Covariance Matrix

       Eigenvalue    Difference    Proportion    Cumulative
  1    422.824171    398.557096        0.9246        0.9246
  2     24.267075     16.446125        0.0531        0.9777
  3      7.820950      5.437441        0.0171        0.9948
  4      2.383509                      0.0052        1.0000

Root-Mean-Square Total-Sample Standard Deviation = 10.69224
Cluster History

                                                                         Normalized     Maximum Density
                                                                             Fusion     in Each Cluster
NCL   Clusters Joined  FREQ    SPRSQ    RSQ   ERSQ    CCC    PSF   PST2     Density    Lesser   Greater   Tie
 15   CL17    OB127      44   0.0025   .916   .958    -11    105    3.4      0.3903    0.2066    3.5156
 14   CL16    OB137      50   0.0023   .913   .955    -11    110    5.6      0.3637    0.1837     100.0
 13   CL15    OB74       45   0.0029   .910   .953    -10    116    3.7      0.3553    0.2130    3.5156
 12   CL28    OB49       46   0.0036   .907   .950   -8.0    122    5.2      0.3223    0.1736    8.3678   T
 11   CL12    OB85       47   0.0036   .903   .946   -7.6    130    4.8      0.3223    0.1736    8.3678
 10   CL11    OB98       48   0.0033   .900   .942   -7.1    140    4.1      0.2879    0.1479    8.3678
  9   CL13    OB24       46   0.0037   .896   .936   -6.5    152    4.4      0.2802    0.2005    3.5156
  8   CL10    OB25       49   0.0019   .894   .930   -5.5    171    2.2      0.2699    0.1372    8.3678
  7   CL8     OB121      50   0.0035   .891   .921   -4.5    194    4.0      0.2586    0.1372    8.3678
  6   CL9     OB45       47   0.0042   .886   .911   -3.3    225    4.6      0.1412    0.0832    3.5156
  5   CL6     OB39       48   0.0049   .882   .895   -1.7    270    5.0       0.107    0.0605    3.5156
  4   CL5     OB21       49   0.0049   .877   .872   0.35    346    4.7      0.0969    0.0541    3.5156
  3   CL4     OB90       50   0.0047   .872   .827   3.28    500    4.1      0.0715    0.0370    3.5156
  2   CL3     CL7       100   0.0993   .773   .697   3.83    503   91.9      2.6277    3.5156    8.3678

3 modal clusters have been formed.

Cluster Analysis of Fisher (1936) Iris Data

Table of CLUSTER by Species
Cell contents, top to bottom: Frequency, Percent, Row Pct, Col Pct

CLUSTER       Setosa   Versicolor   Virginica     Total
1                 50            0           0        50
               33.33         0.00        0.00     33.33
              100.00         0.00        0.00
              100.00         0.00        0.00

2                  0           47           3        50
                0.00        31.33        2.00     33.33
                0.00        94.00        6.00
                0.00        94.00        6.00

3                  0            3          47        50
                0.00         2.00       31.33     33.33
                0.00         6.00       94.00
                0.00         6.00       94.00

Total             50           50          50       150
               33.33        33.33       33.33    100.00
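
The table of modes by k given before this analysis can be explored by rerunning two-stage density linkage over several values of K=. The macro below is a sketch under stated assumptions: the macro name MODES and its argument KLIST are hypothetical, PRINT=5 and NOEIGEN are used only to shorten the display, and the number of modes is read from the "modal clusters have been formed" line that PROC CLUSTER displays, as in the output above.

```sas
/* Sketch: run METHOD=TWOSTAGE once per value of K= in a list. */
%macro modes(klist);   /* MODES and KLIST are hypothetical names */
   %local i k;
   %let i = 1;
   %let k = %scan(&klist, &i, %str( ));
   %do %while (%length(&k) > 0);
      title3 "Two-Stage Density Linkage with K=&k";
      proc cluster data=iris method=twostage k=&k print=5 noeigen;
         var petal: sepal:;
      run;
      %let i = %eval(&i + 1);
      %let k = %scan(&klist, &i, %str( ));
   %end;
   title3;             /* clear the third title line */
%mend modes;

%modes(3 5 7 8 20 60)
```

According to the table in the text, these six values of K= should yield 12, 6, 4, 3, 2, and 1 modes, respectively.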
|
|
The CLUSTER procedure is not practical for very large
data sets because, with most methods, the CPU time varies
as the square or cube of the number of observations.
The FASTCLUS procedure requires time proportional
to the number of observations and can, therefore,
be used with much larger data sets than PROC CLUSTER.
If you want to hierarchically cluster a very large data set,
you can use PROC FASTCLUS for a preliminary cluster analysis
producing a large number of clusters and then use PROC CLUSTER
to hierarchically cluster the preliminary clusters.
FASTCLUS automatically creates variables _FREQ_
and _RMSSTD_ in the MEAN= output data set.
These variables are then automatically used by
PROC CLUSTER in the computation of various statistics.
The iris data are used to illustrate
the process of clustering clusters.
In the preliminary analysis, PROC FASTCLUS produces ten
clusters, which are then crosstabulated with species.
The data set containing the preliminary clusters
is sorted in preparation for later merges.
The results are shown in Output 23.3.3.
title2 'Preliminary Analysis by FASTCLUS';
proc fastclus data=iris summary maxc=10 maxiter=99 converge=0
              mean=mean out=prelim cluster=preclus;
   var petal: sepal:;
run;

proc freq;
   tables preclus*species;
run;

proc sort data=prelim;
   by preclus;
run;
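The pseudo F statistic that PROC FASTCLUS reports in Output 23.3.3 can be cross-checked from the over-all R-squared. The DATA step below is a sketch that assumes the usual form of the pseudo F statistic, [R²/(g-1)] / [(1-R²)/(n-g)], with n=150 observations and g=10 clusters, and takes the rounded R-squared value 0.95971 from the output. The variable names are illustrative.

```sas
/* Sketch: recompute the pseudo F statistic from R-squared. */
data _null_;
   n   = 150;       /* observations in the iris data              */
   g   = 10;        /* clusters requested with MAXC=10            */
   rsq = 0.95971;   /* Observed Over-All R-Squared, as displayed  */
   psf = (rsq / (g - 1)) / ((1 - rsq) / (n - g));
   put 'Pseudo F from rounded R-squared: ' psf 8.2;
run;
```

The result is about 370.5, close to the reported 370.58. The small difference comes from using the rounded R-squared value.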
Output 23.3.3: Preliminary Analysis of Fisher Iris Data
Cluster Analysis of Fisher (1936) Iris Data
Preliminary Analysis by FASTCLUS

The FASTCLUS Procedure
Replace=FULL Radius=0 Maxclusters=10 Maxiter=99 Converge=0

Cluster Summary

                        RMS Std           Maximum Distance     Radius   Nearest    Distance Between
Cluster   Frequency   Deviation   from Seed to Observation   Exceeded   Cluster   Cluster Centroids
      1           9      2.7067                     8.2027                    5              8.7362
      2          19      2.2001                     7.7340                    4              6.2243
      3          18      2.1496                     6.2173                    8              7.5049
      4           4      2.5249                     5.3268                    2              6.2243
      5           3      2.7234                     5.8214                    1              8.7362
      6           7      2.2939                     5.1508                    2              9.3318
      7          17      2.0274                     6.9576                   10              7.9503
      8          18      2.2628                     7.1135                    3              7.5049
      9          22      2.2666                     7.5029                    8              9.0090
     10          33      2.0594                    10.0033                    7              7.9503

Pseudo F Statistic                        =   370.58
Observed Over-All R-Squared               =  0.95971
Approximate Expected Over-All R-Squared   =  0.82928
Cubic Clustering Criterion                =   27.077
WARNING: The two values above are invalid for correlated variables.

Cluster Analysis of Fisher (1936) Iris Data
Preliminary Analysis by FASTCLUS

Table of PRECLUS by Species
Row variable: PRECLUS(Cluster)
Cell contents, top to bottom: Frequency, Percent, Row Pct, Col Pct

PRECLUS       Setosa   Versicolor   Virginica     Total
1                  0            0           9         9
                0.00         0.00        6.00      6.00
                0.00         0.00      100.00
                0.00         0.00       18.00

2                  0           19           0        19
                0.00        12.67        0.00     12.67
                0.00       100.00        0.00
                0.00        38.00        0.00

3                  0           18           0        18
                0.00        12.00        0.00     12.00
                0.00       100.00        0.00
                0.00        36.00        0.00

4                  0            3           1         4
                0.00         2.00        0.67      2.67
                0.00        75.00       25.00
                0.00         6.00        2.00

5                  0            0           3         3
                0.00         0.00        2.00      2.00
                0.00         0.00      100.00
                0.00         0.00        6.00

6                  0            7           0         7
                0.00         4.67        0.00      4.67
                0.00       100.00        0.00
                0.00        14.00        0.00

7                 17            0           0        17
               11.33         0.00        0.00     11.33
              100.00         0.00        0.00
               34.00         0.00        0.00

8                  0            3          15        18
                0.00         2.00       10.00     12.00
                0.00        16.67       83.33
                0.00         6.00       30.00

9                  0            0          22        22
                0.00         0.00       14.67     14.67
                0.00         0.00      100.00
                0.00         0.00       44.00

10                33            0           0        33
               22.00         0.00        0.00     22.00
              100.00         0.00        0.00
               66.00         0.00        0.00

Total             50           50          50       150
               33.33        33.33       33.33    100.00
|
|
The following macro, CLUS, clusters the preliminary clusters.
Its single argument supplies the METHOD=
specification to be used by PROC CLUSTER.
The TREE procedure creates an output data set containing
the 3-cluster partition, which is sorted and merged with
the OUT= data set from PROC FASTCLUS to determine to which
cluster each of the original 150 observations belongs.
The SHOW macro is then used to display the results.
In this example, the CLUS macro is invoked using Ward's method,
which produces 16 misclassifications, and Wong's hybrid
method, which produces 22 misclassifications.
The results are shown in Output 23.3.4
and Output 23.3.5.
%macro clus(method);
   /* Cluster the preliminary cluster means from PROC FASTCLUS */
   proc cluster data=mean method=&method ccc pseudo;
      var petal: sepal:;
      copy preclus;
   run;
   /* Obtain the 3-cluster partition */
   proc tree noprint ncl=3 out=out;
      copy petal: sepal: preclus;
   run;
   /* Merge with the FASTCLUS OUT= data set to assign each observation */
   proc sort data=out;
      by preclus;
   run;
   data clus;
      merge prelim out;
      by preclus;
   run;
   %show;
%mend;

title2 'Clustering Clusters by Ward''s Method';
%clus(ward);

title2 'Clustering Clusters by Wong''s Hybrid Method';
%clus(twostage hybrid);
Output 23.3.4: Clustering Clusters: PROC CLUSTER with Ward's Method
Cluster Analysis of Fisher (1936) Iris Data
Clustering Clusters by Ward's Method

The CLUSTER Procedure
Ward's Minimum Variance Cluster Analysis

Eigenvalues of the Covariance Matrix

       Eigenvalue    Difference    Proportion    Cumulative
  1    416.976349    398.666421        0.9501        0.9501
  2     18.309928     14.952922        0.0417        0.9918
  3      3.357006      3.126943        0.0076        0.9995
  4      0.230063                      0.0005        1.0000

Root-Mean-Square Total-Sample Standard Deviation = 10.69224
Root-Mean-Square Distance Between Observations   = 30.24221
Cluster History

NCL   Clusters Joined  FREQ    SPRSQ    RSQ   ERSQ    CCC    PSF   PST2   Tie
  9   OB2     OB4        23   0.0019   .958   .932   6.26    400    6.3
  8   OB1     OB5        12   0.0025   .955   .926   6.75    434    5.8
  7   CL9     OB6        30   0.0069   .948   .918   6.28    438   19.5
  6   OB3     OB8        36   0.0074   .941   .907   6.21    459   26.0
  5   OB7     OB10       50   0.0104   .931   .892   6.15    485   42.2
  4   CL8     OB9        34   0.0162   .914   .870   4.28    519   39.3
  3   CL7     CL6        66   0.0318   .883   .824   4.39    552   59.7
  2   CL4     CL3       100   0.1099   .773   .695   3.94    503    113
  1   CL2     CL5       150   0.7726   .000   .000   0.00      .    503

Cluster Analysis of Fisher (1936) Iris Data

Table of CLUSTER by Species
Cell contents, top to bottom: Frequency, Percent, Row Pct, Col Pct

CLUSTER       Setosa   Versicolor   Virginica     Total
1                  0           50          16        66
                0.00        33.33       10.67     44.00
                0.00        75.76       24.24
                0.00       100.00       32.00

2                  0            0          34        34
                0.00         0.00       22.67     22.67
                0.00         0.00      100.00
                0.00         0.00       68.00

3                 50            0           0        50
               33.33         0.00        0.00     33.33
              100.00         0.00        0.00
              100.00         0.00        0.00

Total             50           50          50       150
               33.33        33.33       33.33    100.00
|
|
Output 23.3.5: Clustering Clusters: PROC CLUSTER with Wong's Hybrid Method
Cluster Analysis of Fisher (1936) Iris Data
Clustering Clusters by Wong's Hybrid Method

The CLUSTER Procedure
Two-Stage Density Linkage Clustering

Eigenvalues of the Covariance Matrix

       Eigenvalue    Difference    Proportion    Cumulative
  1    416.976349    398.666421        0.9501        0.9501
  2     18.309928     14.952922        0.0417        0.9918
  3      3.357006      3.126943        0.0076        0.9995
  4      0.230063                      0.0005        1.0000

Root-Mean-Square Total-Sample Standard Deviation = 10.69224
Cluster History

                                                                         Normalized     Maximum Density
                                                                             Fusion     in Each Cluster
NCL   Clusters Joined  FREQ    SPRSQ    RSQ   ERSQ    CCC    PSF   PST2     Density    Lesser   Greater   Tie
  9   OB10    OB7        50   0.0104   .949   .932   3.81    330   42.2       40.24   58.2179     100.0
  8   OB3     OB8        36   0.0074   .942   .926   3.22    329   26.0      27.981   39.4511   48.4350
  7   OB2     OB4        23   0.0019   .940   .918   4.24    373    6.3      23.775    8.9675   46.3026
  6   CL8     OB9        58   0.0194   .921   .907   2.13    334   46.3      20.724   46.8846   48.4350
  5   CL7     OB6        30   0.0069   .914   .892   3.09    383   19.5      13.303   17.6360   46.3026
  4   CL6     OB1        67   0.0292   .884   .870   1.21    372   41.0      8.4137   10.8758   48.4350
  3   CL4     OB5        70   0.0138   .871   .824   3.33    494   12.3      5.1855    6.2890   48.4350
  2   CL3     CL5       100   0.0979   .773   .695   3.94    503   89.5      19.513   46.3026   48.4350
  1   CL2     CL9       150   0.7726   .000   .000   0.00      .    503      1.3337   48.4350     100.0

3 modal clusters have been formed.

Cluster Analysis of Fisher (1936) Iris Data

Table of CLUSTER by Species
Cell contents, top to bottom: Frequency, Percent, Row Pct, Col Pct

CLUSTER       Setosa   Versicolor   Virginica     Total
1                 50            0           0        50
               33.33         0.00        0.00     33.33
              100.00         0.00        0.00
              100.00         0.00        0.00

2                  0           21          49        70
                0.00        14.00       32.67     46.67
                0.00        30.00       70.00
                0.00        42.00       98.00

3                  0           29           1        30
                0.00        19.33        0.67     20.00
                0.00        96.67        3.33
                0.00        58.00        2.00

Total             50           50          50       150
               33.33        33.33       33.33    100.00
|
|
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.