The MODECLUS Procedure

Example 42.3: Cluster Analysis with Significance Tests

This example uses artificial data containing two clusters. One cluster is drawn from a circular bivariate normal distribution. The other is a ring-shaped cluster that completely surrounds the first. Without significance tests, the ring is divided into several sample clusters for any degree of smoothing that yields reasonable density estimates. The JOIN= option puts the ring back together.

Output 42.3.1 displays the short summary generated by the first PROC MODECLUS statement, which specifies the JOIN=20 and SHORT options over a range of radii. Output 42.3.2 contains the series of tables produced by the second PROC MODECLUS statement. Because that statement specifies JOIN without a p-value, joining continues until only one cluster remains (see the description of the JOIN= option). The cluster memberships at each stage of joining are then plotted, as displayed in Output 42.3.3.

   title  'Modeclus Analysis with the JOIN= option';
   title2 'A Normal Cluster Surrounded by a Ring Cluster';
   options ls=120 ps=38;

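   /* Generate 30 points from a circular bivariate normal */
   /* cluster, then 300 points from a surrounding ring.   */
   data circle; keep x y;
      c=1;                        /* normal cluster */
      do n=1 to 30;
         x=rannor(5);
         y=rannor(5);
         output;
      end;

      c=2;                        /* ring cluster */
      do n=1 to 300;
         x=rannor(5);
         y=rannor(5);
         z=rannor(5)+8;           /* radius: normal with mean 8 */
         l=z/sqrt(x**2+y**2);     /* factor that rescales (x,y) to radius z */
         x=x*l;
         y=y*l;
         output;
      end;
   run;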

   axis1 label=(angle=90 rotate=0) minor=none 
         order=(-10 to 10 by 5);
   axis2 minor=none order=(-15 to 15 by 5);

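   /* Scan a range of radii, joining clusters with JOIN=20 */
   proc modeclus data=circle m=1 r=1 to 3.5 by .25 join=20 short;
   run;

   /* Fix R=2.5, join until one cluster remains, and save the memberships */
   proc modeclus data=circle m=1 r=2.5 join out=out;
   run;
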
   proc gplot data=out;
      plot y*x=cluster/frame cframe=ligr
                       vzero nolegend 
                       vaxis=axis1 haxis=axis2 ;
      by _NJOIN_;
   run;
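
As a quick sanity check on the simulated data, the following sketch computes each point's distance from the origin and summarizes it by group. It assumes that the CIRCLE data set was created by the DATA step above, so that the first 30 observations are the normal cluster and the remaining 300 are the ring. The data set name CIRCLE_CHECK and the variables GROUP and RADIUS are hypothetical names introduced only for this illustration. Because each ring point is rescaled by l=z/sqrt(x**2+y**2), its distance from the origin equals the absolute value of z, so the ring radii should be centered near 8.

```sas
/* Hypothetical check, not part of the original example */
data circle_check;
   set circle;
   /* Assumes the first 30 observations are the normal cluster */
   if _n_ <= 30 then group = 1;
   else group = 2;
   radius = sqrt(x**2 + y**2);   /* distance from the origin */
run;

/* Summarize the radii separately for the two generated groups */
proc means data=circle_check n mean std min max;
   class group;
   var radius;
run;
```

If the two groups are well separated in RADIUS, the ring completely surrounds the normal cluster, as the example intends.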

Output 42.3.1: Significance Tests with the JOIN=20 and SHORT Options

Modeclus Analysis with the JOIN= option
A Normal Cluster Surrounded by a Ring Cluster

The MODECLUS Procedure

Cluster Summary

      Number of                      Frequency of
       Clusters  Maximum  Number of  Unclassified
   R     Joined  P-value   Clusters       Objects
   1         36   0.9339          1           301
1.25         20   0.7131          1           301
 1.5         10   0.3296          1           300
1.75          5   0.1990          2             0
   2          5   0.0683          2             0
2.25          3   0.0504          2             0
 2.5          4   0.0301          2             0
2.75          3   0.0585          2             0
   3          5   0.0003          1             0
3.25          4   0.1923          2             0
 3.5          4   0.0000          1             0

Output 42.3.2: Significance Tests with the JOIN Option

Modeclus Analysis with the JOIN= option
A Normal Cluster Surrounded by a Ring Cluster

The MODECLUS Procedure
R=2.5 METHOD=1

Cluster Statistics

                                                       ------Saddle Test: Version 92.7-------
                       Maximum              Estimated
                     Estimated   Boundary      Saddle   Mode  Saddle  Overlap          Approx
Cluster  Frequency     Density  Frequency     Density  Count   Count    Count      Z  P-value
      1        103  0.00617328         22  0.00308664     39      19        0  2.495   0.5055
      2         71  0.00571029         20   0.0043213     36      27        9  1.193    0.999
      3         53  0.00509296         18  0.00401263     32      25       10  0.986   0.9999
      4         45  0.00478429         19  0.00354964     30      22       14  1.429   0.9924
      5         30  0.00462996          0           .     29       0        .  3.611   0.0301
      6         28  0.00370397         17  0.00354964     23      22        9  0.000        1


Modeclus Analysis with the JOIN= option
A Normal Cluster Surrounded by a Ring Cluster

The MODECLUS Procedure
R=2.5 METHOD=1

Cluster Statistics

                                                       ------Saddle Test: Version 92.7-------
                       Maximum              Estimated
                     Estimated   Boundary      Saddle   Mode  Saddle  Overlap          Approx
Cluster  Frequency     Density  Frequency     Density  Count   Count    Count      Z  P-value
      1        103  0.00617328         22  0.00308664     39      19        0  2.495   0.5055
      2         71  0.00571029         20   0.0043213     36      27        9  1.193    0.999
      3         53  0.00509296         18  0.00401263     32      25       10  0.986   0.9999
      4         73  0.00478429         13  0.00293231     30      18        0  1.588   0.9778
      5         30  0.00462996          0           .     29       0        .  3.611   0.0301


Modeclus Analysis with the JOIN= option
A Normal Cluster Surrounded by a Ring Cluster

The MODECLUS Procedure
R=2.5 METHOD=1

Cluster Statistics

                                                       ------Saddle Test: Version 92.7-------
                       Maximum              Estimated
                     Estimated   Boundary      Saddle   Mode  Saddle  Overlap          Approx
Cluster  Frequency     Density  Frequency     Density  Count   Count    Count      Z  P-value
      1        156  0.00617328         17  0.00246931     39      15        0  3.130   0.1318
      2         71  0.00571029         20   0.0043213     36      27        9  1.193    0.999
      3         73  0.00478429         13  0.00293231     30      18        0  1.588   0.9778
      4         30  0.00462996          0           .     29       0        .  3.611   0.0301


Modeclus Analysis with the JOIN= option
A Normal Cluster Surrounded by a Ring Cluster

The MODECLUS Procedure
R=2.5 METHOD=1

Cluster Statistics

                                                       ------Saddle Test: Version 92.7-------
                       Maximum              Estimated
                     Estimated   Boundary      Saddle   Mode  Saddle  Overlap          Approx
Cluster  Frequency     Density  Frequency     Density  Count   Count    Count      Z  P-value
      1        156  0.00617328         17  0.00246931     39      15        0  3.130   0.1318
      2        144  0.00571029         14  0.00293231     36      18        0  2.313   0.6447
      3         30  0.00462996          0           .     29       0        .  3.611   0.0301


Modeclus Analysis with the JOIN= option
A Normal Cluster Surrounded by a Ring Cluster

The MODECLUS Procedure
R=2.5 METHOD=1

Cluster Statistics

                                                       ------Saddle Test: Version 92.7-------
                       Maximum              Estimated
                     Estimated   Boundary      Saddle   Mode  Saddle  Overlap          Approx
Cluster  Frequency     Density  Frequency     Density  Count   Count    Count      Z  P-value
      1        300  0.00617328          0           .     39       0        .  4.246   0.0026
      2         30  0.00462996          0           .     29       0        .  3.611   0.0301


Modeclus Analysis with the JOIN= option
A Normal Cluster Surrounded by a Ring Cluster

The MODECLUS Procedure
R=2.5 METHOD=1

Cluster Statistics

                                                       ------Saddle Test: Version 92.7-------
                       Maximum              Estimated
                     Estimated   Boundary      Saddle   Mode  Saddle  Overlap          Approx
Cluster  Frequency     Density  Frequency     Density  Count   Count    Count      Z  P-value
      1        300  0.00617328          0           .     39       0        .  4.246   0.0026


Modeclus Analysis with the JOIN= option
A Normal Cluster Surrounded by a Ring Cluster

The MODECLUS Procedure

Cluster Summary

      Number of                      Frequency of
       Clusters  Maximum  Number of  Unclassified
   R     Joined  P-value   Clusters       Objects
 2.5          0   1.0000          6             0
 2.5          1   0.9999          5             0
 2.5          2   0.9990          4             0
 2.5          3   0.6447          3             0
 2.5          4   0.0301          2             0
 2.5          5   0.0026          1            30
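
The joining history summarized in the preceding table can also be checked directly from the OUT= data set. The sketch below assumes the OUT data set created by the second PROC MODECLUS step, which contains the CLUSTER and _NJOIN_ variables used by the PROC GPLOT step; it cross-tabulates cluster membership against the number of joins. Whether unclassified observations are stored with a missing CLUSTER value is an assumption made here, so the MISSING option is included to keep any such observations in the table.

```sas
/* Hypothetical follow-up, not part of the original example: */
/* cluster sizes at each stage of joining                    */
proc freq data=out;
   tables _NJOIN_*cluster / missing norow nocol nopercent;
run;
```

Each row of the resulting table corresponds to one value of _NJOIN_, so the cell counts can be compared with the Frequency column of the matching Cluster Statistics table.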

Output 42.3.3: Scatter Plots of Cluster Memberships by _NJOIN_
[Six scatter plots of y versus x, one for each value of _NJOIN_; images not reproduced here.]


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.