The MODECLUS Procedure

Example 42.5: Using the TRACE Option when METHOD=6

To illustrate how the TRACE option can help you understand the clustering process when METHOD=6 is specified, the following statements create a data set with 12 observations:

   data test;
      input x@@;
      datalines;
   1 2 3 4 5 7.5 9 11.5 13 14.5 15 16
   ;

The first five observations appear to be close to each other, as do the last five. Observation 6 (x=7.5) is separated from the nearest of the first five observations (observation 5) by a (Euclidean) distance of 2.5. The same distance separates observation 7 (x=9) from the nearest of the last five observations (observation 8). Observations 6 and 7 differ by 1.5.
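The distances quoted above can be checked with a few lines of Python. This is only an arithmetic check on the values read by the DATA step. The helper `dist` and the index ranges are illustrative names and are not part of PROC MODECLUS.

```python
# Data values from the DATA step, in input order (observations 1 through 12).
x = [1, 2, 3, 4, 5, 7.5, 9, 11.5, 13, 14.5, 15, 16]

first_five = range(1, 6)   # observations 1-5
last_five = range(8, 13)   # observations 8-12

def dist(i, j):
    """One-dimensional Euclidean distance between observations i and j (1-based)."""
    return abs(x[i - 1] - x[j - 1])

print(min(dist(6, k) for k in first_five))  # 2.5, attained at observation 5
print(min(dist(7, k) for k in last_five))   # 2.5, attained at observation 8
print(dist(6, 7))                           # 1.5
```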

Suppose you choose METHOD=6 with a radius of 2.5 (R=2.5) for the cluster analysis. You can specify the TRACE option to see how each observation is assigned to a cluster.
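The following sketch illustrates what R=2.5 means for the density estimates by recomputing the Density column of the trace output (Output 42.5.1). It assumes a uniform-kernel estimate. For each observation, the sketch counts the observations that lie within distance R, including the observation itself and any observation exactly on the boundary. It then divides that count by n times 2R, which is the length of a one-dimensional ball of radius R. This formula is our assumption. It was chosen because it reproduces all 12 printed values, so consult the MODECLUS density-estimation documentation for the exact definition. The function name `density` is hypothetical.

```python
x = [1, 2, 3, 4, 5, 7.5, 9, 11.5, 13, 14.5, 15, 16]
R = 2.5

def density(i, data=x, r=R):
    """Assumed fixed-radius (uniform kernel) density estimate for observation i (1-based)."""
    xi = data[i - 1]
    # Closed ball: distances equal to r count, and the observation counts itself.
    count = sum(1 for xj in data if abs(xi - xj) <= r)
    # Divide by n times the volume of a 1-D ball of radius r, which is 2r.
    return count / (len(data) * 2 * r)

for i in range(1, len(x) + 1):
    print(i, f"{density(i):.7f}")
```

Observation 3 has five observations within R of it (x = 1 through 5), which gives 5/60, or about 0.0833333. That is the largest value in the trace.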

The following statements produce Output 42.5.1 and Output 42.5.2:

   /*-- METHOD=6 with TRACE and THRESHOLD=0.5 (default) --*/
   proc modeclus method=6 r=2.5 trace short out=out;
      var x;
   run;

   /*-- Annotate data set: label each point with its observation number --*/
   data markobs;
      drop _r_ _method_ _obs_ density cluster;
      length function style $8 text $ 2;
      /* XSYS/YSYS='2': coordinates in data units; WHEN='a': draw after the plot */
      retain xsys '2' ysys '2' hsys '1' when 'a';
      set out;
      /* create the text for obs */
      function='label'; size=4;
      style='swiss';
      text=left(put(_obs_,2.));
      position='3';
      /* place the label at (X, DENSITY); X already holds the data value */
      y=density;
      output;
   run;

   /*-- Plot the density estimates, one symbol per cluster --*/
   legend1 frame cframe=ligr cborder=black
          position=center value=(justify=center);
   axis1 label=(angle=90 rotate=0) minor=none;
   axis2 minor=none;
   title 'Plot of DENSITY*X=CLUSTER';
   proc gplot data=out;
      plot density*x=cluster/ annotate=markobs
                              frame cframe=ligr
                              legend=legend1
                              vaxis=axis1 haxis=axis2;
   run;
   quit;

Output 42.5.1: Partial Output of METHOD=6 with TRACE and Default THRESHOLD=

The MODECLUS Procedure
R=2.5 METHOD=6

Trace of Clustering Algorithm

                     Cluster
   Obs    Density   Old   New   Ratio
     3  0.0833333    -1     1   M
     2  0.0666667     0     1   N
     4  0.0666667     0     1   N
     5  0.0666667     0     1   N
     1  0.0500000     0     1   N
     6  0.0500000     0     1   0.571
     7  0.0500000    -1     1   0.500
     9  0.0666667    -1     2   M
     8  0.0500000     0     2   N
    10  0.0666667    -1     2   S
    12  0.0500000     0     2   N
    11  0.0666667    -1     2   S

Output 42.5.2: Density Plot
[Figure: Plot of DENSITY*X=CLUSTER, each point labeled with its observation number]

Notice that in Output 42.5.1, observation 7 is initially a seed (indicated by a value of -1 in the Old column) and is then assigned to cluster 1. This happens because the ratio of observation 7 to cluster 1 is 0.500, which is not less than the default THRESHOLD= value of 0.5.
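The comparison in the preceding paragraph uses a strict inequality, so a ratio exactly equal to THRESHOLD= does not exclude the observation. The following minimal sketch shows that check for observation 7, using the ratio printed in the trace. The function `excluded` is a hypothetical helper and is not a MODECLUS option.

```python
def excluded(ratio, threshold=0.5):
    """Return True when the ratio is strictly less than THRESHOLD= (default 0.5)."""
    return ratio < threshold

ratio_obs7 = 0.500  # ratio of observation 7 to cluster 1, as printed in Output 42.5.1

print(excluded(ratio_obs7))                  # False: 0.500 is not less than 0.5
print(excluded(ratio_obs7, threshold=0.55))  # True: 0.500 is less than 0.55
```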

If the THRESHOLD= value is increased to 0.55, observation 7 should be excluded from cluster 1, and its cluster membership changes.

The following statements produce Output 42.5.3 and Output 42.5.4:

   /*-- METHOD=6 with TRACE and THRESHOLD=0.55 --*/
   proc modeclus method=6 r=2.5 trace threshold=0.55 short 
                 out=out;
      var x;
   run;

        . . .   (the DATA step and PROC GPLOT statements
                 are omitted because they are the same as in
                 the previous job)

Output 42.5.3: Partial Output of METHOD=6 with TRACE and THRESHOLD=.55

The MODECLUS Procedure
R=2.5 METHOD=6

Trace of Clustering Algorithm

                     Cluster
   Obs    Density   Old   New   Ratio
     3  0.0833333    -1     1   M
     2  0.0666667     0     1   N
     4  0.0666667     0     1   N
     5  0.0666667     0     1   N
     1  0.0500000     0     1   N
     6  0.0500000     0     1   0.571
     9  0.0666667    -1     2   M
     8  0.0500000     0     2   N
    10  0.0666667    -1     2   S
    12  0.0500000     0     2   N
    11  0.0666667    -1     2   S
     7  0.0500000    -1     2   S

Output 42.5.4: Density Plot
[Figure: Plot of DENSITY*X=CLUSTER, each point labeled with its observation number]

In Output 42.5.3, observation 7 is a seed that is excluded from cluster 1 because its ratio to cluster 1 is less than 0.55. Because it is a neighbor of observation 8, which is a member of cluster 2, observation 7 eventually joins cluster 2 even though it remains a seed. (See Step 2.2 in the section "METHOD=6".)
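The following sketch shows why observation 8 links observation 7 to cluster 2. It lists the neighbors of observation 7 within R=2.5, together with their clusters from the New column of Output 42.5.3. It assumes that a neighbor is any other observation at a distance less than or equal to R. This is the same convention that reproduces the printed densities. The names `neighbors` and `cluster` are illustrative.

```python
x = [1, 2, 3, 4, 5, 7.5, 9, 11.5, 13, 14.5, 15, 16]
R = 2.5

# Final cluster of each observation, copied from the New column of Output 42.5.3.
cluster = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1,
           7: 2, 8: 2, 9: 2, 10: 2, 11: 2, 12: 2}

def neighbors(i):
    """Other observations within distance R of observation i (1-based); boundary included (assumed)."""
    return [j for j in range(1, len(x) + 1)
            if j != i and abs(x[i - 1] - x[j - 1]) <= R]

for j in neighbors(7):
    print(j, cluster[j])   # prints "6 1" and then "8 2"
```

Observation 7 has only two neighbors: observation 6 in cluster 1 and observation 8 in cluster 2. Once observation 7 is excluded from cluster 1, observation 8 is its only neighbor in another cluster.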


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.