The MODECLUS Procedure

Example 42.4: Cluster Analysis: Hertzsprung-Russell Plot

This example uses computer-generated data to mimic a Hertzsprung-Russell plot (Struve and Zebergs 1962, p. 259) of the temperature and luminosity of stars. The data are plotted in Output 42.4.1; see "Example 4 from Proc Modeclus" in the SAS/STAT Sample Program Library for the complete data set. There appear to be two main groups of stars and a collection of isolated stars. The long straggling group of points running diagonally across the figure represents the main group of stars; the more compact group in the top right-hand corner contains giant stars. PROC MODECLUS is run with several smoothing parameters (R=1, 1.5, 2, and 2.5), and the JOIN= option is specified at a 0.05 significance level. The CK=5 option is specified to prevent the numerous outliers from forming separate clusters. The results from PROC MODECLUS are displayed in Output 42.4.2. The cluster memberships are then plotted by PROC GPLOT, as displayed in Output 42.4.3.

Notice that Output 42.4.3 contains no PROC GPLOT graph for _R_ = 2.5. Only one cluster remains after joining at the 5% significance level, so the results for that value are not written to the OUT= data set. See the description of the JOIN= option for more information.

   title 'Hertzsprung-Russell Plot of Visible Stars';
   title2 'Computer-Generated Fake Data';
   data hr;
      input x y @@;
      label x='-Temperature'
            y='-Luminosity';
      datalines;
    1.0  12.8   0.9  13.7   0.9  12.9   1.0  12.3   1.0  12.2
    2.6  10.9   2.4  10.9   2.5  11.2   2.3  11.5   2.6  12.0
    2.4  12.1   2.3  10.9   2.6  11.5   2.5  11.9   2.4  11.0
    3.4  11.1   3.3  11.2   3.4  11.1   3.4   9.9   3.2  10.4

                      ... 150 lines omitted ...

   18.5  12.6  14.2  16.1  23.2   6.6  11.4  12.4  20.4  11.7
   20.9   8.1  18.9  13.7  16.9   9.7  15.5   9.9  18.3  14.2
   19.3  13.7  17.0  12.9  10.1  11.6  17.9  13.5  14.3   1.4
   13.1  -0.8   8.1  -0.9  20.0   7.0  21.0   8.5  15.6  13.2
   ;

   symbol1 value=circle c=white; 
   symbol2 value=plus c=yellow;
   symbol3 value=triangle c=cyan;
   legend1 frame cframe=ligr cborder=black
           position=center value=(justify=center);
   axis1 label=(angle=90 rotate=0) minor=none;
   axis2 minor=none;

   /* Scatter plot of the raw data (Output 42.4.1) */
   proc gplot data=hr;
      plot y*x/legend=legend1 frame cframe=ligr vzero
               vaxis=axis1 haxis=axis2;
   run;

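   /* Cluster at four smoothing radii (R=); JOIN=.05 merges clusters that */
   /* are not significant at the 5% level; CK=5 keeps outliers from       */
   /* forming separate clusters.                                          */
   proc modeclus data=hr m=1 r=1 1.5 2 2.5 ck=5
                 join=.05 short out=out;
   run;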

   title2 'MODECLUS Analysis';
   proc gplot data=out;
      plot y*x=cluster/frame cframe=ligr 
                       vzero legend=legend1
                       vaxis=axis1 haxis=axis2;
      by _R_;
   run;
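
As a quick check on which values of _R_ reach the OUT= data set, the cluster assignments can be cross-tabulated by smoothing parameter. The following step is a minimal sketch and is not part of the original sample program; it assumes that the preceding PROC MODECLUS step has run and that the OUT= data set is named out, as above.

   /* Illustrative check (not in the original example):    */
   /* count observations per cluster for each _R_ in OUT=. */
   proc freq data=out;
      tables _R_*cluster / norow nocol nopercent;
   run;

If joining behaves as described, the table should contain no row for _R_ = 2.5.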

Output 42.4.1: Scatter Plot of Data

Output 42.4.2: Results from PROC MODECLUS

Hertzsprung-Russell Plot of Visible Stars
Computer-Generated Fake Data

The MODECLUS Procedure

Cluster Summary

                Number of                             Frequency of
                 Clusters    Maximum    Number of     Unclassified
  R     CK         Joined    P-value     Clusters          Objects
  1      5             14     0.0001            2                0
  1.5    5              6     0.0000            3                0
  2      5              4     0.0000            2                0
  2.5    5              2     0.0000            1                0

Output 42.4.3: Scatter Plots of Cluster Memberships by _R_


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.