The MODECLUS Procedure

Example 42.2: Cluster Analysis of Flying Mileages between Ten American Cities

This example uses distance data and illustrates how the TRANSPOSE procedure and the DATA step can be used to fill in the upper triangle of the distance matrix. The results are displayed in Output 42.2.1 through Output 42.2.3.
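
As a small illustration of the fill-in step, the following sketch applies the same transpose-and-merge idea to a three-city lower triangle. It is a minimal sketch, not part of the original example: the data set names tri and full, the variables A, B, and C, and the array names dist and tcol are hypothetical, and explicit array subscripts are used here in place of DO OVER.

```sas
/* Hypothetical 3 x 3 lower triangle; '.' marks the empty upper triangle */
data tri(type=distance);
   input CITY $ A B C;
   datalines;
A    0    .    .
B  587    0    .
C 1212  920    0
;
run;

/* Transposing turns the lower triangle into an upper triangle (COL1-COL3) */
proc transpose data=tri out=tran;
   copy CITY;
run;

/* Add each cell to its transposed counterpart; SUM skips the missing one */
data full(type=distance);
   merge tri tran;
   array dist{3} A B C;
   array tcol{3} col1-col3;
   do j = 1 to 3;
      dist{j} = sum(dist{j}, tcol{j});
   end;
   drop j col1-col3 _name_;
run;

proc print data=full noobs;
run;
```

Because exactly one of each off-diagonal pair is nonmissing and the diagonal is zero in both copies, the SUM function yields a symmetric matrix without double counting.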

The following statements produce Output 42.2.1:

   title 'Modeclus Analysis of 10 American Cities';
   title2 'Based on Flying Mileages';
   options ls=90;

   data mileages(type=distance);
      input (ATLANTA CHICAGO DENVER HOUSTON LOSANGELES
      MIAMI NEWYORK SANFRAN SEATTLE WASHDC) (5.)
      @53 CITY $15.;
      datalines;
      0                                                ATLANTA
    587    0                                           CHICAGO
   1212  920    0                                      DENVER
    701  940  879    0                                 HOUSTON
   1936 1745  831 1374    0                            LOS ANGELES
    604 1188 1726  968 2339    0                       MIAMI
    748  713 1631 1420 2451 1092    0                  NEW YORK
   2139 1858  949 1645  347 2594 2571    0             SAN FRANCISCO
   2182 1737 1021 1891  959 2734 2408  678    0        SEATTLE
    543  597 1494 1220 2300  923  205 2442 2329    0   WASHINGTON D.C.
   ;

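   *-----Fill in Upper Triangle of Distance Matrix---------------;
   proc transpose data=mileages out=tran;
      copy CITY;
   run;

   data mileages(type=distance);
      merge mileages tran;
      array var ATLANTA--WASHDC;   /* lower-triangle values read above   */
      array col col1-col10;        /* transposed values = upper triangle */
      drop col1-col10 _name_;
      do over var;
         var=sum(var,col);         /* SUM ignores the missing operand    */
      end;
   run;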

   *-----Clustering with K-Nearest-Neighbor Density Estimates-----;
   proc modeclus data=mileages all m=1 k=3;
      id CITY;
   run;

Output 42.2.1: Clustering with K-Nearest-Neighbor Density Estimates

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure

Nearest Neighbor List

CITY             Neighbor         Distance
ATLANTA          WASHINGTON D.C.  543.0000000
                 CHICAGO          587.0000000
CHICAGO          ATLANTA          587.0000000
                 WASHINGTON D.C.  597.0000000
DENVER           LOS ANGELES      831.0000000
                 HOUSTON          879.0000000
HOUSTON          ATLANTA          701.0000000
                 DENVER           879.0000000
LOS ANGELES      SAN FRANCISCO    347.0000000
                 DENVER           831.0000000
MIAMI            ATLANTA          604.0000000
                 WASHINGTON D.C.  923.0000000
NEW YORK         WASHINGTON D.C.  205.0000000
                 CHICAGO          713.0000000
SAN FRANCISCO    LOS ANGELES      347.0000000
                 SEATTLE          678.0000000
SEATTLE          SAN FRANCISCO    678.0000000
                 LOS ANGELES      959.0000000
WASHINGTON D.C.  NEW YORK         205.0000000
                 ATLANTA          543.0000000


Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
K=3 METHOD=1

Sums of Density Estimates Within Neighborhood

Cluster  CITY             Estimated   Same        Other       Total       Cluster Proportion
                          Density     Cluster     Clusters                Same/Total
1        ATLANTA          0.00025554  0.0005275   0           0.0005275   1.000
         CHICAGO          0.00025126  0.00053178  0           0.00053178  1.000
         HOUSTON          0.00017065  0.00025554  0.00017065  0.00042619  0.600
         MIAMI            0.00016251  0.00053178  0           0.00053178  1.000
         NEW YORK         0.00021038  0.0005275   0           0.0005275   1.000
         WASHINGTON D.C.  0.00027624  0.00046592  0           0.00046592  1.000
2        DENVER           0.00017065  0.00018051  0.00017065  0.00035115  0.514
         LOS ANGELES      0.00018051  0.00039189  0           0.00039189  1.000
         SAN FRANCISCO    0.00022124  0.00033692  0           0.00033692  1.000
         SEATTLE          0.00015641  0.00040174  0           0.00040174  1.000

Boundary Objects

                                -Cluster Proportions-
CITY     Density       Cluster  1        2
DENVER   0.0001706485  2        0.486    0.514
HOUSTON  0.0001706485  1        0.600    0.400

Cluster Statistics

Cluster  Frequency  Maximum Estimated  Boundary   Estimated Saddle
                    Density            Frequency  Density
1        6          0.00027624         1          0.00017065
2        4          0.00022124         1          0.00017065


Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure

Cluster Summary

K  Number of Clusters  Frequency of Unclassified Objects
3  2                   0
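
The estimated densities in Output 42.2.1 appear consistent with a simple calculation. With K=3, each neighborhood holds three observations (the city itself plus its two listed neighbors), the radius is the distance to the farther listed neighbor, and the density is 3/(n × 2 × radius) with n=10. The use of 2 × radius as the neighborhood volume is an assumption inferred from the printed values, not something stated in this example. The data set name knn_check and its variable names are hypothetical.

```sas
/* Sketch: reproduce some K=3 density estimates from the neighbor list.   */
/* radius = distance to the second listed neighbor in Output 42.2.1.      */
data knn_check;
   n = 10;                            /* number of cities                 */
   k = 3;                             /* K= value, counting the city too  */
   input CITY $ 1-15 radius;
   density = k / (n * 2 * radius);    /* assumed volume: 2 * radius       */
   datalines;
ATLANTA         587
CHICAGO         597
DENVER          879
WASHINGTON D.C. 543
;
run;

proc print data=knn_check noobs;
   var CITY radius density;
   format density 10.8;
run;
```

Under these assumptions the computed values agree with the Estimated Density column: 0.00025554 for ATLANTA, 0.00025126 for CHICAGO, 0.00017065 for DENVER, and 0.00027624 for WASHINGTON D.C.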

The following statements produce Output 42.2.2:

   *------Clustering with Uniform Kernel Density Estimates--------;
   proc modeclus data=mileages all m=1 r=600 800;
      id CITY;
   run;

Output 42.2.2: Clustering with Uniform Kernel Density Estimates

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure

Nearest Neighbor List

CITY             Neighbor         Distance
ATLANTA          WASHINGTON D.C.  543.0000000
                 CHICAGO          587.0000000
                 MIAMI            604.0000000
                 HOUSTON          701.0000000
                 NEW YORK         748.0000000
CHICAGO          ATLANTA          587.0000000
                 WASHINGTON D.C.  597.0000000
                 NEW YORK         713.0000000
HOUSTON          ATLANTA          701.0000000
LOS ANGELES      SAN FRANCISCO    347.0000000
MIAMI            ATLANTA          604.0000000
NEW YORK         WASHINGTON D.C.  205.0000000
                 CHICAGO          713.0000000
                 ATLANTA          748.0000000
SAN FRANCISCO    LOS ANGELES      347.0000000
                 SEATTLE          678.0000000
SEATTLE          SAN FRANCISCO    678.0000000
WASHINGTON D.C.  NEW YORK         205.0000000
                 ATLANTA          543.0000000
                 CHICAGO          597.0000000


Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
R=600 METHOD=1

Sums of Density Estimates Within Neighborhood

Cluster  CITY             Estimated   Same        Other       Total       Cluster Proportion
                          Density     Cluster     Clusters                Same/Total
1        ATLANTA          0.00025     0.00058333  0           0.00058333  1.000
         CHICAGO          0.00025     0.00058333  0           0.00058333  1.000
         NEW YORK         0.00016667  0.00033333  0           0.00033333  1.000
         WASHINGTON D.C.  0.00033333  0.00066667  0           0.00066667  1.000
2        LOS ANGELES      0.00016667  0.00016667  0           0.00016667  1.000
         SAN FRANCISCO    0.00016667  0.00016667  0           0.00016667  1.000
3        DENVER           0.00008333  0           0           0           .
4        HOUSTON          0.00008333  0           0           0           .
5        MIAMI            0.00008333  0           0           0           .
6        SEATTLE          0.00008333  0           0           0           .

No Boundary Objects

Cluster Statistics

Cluster  Frequency  Maximum Estimated  Boundary   Estimated Saddle
                    Density            Frequency  Density
1        4          0.00033333         0          .
2        2          0.00016667         0          .
3        1          0.00008333         0          .
4        1          0.00008333         0          .
5        1          0.00008333         0          .
6        1          0.00008333         0          .

 


Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
R=800 METHOD=1

Sums of Density Estimates Within Neighborhood

Cluster  CITY             Estimated   Same        Other       Total       Cluster Proportion
                          Density     Cluster     Clusters                Same/Total
1        ATLANTA          0.000375    0.001       0           0.001       1.000
         CHICAGO          0.00025     0.000875    0           0.000875    1.000
         HOUSTON          0.000125    0.000375    0           0.000375    1.000
         MIAMI            0.000125    0.000375    0           0.000375    1.000
         NEW YORK         0.00025     0.000875    0           0.000875    1.000
         WASHINGTON D.C.  0.00025     0.000875    0           0.000875    1.000
2        LOS ANGELES      0.000125    0.0001875   0           0.0001875   1.000
         SAN FRANCISCO    0.0001875   0.00025     0           0.00025     1.000
         SEATTLE          0.000125    0.0001875   0           0.0001875   1.000
3        DENVER           0.0000625   0           0           0           .

No Boundary Objects

Cluster Statistics

Cluster  Frequency  Maximum Estimated  Boundary   Estimated Saddle
                    Density            Frequency  Density
1        6          0.000375           0          .
2        3          0.0001875          0          .
3        1          0.0000625          0          .


Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure

Cluster Summary

R    Number of Clusters  Frequency of Unclassified Objects
600  2 clusters fewer than cities: see rows below
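
The uniform-kernel densities in Output 42.2.2 can be checked in a similar way. The sketch below counts the cities within distance R of ATLANTA (including ATLANTA itself) and divides by n × 2R with n=10. The divisor 2R is an assumption inferred from the printed values, and the data set name kernel_check and its variable names are hypothetical. No distance in the ATLANTA row equals 600 or 800 exactly, so the choice between <= and < does not affect the result here.

```sas
/* Sketch: uniform-kernel density at ATLANTA for R=600 and R=800.         */
/* dist holds the ATLANTA row of the mileage matrix (0 = ATLANTA itself). */
data kernel_check;
   array dist{10} _temporary_
      (0 587 1212 701 1936 604 748 2139 2182 543);
   n = 10;                               /* number of cities              */
   do r = 600, 800;
      count = 0;
      do j = 1 to 10;
         if dist{j} <= r then count + 1; /* cities inside the radius      */
      end;
      density = count / (n * 2 * r);     /* assumed volume: 2 * r         */
      output;
   end;
   keep r count density;
run;

proc print data=kernel_check noobs;
   format density 10.8;
run;
```

Under these assumptions the counts are 3 and 6, giving densities of 0.00025 for R=600 and 0.000375 for R=800, which match the ATLANTA rows of the two tables.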


The following statements produce Output 42.2.3:

   *------Uniform Kernel Density Estimates, Clustering
          Neighborhoods extended to nearest neighbor--------------;
   proc modeclus data=mileages list m=1 ck=2 r=600 800;
      id CITY;
   run;

Output 42.2.3: Uniform Kernel Density Estimates, Clustering Neighborhoods Extended to Nearest Neighbor

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
CK=2 R=600 METHOD=1

Sums of Density Estimates Within Neighborhood

Cluster  CITY             Estimated   Same        Other       Total       Cluster Proportion
                          Density     Cluster     Clusters                Same/Total
1        ATLANTA          0.00025     0.00058333  0           0.00058333  1.000
         CHICAGO          0.00025     0.00058333  0           0.00058333  1.000
         HOUSTON          0.00008333  0.00025     0           0.00025     1.000
         MIAMI            0.00008333  0.00025     0           0.00025     1.000
         NEW YORK         0.00016667  0.00033333  0           0.00033333  1.000
         WASHINGTON D.C.  0.00033333  0.00066667  0           0.00066667  1.000
2        DENVER           0.00008333  0.00016667  0           0.00016667  1.000
         LOS ANGELES      0.00016667  0.00016667  0           0.00016667  1.000
         SAN FRANCISCO    0.00016667  0.00016667  0           0.00016667  1.000
         SEATTLE          0.00008333  0.00016667  0           0.00016667  1.000

Cluster Statistics

Cluster  Frequency  Maximum Estimated  Boundary   Estimated Saddle
                    Density            Frequency  Density
1        6          0.00033333         0          .
2        4          0.00016667         0          .


Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
CK=2 R=800 METHOD=1

Sums of Density Estimates Within Neighborhood

Cluster  CITY             Estimated   Same        Other       Total       Cluster Proportion
                          Density     Cluster     Clusters                Same/Total
1        ATLANTA          0.000375    0.001       0           0.001       1.000
         CHICAGO          0.00025     0.000875    0           0.000875    1.000
         HOUSTON          0.000125    0.000375    0           0.000375    1.000
         MIAMI            0.000125    0.000375    0           0.000375    1.000
         NEW YORK         0.00025     0.000875    0           0.000875    1.000
         WASHINGTON D.C.  0.00025     0.000875    0           0.000875    1.000
2        DENVER           0.0000625   0.000125    0           0.000125    1.000
         LOS ANGELES      0.000125    0.0001875   0           0.0001875   1.000
         SAN FRANCISCO    0.0001875   0.00025     0           0.00025     1.000
         SEATTLE          0.000125    0.0001875   0           0.0001875   1.000

Cluster Statistics

Cluster  Frequency  Maximum Estimated  Boundary   Estimated Saddle
                    Density            Frequency  Density
1        6          0.000375           0          .
2        4          0.0001875          0          .


Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure

Cluster Summary

R    CK  Number of Clusters  Frequency of Unclassified Objects
600  2   2                   0
800  2   2                   0


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.