Example 42.2: Cluster Analysis of Flying Mileages between Ten American Cities
This example uses distance data and illustrates the use of the TRANSPOSE procedure and the DATA step to fill in the upper triangle of the distance matrix.
The results are displayed in Output 42.2.1 through Output 42.2.3.
The following statements produce Output 42.2.1:
title 'Modeclus Analysis of 10 American Cities';
title2 'Based on Flying Mileages';
options ls=90;
data mileages(type=distance);
   input (ATLANTA CHICAGO DENVER HOUSTON LOSANGELES
          MIAMI NEWYORK SANFRAN SEATTLE WASHDC) (5.)
          @53 CITY $15.;
   datalines;
    0                                               ATLANTA
  587    0                                          CHICAGO
 1212  920    0                                     DENVER
  701  940  879    0                                HOUSTON
 1936 1745  831 1374    0                           LOS ANGELES
  604 1188 1726  968 2339    0                      MIAMI
  748  713 1631 1420 2451 1092    0                 NEW YORK
 2139 1858  949 1645  347 2594 2571    0            SAN FRANCISCO
 2182 1737 1021 1891  959 2734 2408  678    0       SEATTLE
  543  597 1494 1220 2300  923  205 2442 2329    0  WASHINGTON D.C.
;
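*-----Fill in Upper Triangle of Distance Matrix---------------;
* Transpose the lower triangle so that COL1-COL10 hold the upper triangle;
proc transpose data=mileages out=tran;
   copy CITY;
run;
* SUM ignores missing values, so each cell keeps whichever triangle is nonmissing;
data mileages(type=distance);
   merge mileages tran;
   array var ATLANTA--WASHDC;
   array col col1-col10;
   drop col1-col10 _name_;
   do over var;
      var=sum(var,col);
   end;
run;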
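The DO OVER loop relies on implicit arrays, a legacy DATA step feature. The following sketch shows the same fill-in written with explicitly subscripted arrays. It is meant to run in place of the DATA step above, not after it (running both would add the transposed values a second time). The array names miles and tmiles and the index variable i are illustrative choices, not part of the original example.

```sas
data mileages(type=distance);
   merge mileages tran;
   /* miles: the ten distance columns; tmiles: their transposed copy */
   array miles{*} ATLANTA--WASHDC;
   array tmiles{*} col1-col10;
   drop col1-col10 _name_ i;
   do i = 1 to dim(miles);
      /* SUM ignores missing values, so each cell takes the value from
         whichever triangle supplied one (the diagonal is 0 + 0) */
      miles{i} = sum(miles{i}, tmiles{i});
   end;
run;
```

Using DIM rather than a hard-coded 10 keeps the loop correct if cities are added to the INPUT statement, provided the COL range is extended to match.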
*-----Clustering with K-Nearest-Neighbor Density Estimates-----;
proc modeclus data=mileages all m=1 k=3;
id CITY;
run;
Output 42.2.1: Clustering with K-Nearest-Neighbor Density Estimates
Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Nearest Neighbor List

CITY             Neighbor            Distance
ATLANTA          WASHINGTON D.C.  543.0000000
                 CHICAGO          587.0000000
CHICAGO          ATLANTA          587.0000000
                 WASHINGTON D.C.  597.0000000
DENVER           LOS ANGELES      831.0000000
                 HOUSTON          879.0000000
HOUSTON          ATLANTA          701.0000000
                 DENVER           879.0000000
LOS ANGELES      SAN FRANCISCO    347.0000000
                 DENVER           831.0000000
MIAMI            ATLANTA          604.0000000
                 WASHINGTON D.C.  923.0000000
NEW YORK         WASHINGTON D.C.  205.0000000
                 CHICAGO          713.0000000
SAN FRANCISCO    LOS ANGELES      347.0000000
                 SEATTLE          678.0000000
SEATTLE          SAN FRANCISCO    678.0000000
                 LOS ANGELES      959.0000000
WASHINGTON D.C.  NEW YORK         205.0000000
                 ATLANTA          543.0000000

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
K=3 METHOD=1

Sums of Density Estimates Within Neighborhood (Same Cluster, Other Clusters, Total)

                           Estimated        Same       Other              Cluster Proportion
Cluster  CITY                Density     Cluster    Clusters       Total          Same/Total
1        ATLANTA          0.00025554   0.0005275           0   0.0005275               1.000
         CHICAGO          0.00025126  0.00053178           0  0.00053178               1.000
         HOUSTON          0.00017065  0.00025554  0.00017065  0.00042619               0.600
         MIAMI            0.00016251  0.00053178           0  0.00053178               1.000
         NEW YORK         0.00021038   0.0005275           0   0.0005275               1.000
         WASHINGTON D.C.  0.00027624  0.00046592           0  0.00046592               1.000
2        DENVER           0.00017065  0.00018051  0.00017065  0.00035115               0.514
         LOS ANGELES      0.00018051  0.00039189           0  0.00039189               1.000
         SAN FRANCISCO    0.00022124  0.00033692           0  0.00033692               1.000
         SEATTLE          0.00015641  0.00040174           0  0.00040174               1.000

Boundary Objects

                                        -Cluster Proportions-
CITY                  Density  Cluster           1           2
DENVER           0.0001706485        2       0.486       0.514
HOUSTON          0.0001706485        1       0.600       0.400

Cluster Statistics

                       Maximum              Estimated
                     Estimated   Boundary      Saddle
Cluster  Frequency     Density  Frequency     Density
1                6  0.00027624          1  0.00017065
2                4  0.00022124          1  0.00017065

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Cluster Summary

                Frequency of
     Number of  Unclassified
K     Clusters       Objects
3            2             0

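The Estimated Density values in Output 42.2.1 can be checked by hand. The sketch below assumes, because this example does not state the formula, that with distance data the neighborhood is treated as a one-dimensional interval of length 2r, so that the estimate is k/(n*2r). Here n is the number of cities and r is the distance to the farthest of the k points in the neighborhood (with K=3, the city itself plus its two listed neighbors). The variable names are illustrative.

```sas
data _null_;
   n = 10;   /* number of cities */
   k = 3;    /* K= value from the PROC MODECLUS statement */
   /* r = distance to the second listed neighbor in the Nearest Neighbor List */
   r_atlanta = 587;   /* ATLANTA to CHICAGO */
   r_washdc  = 543;   /* WASHINGTON D.C. to ATLANTA */
   /* assumed estimate: points in neighborhood / (n * interval length) */
   d_atlanta = k / (n * 2 * r_atlanta);
   d_washdc  = k / (n * 2 * r_washdc);
   put d_atlanta= 10.8 d_washdc= 10.8;
run;
```

Under this assumption the two values round to 0.00025554 and 0.00027624, which agree with the ATLANTA and WASHINGTON D.C. rows of the output. The Same Cluster entry for ATLANTA, 0.0005275, is likewise the sum of the densities of its two neighbors (0.00027624 + 0.00025126).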
The following statements produce Output 42.2.2:
*------Clustering with Uniform Kernel Density Estimates--------;
proc modeclus data=mileages all m=1 r=600 800;
id CITY;
run;
Output 42.2.2: Clustering with Uniform Kernel Density Estimates
Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Nearest Neighbor List

CITY             Neighbor            Distance
ATLANTA          WASHINGTON D.C.  543.0000000
                 CHICAGO          587.0000000
                 MIAMI            604.0000000
                 HOUSTON          701.0000000
                 NEW YORK         748.0000000
CHICAGO          ATLANTA          587.0000000
                 WASHINGTON D.C.  597.0000000
                 NEW YORK         713.0000000
HOUSTON          ATLANTA          701.0000000
LOS ANGELES      SAN FRANCISCO    347.0000000
MIAMI            ATLANTA          604.0000000
NEW YORK         WASHINGTON D.C.  205.0000000
                 CHICAGO          713.0000000
                 ATLANTA          748.0000000
SAN FRANCISCO    LOS ANGELES      347.0000000
                 SEATTLE          678.0000000
SEATTLE          SAN FRANCISCO    678.0000000
WASHINGTON D.C.  NEW YORK         205.0000000
                 ATLANTA          543.0000000
                 CHICAGO          597.0000000

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
R=600 METHOD=1

Sums of Density Estimates Within Neighborhood (Same Cluster, Other Clusters, Total)

                           Estimated        Same       Other              Cluster Proportion
Cluster  CITY                Density     Cluster    Clusters       Total          Same/Total
1        ATLANTA             0.00025  0.00058333           0  0.00058333               1.000
         CHICAGO             0.00025  0.00058333           0  0.00058333               1.000
         NEW YORK         0.00016667  0.00033333           0  0.00033333               1.000
         WASHINGTON D.C.  0.00033333  0.00066667           0  0.00066667               1.000
2        LOS ANGELES      0.00016667  0.00016667           0  0.00016667               1.000
         SAN FRANCISCO    0.00016667  0.00016667           0  0.00016667               1.000
3        DENVER           0.00008333           0           0           0                   .
4        HOUSTON          0.00008333           0           0           0                   .
5        MIAMI            0.00008333           0           0           0                   .
6        SEATTLE          0.00008333           0           0           0                   .

Cluster Statistics

                       Maximum              Estimated
                     Estimated   Boundary      Saddle
Cluster  Frequency     Density  Frequency     Density
1                4  0.00033333          0           .
2                2  0.00016667          0           .
3                1  0.00008333          0           .
4                1  0.00008333          0           .
5                1  0.00008333          0           .
6                1  0.00008333          0           .

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
R=800 METHOD=1

Sums of Density Estimates Within Neighborhood (Same Cluster, Other Clusters, Total)

                           Estimated        Same       Other              Cluster Proportion
Cluster  CITY                Density     Cluster    Clusters       Total          Same/Total
1        ATLANTA            0.000375       0.001           0       0.001               1.000
         CHICAGO             0.00025    0.000875           0    0.000875               1.000
         HOUSTON            0.000125    0.000375           0    0.000375               1.000
         MIAMI              0.000125    0.000375           0    0.000375               1.000
         NEW YORK            0.00025    0.000875           0    0.000875               1.000
         WASHINGTON D.C.     0.00025    0.000875           0    0.000875               1.000
2        LOS ANGELES        0.000125   0.0001875           0   0.0001875               1.000
         SAN FRANCISCO     0.0001875     0.00025           0     0.00025               1.000
         SEATTLE            0.000125   0.0001875           0   0.0001875               1.000
3        DENVER            0.0000625           0           0           0                   .

Cluster Statistics

                       Maximum              Estimated
                     Estimated   Boundary      Saddle
Cluster  Frequency     Density  Frequency     Density
1                6    0.000375          0           .
2                3   0.0001875          0           .
3                1   0.0000625          0           .

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Cluster Summary

                Frequency of
     Number of  Unclassified
R     Clusters       Objects
600          6             0
800          3             0

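The uniform-kernel densities in Output 42.2.2 can be reproduced directly from the completed distance matrix. The sketch below assumes, because this example does not state the formula, that the estimate for a city is the number of cities within radius R (including the city itself) divided by n*2R, with the neighborhood treated as a one-dimensional interval. The data set name kernel_check and the variables ncity, j, r, within, and density are hypothetical names introduced for this check.

```sas
data kernel_check;
   set mileages;                      /* full symmetric matrix from above */
   array miles{*} ATLANTA--WASHDC;
   ncity = dim(miles);
   do r = 600, 800;
      within = 0;
      do j = 1 to ncity;
         /* the diagonal entry is 0, so each city counts itself;
            no distance equals 600 or 800, so <= and < agree here */
         if miles{j} <= r then within = within + 1;
      end;
      /* assumed estimate: points within r / (n * interval length) */
      density = within / (ncity * 2 * r);
      output;
   end;
   keep CITY r within density;
run;

proc print data=kernel_check;
   format density 10.8;
run;
```

For ATLANTA this gives 3 cities and a density of 0.00025 at R=600, and 6 cities and 0.000375 at R=800, in agreement with the Estimated Density column of the output. DENVER has no other city within either radius, which is why it forms a cluster by itself.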
The following statements produce Output 42.2.3:
*------Uniform Kernel Density Estimates, Clustering
Neighborhoods extended to nearest neighbor--------------;
proc modeclus data=mileages list m=1 ck=2 r=600 800;
id CITY;
run;
Output 42.2.3: Uniform Kernel Density Estimates, Clustering Neighborhoods
Extended to Nearest Neighbor
Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
CK=2 R=600 METHOD=1

Sums of Density Estimates Within Neighborhood (Same Cluster, Other Clusters, Total)

                           Estimated        Same       Other              Cluster Proportion
Cluster  CITY                Density     Cluster    Clusters       Total          Same/Total
1        ATLANTA             0.00025  0.00058333           0  0.00058333               1.000
         CHICAGO             0.00025  0.00058333           0  0.00058333               1.000
         HOUSTON          0.00008333     0.00025           0     0.00025               1.000
         MIAMI            0.00008333     0.00025           0     0.00025               1.000
         NEW YORK         0.00016667  0.00033333           0  0.00033333               1.000
         WASHINGTON D.C.  0.00033333  0.00066667           0  0.00066667               1.000
2        DENVER           0.00008333  0.00016667           0  0.00016667               1.000
         LOS ANGELES      0.00016667  0.00016667           0  0.00016667               1.000
         SAN FRANCISCO    0.00016667  0.00016667           0  0.00016667               1.000
         SEATTLE          0.00008333  0.00016667           0  0.00016667               1.000

Cluster Statistics

                       Maximum              Estimated
                     Estimated   Boundary      Saddle
Cluster  Frequency     Density  Frequency     Density
1                6  0.00033333          0           .
2                4  0.00016667          0           .

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
CK=2 R=800 METHOD=1

Sums of Density Estimates Within Neighborhood (Same Cluster, Other Clusters, Total)

                           Estimated        Same       Other              Cluster Proportion
Cluster  CITY                Density     Cluster    Clusters       Total          Same/Total
1        ATLANTA            0.000375       0.001           0       0.001               1.000
         CHICAGO             0.00025    0.000875           0    0.000875               1.000
         HOUSTON            0.000125    0.000375           0    0.000375               1.000
         MIAMI              0.000125    0.000375           0    0.000375               1.000
         NEW YORK            0.00025    0.000875           0    0.000875               1.000
         WASHINGTON D.C.     0.00025    0.000875           0    0.000875               1.000
2        DENVER            0.0000625    0.000125           0    0.000125               1.000
         LOS ANGELES        0.000125   0.0001875           0   0.0001875               1.000
         SAN FRANCISCO     0.0001875     0.00025           0     0.00025               1.000
         SEATTLE            0.000125   0.0001875           0   0.0001875               1.000

Cluster Statistics

                       Maximum              Estimated
                     Estimated   Boundary      Saddle
Cluster  Frequency     Density  Frequency     Density
1                6    0.000375          0           .
2                4   0.0001875          0           .

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Cluster Summary

                    Frequency of
         Number of  Unclassified
R    CK   Clusters       Objects
600   2          2             0
800   2          2             0

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.