The PRINCOMP Procedure

Example 52.1: Crime Rates

The following data provide crime rates per 100,000 people in seven categories for each of the fifty states in 1977. Because there are seven numeric variables, it is impossible to plot all of them in a single display. Principal components can summarize the data in two or three dimensions, which makes the data easier to visualize. The following statements produce Output 52.1.1:

   data Crime;
      title 'Crime Rates per 100,000 Population by State';
      input State $1-15 Murder Rape Robbery Assault 
            Burglary Larceny Auto_Theft;
      datalines;
   Alabama        14.2 25.2  96.8 278.3 1135.5 1881.9 280.7
   Alaska         10.8 51.6  96.8 284.0 1331.7 3369.8 753.3
   Arizona         9.5 34.2 138.2 312.3 2346.1 4467.4 439.5
   Arkansas        8.8 27.6  83.2 203.4  972.6 1862.1 183.4
   California     11.5 49.4 287.0 358.0 2139.4 3499.8 663.5
   Colorado        6.3 42.0 170.7 292.9 1935.2 3903.2 477.1
   Connecticut     4.2 16.8 129.5 131.8 1346.0 2620.7 593.2
   Delaware        6.0 24.9 157.0 194.2 1682.6 3678.4 467.0
   Florida        10.2 39.6 187.9 449.1 1859.9 3840.5 351.4
   Georgia        11.7 31.1 140.5 256.5 1351.1 2170.2 297.9
   Hawaii          7.2 25.5 128.0  64.1 1911.5 3920.4 489.4
   Idaho           5.5 19.4  39.6 172.5 1050.8 2599.6 237.6
   Illinois        9.9 21.8 211.3 209.0 1085.0 2828.5 528.6
   Indiana         7.4 26.5 123.2 153.5 1086.2 2498.7 377.4
   Iowa            2.3 10.6  41.2  89.8  812.5 2685.1 219.9
   Kansas          6.6 22.0 100.7 180.5 1270.4 2739.3 244.3
   Kentucky       10.1 19.1  81.1 123.3  872.2 1662.1 245.4
   Louisiana      15.5 30.9 142.9 335.5 1165.5 2469.9 337.7
   Maine           2.4 13.5  38.7 170.0 1253.1 2350.7 246.9
   Maryland        8.0 34.8 292.1 358.9 1400.0 3177.7 428.5
   Massachusetts   3.1 20.8 169.1 231.6 1532.2 2311.3 1140.1
   Michigan        9.3 38.9 261.9 274.6 1522.7 3159.0 545.5
   Minnesota       2.7 19.5  85.9  85.8 1134.7 2559.3 343.1
   Mississippi    14.3 19.6  65.7 189.1  915.6 1239.9 144.4
   Missouri        9.6 28.3 189.0 233.5 1318.3 2424.2 378.4
   Montana         5.4 16.7  39.2 156.8  804.9 2773.2 309.2
   Nebraska        3.9 18.1  64.7 112.7  760.0 2316.1 249.1
   Nevada         15.8 49.1 323.1 355.0 2453.1 4212.6 559.2
   New Hampshire   3.2 10.7  23.2  76.0 1041.7 2343.9 293.4
   New Jersey      5.6 21.0 180.4 185.1 1435.8 2774.5 511.5
   New Mexico      8.8 39.1 109.6 343.4 1418.7 3008.6 259.5
   New York       10.7 29.4 472.6 319.1 1728.0 2782.0 745.8
   North Carolina 10.6 17.0  61.3 318.3 1154.1 2037.8 192.1
   North Dakota    0.9  9.0  13.3  43.8  446.1 1843.0 144.7
   Ohio            7.8 27.3 190.5 181.1 1216.0 2696.8 400.4
   Oklahoma        8.6 29.2  73.8 205.0 1288.2 2228.1 326.8
   Oregon          4.9 39.9 124.1 286.9 1636.4 3506.1 388.9
   Pennsylvania    5.6 19.0 130.3 128.0  877.5 1624.1 333.2
   Rhode Island    3.6 10.5  86.5 201.0 1489.5 2844.1 791.4
   South Carolina 11.9 33.0 105.9 485.3 1613.6 2342.4 245.1
   South Dakota    2.0 13.5  17.9 155.7  570.5 1704.4 147.5
   Tennessee      10.1 29.7 145.8 203.9 1259.7 1776.5 314.0
   Texas          13.3 33.8 152.4 208.2 1603.1 2988.7 397.6
   Utah            3.5 20.3  68.8 147.3 1171.6 3004.6 334.5
   Vermont         1.4 15.9  30.8 101.2 1348.2 2201.0 265.2
   Virginia        9.0 23.3  92.1 165.7  986.2 2521.2 226.7
   Washington      4.3 39.6 106.2 224.8 1605.6 3386.9 360.3
   West Virginia   6.0 13.2  42.2  90.9  597.4 1341.7 163.3
   Wisconsin       2.8 12.9  52.2  63.7  846.9 2614.2 220.7
   Wyoming         5.4 21.9  39.7 173.9  811.6 2772.2 282.0
   ;

   /* Analyze the correlation matrix (the default) and save the component scores */
   proc princomp data=Crime out=Crime_Components;
   run;

Output 52.1.1: Results of Principal Component Analysis: PROC PRINCOMP

Crime Rates per 100,000 Population by State

The PRINCOMP Procedure

Observations 50
Variables 7

Simple Statistics
           Murder        Rape     Robbery     Assault    Burglary     Larceny  Auto_Theft
Mean  7.444000000 25.73400000 124.0920000 211.3000000 1291.904000 2671.288000 377.5260000
StD   3.866768941 10.75962995  88.3485672 100.2530492  432.455711  725.908707 193.3944175

Correlation Matrix
               Murder      Rape   Robbery   Assault  Burglary   Larceny Auto_Theft
Murder         1.0000    0.6012    0.4837    0.6486    0.3858    0.1019     0.0688
Rape           0.6012    1.0000    0.5919    0.7403    0.7121    0.6140     0.3489
Robbery        0.4837    0.5919    1.0000    0.5571    0.6372    0.4467     0.5907
Assault        0.6486    0.7403    0.5571    1.0000    0.6229    0.4044     0.2758
Burglary       0.3858    0.7121    0.6372    0.6229    1.0000    0.7921     0.5580
Larceny        0.1019    0.6140    0.4467    0.4044    0.7921    1.0000     0.4442
Auto_Theft     0.0688    0.3489    0.5907    0.2758    0.5580    0.4442     1.0000

Eigenvalues of the Correlation Matrix
    Eigenvalue   Difference   Proportion   Cumulative
1   4.11495951   2.87623768       0.5879       0.5879
2   1.23872183   0.51290521       0.1770       0.7648
3   0.72581663   0.40938458       0.1037       0.8685
4   0.31643205   0.05845759       0.0452       0.9137
5   0.25797446   0.03593499       0.0369       0.9506
6   0.22203947   0.09798342       0.0317       0.9823
7   0.12405606                    0.0177       1.0000

Eigenvectors
                Prin1     Prin2     Prin3     Prin4     Prin5     Prin6     Prin7
Murder       0.300279  -.629174  0.178245  -.232114  0.538123  0.259117  0.267593
Rape         0.431759  -.169435  -.244198  0.062216  0.188471  -.773271  -.296485
Robbery      0.396875  0.042247  0.495861  -.557989  -.519977  -.114385  -.003903
Assault      0.396652  -.343528  -.069510  0.629804  -.506651  0.172363  0.191745
Burglary     0.440157  0.203341  -.209895  -.057555  0.101033  0.535987  -.648117
Larceny      0.357360  0.402319  -.539231  -.234890  0.030099  0.039406  0.601690
Auto_Theft   0.295177  0.502421  0.568384  0.419238  0.369753  -.057298  0.147046


The eigenvalues indicate that two or three components provide a good summary of the data: two components account for 76 percent of the total standardized variance, and three components account for 87 percent. Each subsequent component contributes less than 5 percent.
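The percentages can be reproduced from the eigenvalue table. Because the analysis is based on the correlation matrix of seven variables, the eigenvalues sum to 7, so each proportion is an eigenvalue divided by 7. The symbol \lambda_k for the k-th eigenvalue is notation introduced here for illustration; it does not appear in the output.

```latex
\[
\text{Proportion}_k = \frac{\lambda_k}{\sum_{j=1}^{7}\lambda_j} = \frac{\lambda_k}{7},
\qquad
\frac{\lambda_1}{7} = \frac{4.11495951}{7} \approx 0.5879
\]
\[
\frac{\lambda_1 + \lambda_2}{7} = \frac{5.35368134}{7} \approx 0.7648,
\qquad
\frac{\lambda_1 + \lambda_2 + \lambda_3}{7} = \frac{6.07949797}{7} \approx 0.8685
\]
```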

The first component is a measure of overall crime rate, since the first eigenvector has positive loadings of similar size (roughly 0.30 to 0.44) on all variables. The second eigenvector has high positive loadings on the variables Auto_Theft and Larceny and high negative loadings on the variables Murder and Assault. There is also a small positive loading on Burglary and a small negative loading on Rape. This component seems to measure the preponderance of property crime over violent crime. The interpretation of the third component is not obvious.
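To see how the loadings turn into component scores, the following sketch recomputes the first component for two states as a weighted sum of standardized variables. It assumes the OUT= data set Crime_Components created earlier; the means, standard deviations, and coefficients are copied from Output 52.1.1, and the variable name Prin1_Check is hypothetical.

```sas
/* Sketch: recompute Prin1 by hand and compare it with the OUT= value.  */
/* Constants are copied from Output 52.1.1; Prin1_Check is a made-up    */
/* name used only for this check. DATA _NULL_ writes to the log and     */
/* creates no data set.                                                 */
data _null_;
   set Crime_Components;
   where State in ('North Dakota', 'Nevada');
   Prin1_Check =
        0.300279 * (Murder     -    7.444) /   3.866768941
      + 0.431759 * (Rape       -   25.734) /  10.75962995
      + 0.396875 * (Robbery    -  124.092) /  88.3485672
      + 0.396652 * (Assault    -  211.300) / 100.2530492
      + 0.440157 * (Burglary   - 1291.904) / 432.455711
      + 0.357360 * (Larceny    - 2671.288) / 725.908707
      + 0.295177 * (Auto_Theft -  377.526) / 193.3944175;
   put State= Prin1= 8.5 Prin1_Check= 8.5;
run;
```

Because the printed coefficients are rounded to six decimal places, the two values should agree only approximately. For North Dakota and Nevada they should be close to the Prin1 values -3.96408 and 5.26699 listed in Output 52.1.2.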

A simple way to examine the principal components in more detail is to display the output data set sorted by each of the first two components. The following statements produce Output 52.1.2 and Output 52.1.3:

   proc sort;
      by Prin1;
   run;

   proc print;
      id State;
      var Prin1 Prin2 Murder Rape Robbery 
          Assault Burglary Larceny Auto_Theft;
      title2 'States Listed in Order of Overall Crime Rate';
      title3 'As Determined by the First Principal Component';
   run;

   proc sort;
      by Prin2;
   run;

   proc print;
      id State;
      var Prin1 Prin2 Murder Rape Robbery 
          Assault Burglary Larceny Auto_Theft;
      title2 'States Listed in Order of Property Vs. Violent Crime';
      title3 'As Determined by the Second Principal Component';
   run;

Output 52.1.2: OUT= Data Set Sorted by First Principal Component

Crime Rates per 100,000 Population by State
States Listed in Order of Overall Crime Rate
As Determined by the First Principal Component

State Prin1 Prin2 Murder Rape Robbery Assault Burglary Larceny Auto_Theft
North Dakota -3.96408 0.38767 0.9 9.0 13.3 43.8 446.1 1843.0 144.7
South Dakota -3.17203 -0.25446 2.0 13.5 17.9 155.7 570.5 1704.4 147.5
West Virginia -3.14772 -0.81425 6.0 13.2 42.2 90.9 597.4 1341.7 163.3
Iowa -2.58156 0.82475 2.3 10.6 41.2 89.8 812.5 2685.1 219.9
Wisconsin -2.50296 0.78083 2.8 12.9 52.2 63.7 846.9 2614.2 220.7
New Hampshire -2.46562 0.82503 3.2 10.7 23.2 76.0 1041.7 2343.9 293.4
Nebraska -2.15071 0.22574 3.9 18.1 64.7 112.7 760.0 2316.1 249.1
Vermont -2.06433 0.94497 1.4 15.9 30.8 101.2 1348.2 2201.0 265.2
Maine -1.82631 0.57878 2.4 13.5 38.7 170.0 1253.1 2350.7 246.9
Kentucky -1.72691 -1.14663 10.1 19.1 81.1 123.3 872.2 1662.1 245.4
Pennsylvania -1.72007 -0.19590 5.6 19.0 130.3 128.0 877.5 1624.1 333.2
Montana -1.66801 0.27099 5.4 16.7 39.2 156.8 804.9 2773.2 309.2
Minnesota -1.55434 1.05644 2.7 19.5 85.9 85.8 1134.7 2559.3 343.1
Mississippi -1.50736 -2.54671 14.3 19.6 65.7 189.1 915.6 1239.9 144.4
Idaho -1.43245 -0.00801 5.5 19.4 39.6 172.5 1050.8 2599.6 237.6
Wyoming -1.42463 0.06268 5.4 21.9 39.7 173.9 811.6 2772.2 282.0
Arkansas -1.05441 -1.34544 8.8 27.6 83.2 203.4 972.6 1862.1 183.4
Utah -1.04996 0.93656 3.5 20.3 68.8 147.3 1171.6 3004.6 334.5
Virginia -0.91621 -0.69265 9.0 23.3 92.1 165.7 986.2 2521.2 226.7
North Carolina -0.69925 -1.67027 10.6 17.0 61.3 318.3 1154.1 2037.8 192.1
Kansas -0.63407 -0.02804 6.6 22.0 100.7 180.5 1270.4 2739.3 244.3
Connecticut -0.54133 1.50123 4.2 16.8 129.5 131.8 1346.0 2620.7 593.2
Indiana -0.49990 0.00003 7.4 26.5 123.2 153.5 1086.2 2498.7 377.4
Oklahoma -0.32136 -0.62429 8.6 29.2 73.8 205.0 1288.2 2228.1 326.8
Rhode Island -0.20156 2.14658 3.6 10.5 86.5 201.0 1489.5 2844.1 791.4
Tennessee -0.13660 -1.13498 10.1 29.7 145.8 203.9 1259.7 1776.5 314.0
Alabama -0.04988 -2.09610 14.2 25.2 96.8 278.3 1135.5 1881.9 280.7
New Jersey 0.21787 0.96421 5.6 21.0 180.4 185.1 1435.8 2774.5 511.5
Ohio 0.23953 0.09053 7.8 27.3 190.5 181.1 1216.0 2696.8 400.4
Georgia 0.49041 -1.38079 11.7 31.1 140.5 256.5 1351.1 2170.2 297.9
Illinois 0.51290 0.09423 9.9 21.8 211.3 209.0 1085.0 2828.5 528.6
Missouri 0.55637 -0.55851 9.6 28.3 189.0 233.5 1318.3 2424.2 378.4
Hawaii 0.82313 1.82392 7.2 25.5 128.0 64.1 1911.5 3920.4 489.4
Washington 0.93058 0.73776 4.3 39.6 106.2 224.8 1605.6 3386.9 360.3
Delaware 0.96458 1.29674 6.0 24.9 157.0 194.2 1682.6 3678.4 467.0
Massachusetts 0.97844 2.63105 3.1 20.8 169.1 231.6 1532.2 2311.3 1140.1
Louisiana 1.12020 -2.08327 15.5 30.9 142.9 335.5 1165.5 2469.9 337.7
New Mexico 1.21417 -0.95076 8.8 39.1 109.6 343.4 1418.7 3008.6 259.5
Texas 1.39696 -0.68131 13.3 33.8 152.4 208.2 1603.1 2988.7 397.6
Oregon 1.44900 0.58603 4.9 39.9 124.1 286.9 1636.4 3506.1 388.9
South Carolina 1.60336 -2.16211 11.9 33.0 105.9 485.3 1613.6 2342.4 245.1
Maryland 2.18280 -0.19474 8.0 34.8 292.1 358.9 1400.0 3177.7 428.5
Michigan 2.27333 0.15487 9.3 38.9 261.9 274.6 1522.7 3159.0 545.5
Alaska 2.42151 0.16652 10.8 51.6 96.8 284.0 1331.7 3369.8 753.3
Colorado 2.50929 0.91660 6.3 42.0 170.7 292.9 1935.2 3903.2 477.1
Arizona 3.01414 0.84495 9.5 34.2 138.2 312.3 2346.1 4467.4 439.5
Florida 3.11175 -0.60392 10.2 39.6 187.9 449.1 1859.9 3840.5 351.4
New York 3.45248 0.43289 10.7 29.4 472.6 319.1 1728.0 2782.0 745.8
California 4.28380 0.14319 11.5 49.4 287.0 358.0 2139.4 3499.8 663.5
Nevada 5.26699 -0.25262 15.8 49.1 323.1 355.0 2453.1 4212.6 559.2

Output 52.1.3: OUT= Data Set Sorted by Second Principal Component

Crime Rates per 100,000 Population by State
States Listed in Order of Property Vs. Violent Crime
As Determined by the Second Principal Component

State Prin1 Prin2 Murder Rape Robbery Assault Burglary Larceny Auto_Theft
Mississippi -1.50736 -2.54671 14.3 19.6 65.7 189.1 915.6 1239.9 144.4
South Carolina 1.60336 -2.16211 11.9 33.0 105.9 485.3 1613.6 2342.4 245.1
Alabama -0.04988 -2.09610 14.2 25.2 96.8 278.3 1135.5 1881.9 280.7
Louisiana 1.12020 -2.08327 15.5 30.9 142.9 335.5 1165.5 2469.9 337.7
North Carolina -0.69925 -1.67027 10.6 17.0 61.3 318.3 1154.1 2037.8 192.1
Georgia 0.49041 -1.38079 11.7 31.1 140.5 256.5 1351.1 2170.2 297.9
Arkansas -1.05441 -1.34544 8.8 27.6 83.2 203.4 972.6 1862.1 183.4
Kentucky -1.72691 -1.14663 10.1 19.1 81.1 123.3 872.2 1662.1 245.4
Tennessee -0.13660 -1.13498 10.1 29.7 145.8 203.9 1259.7 1776.5 314.0
New Mexico 1.21417 -0.95076 8.8 39.1 109.6 343.4 1418.7 3008.6 259.5
West Virginia -3.14772 -0.81425 6.0 13.2 42.2 90.9 597.4 1341.7 163.3
Virginia -0.91621 -0.69265 9.0 23.3 92.1 165.7 986.2 2521.2 226.7
Texas 1.39696 -0.68131 13.3 33.8 152.4 208.2 1603.1 2988.7 397.6
Oklahoma -0.32136 -0.62429 8.6 29.2 73.8 205.0 1288.2 2228.1 326.8
Florida 3.11175 -0.60392 10.2 39.6 187.9 449.1 1859.9 3840.5 351.4
Missouri 0.55637 -0.55851 9.6 28.3 189.0 233.5 1318.3 2424.2 378.4
South Dakota -3.17203 -0.25446 2.0 13.5 17.9 155.7 570.5 1704.4 147.5
Nevada 5.26699 -0.25262 15.8 49.1 323.1 355.0 2453.1 4212.6 559.2
Pennsylvania -1.72007 -0.19590 5.6 19.0 130.3 128.0 877.5 1624.1 333.2
Maryland 2.18280 -0.19474 8.0 34.8 292.1 358.9 1400.0 3177.7 428.5
Kansas -0.63407 -0.02804 6.6 22.0 100.7 180.5 1270.4 2739.3 244.3
Idaho -1.43245 -0.00801 5.5 19.4 39.6 172.5 1050.8 2599.6 237.6
Indiana -0.49990 0.00003 7.4 26.5 123.2 153.5 1086.2 2498.7 377.4
Wyoming -1.42463 0.06268 5.4 21.9 39.7 173.9 811.6 2772.2 282.0
Ohio 0.23953 0.09053 7.8 27.3 190.5 181.1 1216.0 2696.8 400.4
Illinois 0.51290 0.09423 9.9 21.8 211.3 209.0 1085.0 2828.5 528.6
California 4.28380 0.14319 11.5 49.4 287.0 358.0 2139.4 3499.8 663.5
Michigan 2.27333 0.15487 9.3 38.9 261.9 274.6 1522.7 3159.0 545.5
Alaska 2.42151 0.16652 10.8 51.6 96.8 284.0 1331.7 3369.8 753.3
Nebraska -2.15071 0.22574 3.9 18.1 64.7 112.7 760.0 2316.1 249.1
Montana -1.66801 0.27099 5.4 16.7 39.2 156.8 804.9 2773.2 309.2
North Dakota -3.96408 0.38767 0.9 9.0 13.3 43.8 446.1 1843.0 144.7
New York 3.45248 0.43289 10.7 29.4 472.6 319.1 1728.0 2782.0 745.8
Maine -1.82631 0.57878 2.4 13.5 38.7 170.0 1253.1 2350.7 246.9
Oregon 1.44900 0.58603 4.9 39.9 124.1 286.9 1636.4 3506.1 388.9
Washington 0.93058 0.73776 4.3 39.6 106.2 224.8 1605.6 3386.9 360.3
Wisconsin -2.50296 0.78083 2.8 12.9 52.2 63.7 846.9 2614.2 220.7
Iowa -2.58156 0.82475 2.3 10.6 41.2 89.8 812.5 2685.1 219.9
New Hampshire -2.46562 0.82503 3.2 10.7 23.2 76.0 1041.7 2343.9 293.4
Arizona 3.01414 0.84495 9.5 34.2 138.2 312.3 2346.1 4467.4 439.5
Colorado 2.50929 0.91660 6.3 42.0 170.7 292.9 1935.2 3903.2 477.1
Utah -1.04996 0.93656 3.5 20.3 68.8 147.3 1171.6 3004.6 334.5
Vermont -2.06433 0.94497 1.4 15.9 30.8 101.2 1348.2 2201.0 265.2
New Jersey 0.21787 0.96421 5.6 21.0 180.4 185.1 1435.8 2774.5 511.5
Minnesota -1.55434 1.05644 2.7 19.5 85.9 85.8 1134.7 2559.3 343.1
Delaware 0.96458 1.29674 6.0 24.9 157.0 194.2 1682.6 3678.4 467.0
Connecticut -0.54133 1.50123 4.2 16.8 129.5 131.8 1346.0 2620.7 593.2
Hawaii 0.82313 1.82392 7.2 25.5 128.0 64.1 1911.5 3920.4 489.4
Rhode Island -0.20156 2.14658 3.6 10.5 86.5 201.0 1489.5 2844.1 791.4
Massachusetts 0.97844 2.63105 3.1 20.8 169.1 231.6 1532.2 2311.3 1140.1

Another recommended procedure is to make scatter plots of the first few components. The sorted listings help to identify observations on the plots. The following statements produce Output 52.1.4 and Output 52.1.5:

   title2 'Plot of the First Two Principal Components';
   %plotit(data=Crime_Components, labelvar=State, 
           plotvars=Prin2 Prin1, color=black, colors=blue);
   run;

   title2 'Plot of the First and Third Principal Components';
   %plotit(data=Crime_Components, labelvar=State, 
           plotvars=Prin3 Prin1, color=black, colors=blue);
   run;

Output 52.1.4: Plot of the First Two Principal Components

Output 52.1.5: Plot of the First and Third Principal Components

It is possible to identify regional trends on the plot of the first two components. Nevada and California are at the extreme right, with high overall crime rates but an average ratio of property crime to violent crime. North and South Dakota are on the extreme left with low overall crime rates. Southeastern states tend to be in the bottom of the plot, with a higher-than-average ratio of violent crime to property crime. New England states tend to be in the upper part of the plot, with a greater-than-average ratio of property crime to violent crime.
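If the %plotit macro is not available, a comparable labeled scatter plot might be drawn with PROC SGPLOT. This is a sketch that assumes a SAS release with ODS Graphics (SAS 9.2 or later); it is not the method used to produce Output 52.1.4. The reference lines at zero and the axis labels are additions for illustration.

```sas
/* Sketch: labeled scatter plot of the first two components.         */
/* Assumes SAS 9.2 or later; Crime_Components is the OUT= data set.  */
title2 'Plot of the First Two Principal Components';
proc sgplot data=Crime_Components;
   scatter x=Prin1 y=Prin2 / datalabel=State;
   refline 0 / axis=x;   /* average overall crime rate */
   refline 0 / axis=y;   /* average balance of property and violent crime */
   xaxis label='Prin1 (overall crime rate)';
   yaxis label='Prin2 (property versus violent crime)';
run;
```

The reference lines split the plot into quadrants, so states above the horizontal line lean toward property crime and states to the right of the vertical line have above-average overall crime rates.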

The most striking feature of the plot of the first and third principal components is that Massachusetts and New York are outliers on the third component.
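The extreme scores on the third component can also be listed directly, in the same way as the sorted listings for the first two components. The following sketch assumes the Crime_Components data set from above; the data set name By_Prin3 and the choice of five observations are arbitrary.

```sas
/* Sketch: list the states with the largest Prin3 scores.            */
/* By_Prin3 is a hypothetical name; OUT= leaves Crime_Components     */
/* itself unsorted.                                                  */
proc sort data=Crime_Components out=By_Prin3;
   by descending Prin3;
run;

proc print data=By_Prin3(obs=5);
   id State;
   var Prin3 Prin1 Robbery Larceny Auto_Theft;
   title2 'States with the Largest Third Principal Component Scores';
run;
```

The listing shows Robbery, Larceny, and Auto_Theft because these variables have the largest loadings in absolute value on the third eigenvector, so they are the first place to look when explaining an outlying score.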


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.