The CORRESP Procedure

Example 24.1: Simple Correspondence Analysis of Cars and Their Owners

In this example, PROC CORRESP creates a contingency table from categorical data and performs a simple correspondence analysis. The data are from a sample of individuals who were asked to provide information about themselves and their cars. The questions included the origin of the car (American, Japanese, European) and family status (single, married, single and living with children, and married and living with children). These data are used again in Example 24.2.

The PROC FORMAT step defines formats for the coded variables, and the DATA step reads the input data and assigns those formats. PROC CORRESP is used to perform the simple correspondence analysis. The ALL option displays all tables, including the contingency table, chi-square information, profiles, and all results of the correspondence analysis. The OUTC= option creates an output coordinate data set. The TABLES statement specifies the row and column categorical variables (here, Marital and Origin, separated by a comma). The %PLOTIT macro is used to plot the results.
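The chi-square tables and profiles in the first part of Output 24.1.1 follow the standard definitions below. The notation is introduced here for illustration and is not taken from PROC CORRESP: n_ij is the observed count in row i and column j, n_i+ and n_+j are the row and column totals, and n is the grand total. The last lines check one cell, Married with Kids by American, against the printed tables.

```latex
% Expected count, deviation, and contribution to the total chi-square for cell (i, j)
\[ E_{ij} = \frac{n_{i+}\, n_{+j}}{n}, \qquad
   \text{contribution}_{ij} = \frac{(n_{ij} - E_{ij})^2}{E_{ij}}, \qquad
   \chi^2 = \sum_i \sum_j \frac{(n_{ij} - E_{ij})^2}{E_{ij}} \]

% Row and column profiles
\[ \text{row profile}_{ij} = \frac{n_{ij}}{n_{i+}}, \qquad
   \text{column profile}_{ij} = \frac{n_{ij}}{n_{+j}} \]

% Worked check: Married with Kids by American
\[ E = \frac{111 \times 128}{339} = 41.9115, \qquad 52 - 41.9115 = 10.0885, \qquad
   \frac{10.0885^2}{41.9115} \approx 2.4284 \]
\[ \text{row profile} = \frac{52}{111} = 0.468468, \qquad
   \text{column profile} = \frac{52}{128} = 0.406250 \]
```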

Normally, you only need to tell the %PLOTIT macro the name of the input data set, DATA=Coor, and the type of analysis performed on the data, DATATYPE=CORRESP.

The following statements produce Output 24.1.1:

   title 'Car Owners and Car Origin';

   proc format;
      value Origin  1 = 'American' 2 = 'Japanese' 3 = 'European';
      value Size    1 = 'Small'    2 = 'Medium'   3 = 'Large';
      value Type    1 = 'Family'   2 = 'Sporty'   3 = 'Work';
      value Home    1 = 'Own'      2 = 'Rent';
      value Sex     1 = 'Male'     2 = 'Female';
      value Income  1 = '1 Income' 2 = '2 Incomes';
      value Marital 1 = 'Single with Kids' 2 = 'Married with Kids'
                    3 = 'Single'           4 = 'Married';
      run;

   data Cars;
      missing a;
      input (Origin Size Type Home Income Marital Kids Sex) (1.) @@;
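      * Check for end of line: past the last observation on a data line,
        all eight values are missing, so skip to the next data line;
      if n(of Origin -- Sex) eq 0 then do; input; return; end;
      * Recode Marital: when Kids is 0 (or missing), 1 (Single with Kids)
        becomes 3 (Single) and 2 (Married with Kids) becomes 4 (Married);
      marital = 2 * (kids le 0) + marital;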
      format Origin Origin. Size Size. Type Type. Home Home.
             Sex Sex. Income Income. Marital Marital.;
      output;
      datalines;
   131112212121110121112201131211011211221122112121131122123211222212212201
   121122023121221232211101122122022121110122112102131112211121110112311101
   211112113211223121122202221122111311123131211102321122223221220221221101
   122122022121220211212201221122021122110132112202213112111331226122221101
   1212110231AA220232112212113112112121220212212202112111022222110212121221
   211211012211222212211101313112113121220121112212121112212211222221112211
   221111011112220122212201131211013121220113112222131112012131110221112211
   121112212211121121112201321122311311221113112212213211013121220221221101
   133211011212220233311102213111023211122121312222212212111111222121112211
   133112011212112212112212212222022131222222121101111122022211220113112212
   211112012232220121221102213211011131220121212201211122112331220233312202
   222122012111220212112201221122112212220222212211311122012111110112212212
   112222011131112221212202322211021222110121221101333211012232110132212101
   223222013111220112211101211211022112110212211102221122021111220112111211
   111122022121110113311122322111122221210222211101212122021211221232112202
   1331110113112211213222012131221211112212221122021331220212121112121.2212
   121122.22121210233112212222121011311122121211102211122112121110121212101
   311212022231221112112211211211312221221213112212221122022222110131212202
   213122211311221212112222113122221221220213111221121211221211221221221102
   131122211211220221222101223112012111221212111102223122111311222121111102
   2121110121112202133122222311122121312212112.2101312122012111122112112202
   111212023121110111112221212111012211220221321101221211122121220112111112
   212211022111110122221101121112112122110122122232221122212211221212112202
   213122112211110212121201113211012221110232111102212211012112220121212202
   221112011211220121221101211211022211221112121101111112212121221111221201
   211122122122111212112221111122312132110113121101121122222111220222121102
   221211012122110221221102312111012122220121121101121122221111222212221102
   212122021222120113112202121122212121110113111101123112212111220113111101
   221112211321210131212211121211011222110122112222123122023121223112212202
   311211012131110131221102112211021131220213122201222111022121221221312202
   131.22523221110122212221131112412211220221121112131222022122220122122201
   212111011311220221312202221122123221210121222202223122121211221221111112
   211111121211221221212201113122122131220222112222211122011311110112312211
   211222013221220121211211312122122221220122112201111222011211110122311112
   312111021231220122121101211112112.22110222112212121122122211110121112101
   121211013211222121112222321112112112110121321101113111012221220121312201
   213211012212220221211101321122121111220221121101122211021122110213112212
   212122011211122131221101121211022212220212121101
   ;

   *---Perform Simple Correspondence Analysis---;
   proc corresp all data=Cars outc=Coor;
      tables Marital, Origin;
      run;

   *---Plot the Simple Correspondence Analysis Results---;
   %plotit(data=Coor, datatype=corresp)

Correspondence analysis locates all the categories in a Euclidean space. The first two dimensions of this space are plotted to examine the associations among the categories. The table has four rows and three columns, so the centered table has at most min(4, 3) - 1 = 2 dimensions, and no information is lost when only two dimensions are plotted. The plot should be thought of as two different overlaid plots, one for each categorical variable. Distances between points within a variable have meaning, but distances between points from different variables do not.
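The dimensionality argument and the Inertia and Chi-Square Decomposition table in Output 24.1.1 can be checked with a few standard identities. The sketch below uses r = 4 rows, c = 3 columns, n = 339, and the printed singular values sigma_k. These symbols are introduced here for illustration. The total inertia is the chi-square statistic divided by n, each principal inertia lambda_k is the square of its singular value, and each dimension's chi-square component is n lambda_k. Because the printed inputs are rounded, the hand results agree with the table only to the digits shown.

```latex
% Number of nonzero dimensions and degrees of freedom for an r x c table
\[ \min(r, c) - 1 = \min(4, 3) - 1 = 2, \qquad (r - 1)(c - 1) = 3 \times 2 = 6 \]

% Total inertia is the chi-square statistic divided by n
\[ \frac{\chi^2}{n} = \frac{8.34947}{339} \approx 0.02463 \]

% Principal inertias are the squared singular values
\[ \lambda_1 = \sigma_1^2 = 0.15122^2 \approx 0.02287, \qquad
   \lambda_2 = \sigma_2^2 = 0.04200^2 \approx 0.00176 \]

% Each dimension's share of the chi-square statistic (n lambda_k)
\[ \frac{n\lambda_1}{\chi^2} = \frac{7.75160}{8.34947} \approx 92.84\%, \qquad
   \frac{n\lambda_2}{\chi^2} = \frac{0.59787}{8.34947} \approx 7.16\% \]
```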

Output 24.1.1: Simple Correspondence Analysis of a Contingency Table
 
Car Owners and Car Origin

The CORRESP Procedure

Contingency Table

                     American   European   Japanese        Sum
Married                    37         14         51        102
Married with Kids          52         15         44        111
Single                     33         15         63        111
Single with Kids            6          1          8         15
Sum                       128         45        166        339

Chi-Square Statistic Expected Values

                     American   European   Japanese
Married               38.5133    13.5398    49.9469
Married with Kids     41.9115    14.7345    54.3540
Single                41.9115    14.7345    54.3540
Single with Kids       5.6637     1.9912     7.3451

Observed Minus Expected Values

                     American   European   Japanese
Married               -1.5133     0.4602     1.0531
Married with Kids     10.0885     0.2655   -10.3540
Single                -8.9115     0.2655     8.6460
Single with Kids       0.3363    -0.9912     0.6549

Contributions to the Total Chi-Square Statistic

                     American   European   Japanese        Sum
Married               0.05946    0.01564    0.02220    0.09730
Married with Kids     2.42840    0.00478    1.97235    4.40553
Single                1.89482    0.00478    1.37531    3.27492
Single with Kids      0.01997    0.49337    0.05839    0.57173
Sum                   4.40265    0.51858    3.42825    8.34947

Row Profiles

                     American   European   Japanese
Married              0.362745   0.137255   0.500000
Married with Kids    0.468468   0.135135   0.396396
Single               0.297297   0.135135   0.567568
Single with Kids     0.400000   0.066667   0.533333

Column Profiles

                     American   European   Japanese
Married              0.289063   0.311111   0.307229
Married with Kids    0.406250   0.333333   0.265060
Single               0.257813   0.333333   0.379518
Single with Kids     0.046875   0.022222   0.048193

Inertia and Chi-Square Decomposition

 Singular   Principal      Chi-             Cumulative      19   38   57   76   95
    Value     Inertia    Square   Percent      Percent   ----+----+----+----+----+---
  0.15122     0.02287   7.75160     92.84        92.84   ************************
  0.04200     0.00176   0.59787      7.16       100.00   **
    Total     0.02463   8.34947    100.00

Degrees of Freedom = 6

Row Coordinates

                        Dim1      Dim2
Married              -0.0278    0.0134
Married with Kids     0.1991    0.0064
Single               -0.1716    0.0076
Single with Kids     -0.0144   -0.1947

Summary Statistics for the Row Points

                     Quality      Mass   Inertia
Married               1.0000    0.3009    0.0117
Married with Kids     1.0000    0.3274    0.5276
Single                1.0000    0.3274    0.3922
Single with Kids      1.0000    0.0442    0.0685

Partial Contributions to Inertia for the Row Points

                        Dim1      Dim2
Married               0.0102    0.0306
Married with Kids     0.5678    0.0076
Single                0.4217    0.0108
Single with Kids      0.0004    0.9511

Indices of the Coordinates that Contribute Most to Inertia for the Row Points

                        Dim1      Dim2      Best
Married                    0         0         2
Married with Kids          1         0         1
Single                     1         0         1
Single with Kids           0         2         2

Squared Cosines for the Row Points

                        Dim1      Dim2
Married               0.8121    0.1879
Married with Kids     0.9990    0.0010
Single                0.9980    0.0020
Single with Kids      0.0054    0.9946

Column Coordinates

                        Dim1      Dim2
American              0.1847   -0.0166
European              0.0013    0.1073
Japanese             -0.1428   -0.0163

Summary Statistics for the Column Points

                     Quality      Mass   Inertia
American              1.0000    0.3776    0.5273
European              1.0000    0.1327    0.0621
Japanese              1.0000    0.4897    0.4106

Partial Contributions to Inertia for the Column Points

                        Dim1      Dim2
American              0.5634    0.0590
European              0.0000    0.8672
Japanese              0.4366    0.0737

Indices of the Coordinates that Contribute Most to Inertia for the Column Points

                        Dim1      Dim2      Best
American                   1         0         1
European                   0         2         2
Japanese                   1         0         1

Squared Cosines for the Column Points

                        Dim1      Dim2
American              0.9920    0.0080
European              0.0001    0.9999
Japanese              0.9871    0.0129

Output 24.1.2: Plot of Simple Correspondence Analysis of a Contingency Table

To interpret the plot, start by interpreting the row points separately from the column points. The European point lies near the centroid, close to the vertical axis and above the origin (its coordinates are 0.0013 on dimension one and 0.1073 on dimension two). It makes a relatively small contribution to the chi-square statistic (0.51858 of 8.34947), because it is near the centroid and has the smallest mass of the three column points. It contributes almost nothing to the inertia of dimension one, since its coordinate on dimension one is near zero. It makes a relatively large contribution (0.8672) to the inertia of dimension two, since its coordinate on dimension two has a large absolute value relative to the other column points. Its squared cosines for dimensions one and two, approximately 0 and 1, respectively, indicate that its position is almost completely determined by its location on dimension two. Its quality of display is 1.0, indicating perfect quality, since the table is two-dimensional after centering. The American and Japanese points are far from the centroid and lie along dimension one. They make relatively large contributions to the chi-square statistic and to the inertia of dimension one. The horizontal dimension seems to be largely determined by Japanese versus American car ownership.
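The quantities used in this interpretation can be computed from a point's mass m (its marginal proportion), its principal coordinates f_k, and the principal inertias lambda_k. The sketch below uses the standard definitions and the rounded European values from Output 24.1.1: m = 45/339 = 0.1327, f = (0.0013, 0.1073), lambda_1 = 0.02287, and lambda_2 = 0.00176. Because the inputs are rounded to four or five digits, the results match the printed tables only approximately.

```latex
% Squared cosine of point i on dimension k (denominator sums over all dimensions)
\[ \cos^2_{ik} = \frac{f_{ik}^2}{\sum_{k'} f_{ik'}^2} \]
% Partial contribution of point i to the inertia of dimension k
\[ \mathrm{ctr}_{ik} = \frac{m_i\, f_{ik}^2}{\lambda_k} \]
% Quality: squared cosines summed over the plotted dimensions;
% Inertia: the point's share of the total inertia
\[ \mathrm{quality}_i = \cos^2_{i1} + \cos^2_{i2}, \qquad
   \mathrm{inertia}_i = \frac{m_i \sum_k f_{ik}^2}{\sum_k \lambda_k} \]

% European
\[ \cos^2_{1} = \frac{0.0013^2}{0.0013^2 + 0.1073^2} \approx 0.0001, \qquad
   \cos^2_{2} \approx 0.9999, \qquad \mathrm{quality} \approx 1.0000 \]
\[ \mathrm{ctr}_{1} = \frac{0.1327 \times 0.0013^2}{0.02287} \approx 0.0000, \qquad
   \mathrm{ctr}_{2} = \frac{0.1327 \times 0.1073^2}{0.00176} \approx 0.87 \]
\[ \mathrm{inertia} = \frac{0.1327 \times (0.0013^2 + 0.1073^2)}{0.02463} \approx 0.062 \]
```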

Among the row points, the Married point is near the centroid, and the Single with Kids point has a coordinate near zero on dimension one. The horizontal dimension seems to be largely determined by the Single versus the Married with Kids points. Together, the two interpretations of dimension one show an association between being married with kids and owning an American car, and between being single and owning a Japanese car. Ignore the fact that the Married with Kids point is close to the American point and that the Japanese point is near the Single point, because distances between row and column points are not defined. What the plot does show is that more married people with kids drive an American car than you would expect if the rows and columns were independent (52 observed versus 41.9 expected), and more single people drive a Japanese car than you would expect (63 observed versus 54.4 expected).
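The association on dimension one can also be traced back to the profiles, without relying on row-to-column distances. When both sets of points are in principal coordinates, each row coordinate equals the average of the column coordinates weighted by that row's profile, divided by the singular value of the dimension. The coordinates in Output 24.1.1 satisfy this relationship. The sketch below applies this standard transition formula to dimension one, with sigma_1 = 0.15122 and column coordinates g_j1 taken from the Column Coordinates table. The results agree with the printed row coordinates (0.1991 and -0.1716) up to rounding.

```latex
% Transition formula: row point i on dimension k, column points g_{jk} in principal coordinates
\[ f_{ik} = \frac{1}{\sigma_k} \sum_j \frac{n_{ij}}{n_{i+}}\, g_{jk} \]

% Married with Kids on dimension 1
\[ f = \frac{0.468468(0.1847) + 0.135135(0.0013) + 0.396396(-0.1428)}{0.15122}
     = \frac{0.030097}{0.15122} \approx 0.199 \]

% Single on dimension 1
\[ f = \frac{0.297297(0.1847) + 0.135135(0.0013) + 0.567568(-0.1428)}{0.15122}
     = \frac{-0.025962}{0.15122} \approx -0.172 \]
```

The Married with Kids profile puts more weight on American (0.468) than the average profile does (128/339 = 0.378), which pulls the point toward the American side of dimension one. The Single profile puts more weight on Japanese (0.568 versus 166/339 = 0.490).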


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.