Example 53.1: Multidimensional Preference Analysis of Cars Data
This example uses PROC PRINQUAL
to perform a nonmetric
multidimensional preference (MDPREF) analysis (Carroll 1972).
MDPREF analysis is a principal component analysis
of a data matrix with columns that correspond
to people and rows that correspond to objects.
The data are ratings or rankings of each
person's preference for each object.
The data are the transpose of the usual multivariate data matrix.
(In other words, the columns correspond to people, whereas in
the more typical data matrix the rows correspond to people.)
The final result of an MDPREF analysis is a biplot
(Gabriel 1981) of the resulting preference space. A
biplot displays the judges and objects in a single plot
by projecting them onto the plane in the transformed
variable space that accounts for the most variance.
The data are ratings by 25 judges of their
preference for each of 17 automobiles.
The ratings are made on a 0 to 9 scale, with 0 meaning very weak
preference and 9 meaning very strong preference for the automobile.
These judgments were made in 1980 about that year's products.
There are two additional variables that indicate
the manufacturer and model of the automobile.
This example uses PROC PRINQUAL, PROC FACTOR, and the
%PLOTIT macro.
PROC FACTOR is used before PROC PRINQUAL to perform a
principal component analysis of the raw judgments.
PROC FACTOR is also used immediately after PROC PRINQUAL since
PROC PRINQUAL is a scoring procedure that optimally scores the
data but does not report the principal component analysis.
The %PLOTIT macro produces the biplot.
For information on the %PLOTIT macro, see Appendix B, "Using the %PLOTIT Macro."
The scree plot in the standard principal component
analysis reported by PROC FACTOR shows that two
principal components should be retained for further use. (See the
scree plot in
Output 53.1.1: there is a clear separation
between the first two components and the remaining components.)
There are nine eigenvalues that are precisely zero. The data matrix has
17 observations and 25 variables, so the correlation matrix has rank
at most 17 - 1 = 16, which leaves 25 - 16 = 9 zero eigenvalues.
PROC PRINQUAL is then used to monotonically transform the
raw judgments to maximize the proportion of variance
accounted for by the first two principal components.
The following statements create the data set and perform
a principal component analysis of the original data.
These statements produce Output 53.1.1.
title 'Preference Ratings for Automobiles Manufactured in 1980';
data CarPref;
   input Make $ 1-10 Model $ 12-22 @25 (Judge1-Judge25) (1.);
   datalines;
Cadillac   Eldorado     8007990491240508971093809
Chevrolet  Chevette     0051200423451043003515698
Chevrolet  Citation     4053305814161643544747795
Chevrolet  Malibu       6027400723121345545668658
Ford       Fairmont     2024006715021443530648655
Ford       Mustang      5007197705021101850657555
Ford       Pinto        0021000303030201500514078
Honda      Accord       5956897609699952998975078
Honda      Civic        4836709507488852567765075
Lincoln    Continental  7008990592230409962091909
Plymouth   Gran Fury    7006000434101107333458708
Plymouth   Horizon      3005005635461302444675655
Plymouth   Volare       4005003614021602754476555
Pontiac    Firebird     0107895613201206958265907
Volkswagen Dasher       4858696508877795377895000
Volkswagen Rabbit       4858509709695795487885000
Volvo      DL           9989998909999987989919000
;
* Principal Component Analysis of the Original Data;
options ls=80 ps=65;
proc factor data=CarPref nfactors=2 scree;
   ods select Eigenvalues ScreePlot;
   var Judge1-Judge25;
   title3 'Principal Components of Original Data';
run;
Output 53.1.1: Principal Component Analysis of Original Data
Preference Ratings for Automobiles Manufactured in 1980
Principal Components of Original Data

The FACTOR Procedure
Initial Factor Method: Principal Components

Eigenvalues of the Correlation Matrix: Total = 25  Average = 1

      Eigenvalue    Difference    Proportion    Cumulative
 1    10.8857202     5.0349926        0.4354        0.4354
 2     5.8507276     3.8077964        0.2340        0.6695
 3     2.0429312     0.5207808        0.0817        0.7512
 4     1.5221504     0.3078035        0.0609        0.8121
 5     1.2143469     0.2564839        0.0486        0.8606
 6     0.9578630     0.2197345        0.0383        0.8989
 7     0.7381286     0.1497259        0.0295        0.9285
 8     0.5884027     0.2117186        0.0235        0.9520
 9     0.3766841     0.1091250        0.0151        0.9671
10     0.2675591     0.0773893        0.0107        0.9778
11     0.1901698     0.0463921        0.0076        0.9854
12     0.1437776     0.0349382        0.0058        0.9911
13     0.1088394     0.0607418        0.0044        0.9955
14     0.0480977     0.0056610        0.0019        0.9974
15     0.0424367     0.0202714        0.0017        0.9991
16     0.0221653     0.0221653        0.0009        1.0000
17     0.0000000     0.0000000        0.0000        1.0000
18     0.0000000     0.0000000        0.0000        1.0000
19     0.0000000     0.0000000        0.0000        1.0000
20     0.0000000     0.0000000        0.0000        1.0000
21     0.0000000     0.0000000        0.0000        1.0000
22     0.0000000     0.0000000        0.0000        1.0000
23     0.0000000     0.0000000        0.0000        1.0000
24     0.0000000     0.0000000        0.0000        1.0000
25     0.0000000                      0.0000        1.0000
Scree Plot of Eigenvalues
[Character plot not reproduced: Eigenvalues (vertical axis, 0 to 12) against Number (horizontal axis, 0 to 26). Component 1 lies near 11, component 2 just below 6, component 3 at about 2, and the remaining components fall between about 1.5 and 0.]
To fit the nonmetric MDPREF model, you can use the PRINQUAL procedure.
The MONOTONE option is specified in the TRANSFORM statement to request a
nonmetric MDPREF analysis; alternatively, you can
specify the IDENTITY option for a metric analysis.
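For comparison, the metric analysis mentioned above could be requested as in the following sketch. It uses the CarPref data set and the PROC PRINQUAL options described in this example and changes only the transformation. The output data set name MetricResults and the title text are assumptions chosen for illustration.

```sas
* Metric MDPREF sketch: IDENTITY leaves the ratings untransformed;
proc prinqual data=CarPref out=MetricResults n=2 replace mdpref;
   id model;
   transform identity(Judge1-Judge25);
   title2 'Metric MDPREF Analysis (Illustrative Sketch)';
run;
```

Writing the result to a separate data set keeps the nonmetric results available for the later steps of the example.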
Several options are used in the PROC PRINQUAL statement. The option
DATA=CarPref specifies the input data set, OUT=Results
creates an output data set, and N=2 and the default METHOD=MTV
transform the data to better fit a two-component model.
The REPLACE option replaces the original data with the
monotonically transformed data in the OUT= data set.
The MDPREF option standardizes the component scores to
variance one so that the geometry of the biplot is correct, and
it creates two
variables in the OUT= data set named Prin1 and Prin2.
These variables
contain the standardized principal component scores and
structure matrix, which are used to make the biplot.
If the variables in data matrix X are standardized to mean
zero and variance one, and n is the number of rows in X,
then $X = V \Lambda^{1/2} W'$ is the principal component model, where
$X'X/(n-1) = W \Lambda W'$. The W and $\Lambda$ contain the eigenvectors and eigenvalues of
the correlation matrix of X.
The first two columns of V, the standardized
component scores, and $W \Lambda^{1/2}$, which is the structure matrix, are output.
The advantage of creating a biplot based on
principal components is
that coordinates do not depend on the sample size.
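The following short derivation is added as a sketch of why the two output matrices have the properties described. It assumes the principal component model $X = V \Lambda^{1/2} W'$ with $X'X/(n-1) = W \Lambda W'$ and $W'W = I$, and it considers only components with nonzero eigenvalues, so that $\Lambda^{-1/2}$ exists for them. The symbol $I$ (the identity matrix) is introduced here and is not part of the original text.

```latex
\begin{align*}
V &= X W \Lambda^{-1/2}, \\
\frac{V'V}{n-1} &= \Lambda^{-1/2} W' \, \frac{X'X}{n-1} \, W \Lambda^{-1/2}
                 = \Lambda^{-1/2} W' W \Lambda W' W \Lambda^{-1/2} = I, \\
\frac{X'V}{n-1} &= W \Lambda W' W \Lambda^{-1/2} = W \Lambda^{1/2}.
\end{align*}
```

The second line shows that each retained column of V has variance one, and the third shows that the structure matrix holds the correlations between the original variables and the components.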
The following statements transform the
data and produce Output 53.1.2.
* Transform the Data to Better Fit a Two Component Model;
proc prinqual data=CarPref out=Results n=2 replace mdpref;
   id model;
   transform monotone(Judge1-Judge25);
   title2 'Multidimensional Preference (MDPREF) Analysis';
   title3 'Optimal Monotonic Transformation of Preference Data';
run;
Output 53.1.2: Transformation of Automobile Preference Data
Preference Ratings for Automobiles Manufactured in 1980
Multidimensional Preference (MDPREF) Analysis
Optimal Monotonic Transformation of Preference Data

PRINQUAL MTV Algorithm Iteration History

Iteration    Average    Maximum    Proportion    Criterion
   Number     Change     Change   of Variance       Change    Note
        1    0.24994    1.28017       0.66946
        2    0.07223    0.36958       0.80194      0.13249
        3    0.04522    0.29026       0.81598      0.01404
        4    0.03096    0.25213       0.82178      0.00580
        5    0.02182    0.23045       0.82493      0.00315
        6    0.01602    0.19017       0.82680      0.00187
        7    0.01219    0.14748       0.82793      0.00113
        8    0.00953    0.11031       0.82861      0.00068
        9    0.00737    0.06461       0.82904      0.00043
       10    0.00556    0.04469       0.82930      0.00026
       11    0.00445    0.04087       0.82944      0.00014
       12    0.00381    0.03706       0.82955      0.00011
       13    0.00319    0.03348       0.82965      0.00009
       14    0.00255    0.02999       0.82971      0.00006
       15    0.00213    0.02824       0.82976      0.00005
       16    0.00183    0.02646       0.82980      0.00004
       17    0.00159    0.02472       0.82983      0.00003
       18    0.00139    0.02305       0.82985      0.00003
       19    0.00123    0.02145       0.82988      0.00002
       20    0.00109    0.01993       0.82989      0.00002
       21    0.00096    0.01850       0.82991      0.00001
       22    0.00086    0.01715       0.82992      0.00001
       23    0.00076    0.01588       0.82993      0.00001
       24    0.00067    0.01440       0.82994      0.00001
       25    0.00059    0.00871       0.82994      0.00001
       26    0.00050    0.00720       0.82995      0.00000
       27    0.00043    0.00642       0.82995      0.00000
       28    0.00037    0.00573       0.82995      0.00000
       29    0.00031    0.00510       0.82995      0.00000
       30    0.00027    0.00454       0.82995      0.00000    Not Converged

WARNING: Failed to converge, however criterion change is less than 0.0001.
The iteration history displayed by PROC PRINQUAL
indicates that the proportion of variance is
increased from an initial 0.66946 to 0.82995.
The proportion of variance accounted for by PROC PRINQUAL on the
first iteration equals the cumulative proportion of variance
shown by PROC FACTOR for the first two principal components.
In this example, PROC PRINQUAL's initial iteration performs a
standard principal component analysis of the raw data.
The columns labeled Average Change,
Maximum Change, and Criterion Change
contain values that decrease steadily, indicating that
PROC PRINQUAL is improving the transformations at a
monotonically decreasing rate over the iterations.
This does not always happen, and when it does not, it suggests
that the analysis may be converging to a degenerate solution.
See Example 53.2
for a discussion of a degenerate solution. The algorithm does not
converge in 30 iterations. However, the criterion change is small,
indicating that more iterations are unlikely to have much effect
on the results.
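If convergence within the iteration limit matters for your application, one option is to allow more iterations. The sketch below reruns the same step with the MAXITER= option in the PROC PRINQUAL statement. The value 100 and the output data set name Results100 are arbitrary choices for illustration, and whether this run then converges has not been checked here.

```sas
* Sketch: same analysis with a larger iteration limit;
proc prinqual data=CarPref out=Results100 n=2 replace mdpref maxiter=100;
   id model;
   transform monotone(Judge1-Judge25);
run;
```

Because the criterion change is already below 0.0001 after 30 iterations, any difference from the results shown in this example is expected to be small.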
The second PROC FACTOR analysis is
performed on the transformed data.
The WHERE statement is used to retain only
the monotonically transformed judgments.
The scree plot shows that the first two eigenvalues are now
much larger than the remaining smaller eigenvalues.
The second eigenvalue has increased markedly
at the expense of the next several eigenvalues.
Two principal components seem to be necessary
and sufficient to adequately describe these
judges' preferences for these automobiles.
The cumulative proportion of variance displayed by PROC
FACTOR for the first two principal components is 0.83.
The following statements perform the
analysis and produce Output 53.1.3:
* Final Principal Component Analysis;
proc factor data=Results nfactors=2 scree;
   ods select Eigenvalues ScreePlot;
   var Judge1-Judge25;
   where _TYPE_='SCORE';
   title3 'Principal Components of Monotonically Transformed Data';
run;
Output 53.1.3: Principal Components of Transformed Data
Preference Ratings for Automobiles Manufactured in 1980
Multidimensional Preference (MDPREF) Analysis
Principal Components of Monotonically Transformed Data

The FACTOR Procedure
Initial Factor Method: Principal Components

Eigenvalues of the Correlation Matrix: Total = 25  Average = 1

      Eigenvalue    Difference    Proportion    Cumulative
 1    11.5959045     2.4429455        0.4638        0.4638
 2     9.1529589     7.9952554        0.3661        0.8300
 3     1.1577036     0.3072013        0.0463        0.8763
 4     0.8505023     0.1284323        0.0340        0.9103
 5     0.7220700     0.2613540        0.0289        0.9392
 6     0.4607160     0.0958339        0.0184        0.9576
 7     0.3648821     0.0877851        0.0146        0.9722
 8     0.2770970     0.1250945        0.0111        0.9833
 9     0.1520025     0.0506622        0.0061        0.9894
10     0.1013403     0.0292763        0.0041        0.9934
11     0.0720640     0.0200979        0.0029        0.9963
12     0.0519661     0.0336675        0.0021        0.9984
13     0.0182987     0.0027059        0.0007        0.9991
14     0.0155927     0.0093669        0.0006        0.9997
15     0.0062258     0.0055503        0.0002        1.0000
16     0.0006755     0.0006755        0.0000        1.0000
17     0.0000000     0.0000000        0.0000        1.0000
18     0.0000000     0.0000000        0.0000        1.0000
19     0.0000000     0.0000000        0.0000        1.0000
20     0.0000000     0.0000000        0.0000        1.0000
21     0.0000000     0.0000000        0.0000        1.0000
22     0.0000000     0.0000000        0.0000        1.0000
23     0.0000000     0.0000000        0.0000        1.0000
24     0.0000000     0.0000000        0.0000        1.0000
25     0.0000000                      0.0000        1.0000
Scree Plot of Eigenvalues
[Character plot not reproduced: Eigenvalues (vertical axis, 0 to 12) against Number (horizontal axis, 0 to 26). Component 1 lies just below 12, component 2 near 9, and all remaining components fall at about 1 or below.]
The remainder of the example constructs the MDPREF biplot.
A biplot is a plot that displays the relation between
the row points and the columns of a data matrix.
The rows of V, the standardized component scores,
and $W \Lambda^{1/2}$, which is the structure matrix,
contain enough information to reproduce X.
The (i,j) element of X is the product of
row i of V and row j of $W \Lambda^{1/2}$. If all but the first two columns of V and
$W \Lambda^{1/2}$ are discarded, the (i,j) element
of X is approximated by the product of row i
of V and row j of $W \Lambda^{1/2}$.
Since the MDPREF analysis is based on a principal
component model, the dimensions of the MDPREF
biplot are the first two principal components.
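To look at the coordinates that feed the biplot, you could print the two MDPREF variables from the OUT= data set. This is a sketch that assumes the PROC PRINQUAL step in this example has already been run. It relies only on the Results data set, the _TYPE_ variable used in the WHERE statement earlier, the ID variable Model, and the variables Prin1 and Prin2 created by the MDPREF option. The title text is an illustrative choice.

```sas
* Sketch: list the biplot coordinates stored in the OUT= data set;
proc print data=Results;
   var _TYPE_ Model Prin1 Prin2;
   title3 'MDPREF Coordinates in the OUT= Data Set';
run;
```

Rows with _TYPE_='SCORE' correspond to the automobiles.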
The first principal component is the longest dimension through the
MDPREF biplot. The first principal component is overall preference,
which is the most salient dimension in the preference judgments. One
end points in the direction that is on the average preferred most by the
judges, and the other end points in the least preferred direction. The
second principal component is orthogonal to the first principal
component, and it is the orthogonal direction that is the second most
salient. The interpretation of the second dimension varies from
example to example.
With an MDPREF biplot, it is geometrically appropriate to represent
each automobile (object) by a point and each judge by a vector.
The automobile points have coordinates that are the scores
of the automobile on the first two principal components.
The judge vectors emanate from the origin of the space and
go through a point with coordinates that are the coefficients of
the judge (variable) on the first two principal components.
The absolute length of a vector is arbitrary.
However, the relative lengths of the vectors indicate
fit, with the squared lengths being proportional
to the communalities in the PROC FACTOR output.
The direction of the vector indicates the direction
that is most preferred by the individual judge, with
preference increasing as the vector moves from the origin.
Let v' be row i of V,
u'
be row j of $W \Lambda^{1/2}$, |v| be
the length of v, |u| be the length of u,
and $\theta$ be the angle between v and u.
The predicted degree of preference that an
individual judge has for an automobile is
$u'v = |u|\,|v| \cos\theta$. Each car point can be orthogonally
projected onto the vector.
The projection of car i on vector j is
u((u'v)/(u'u))
and the length of this projection is $|v| \cos\theta$. The automobile that projects farthest along a vector
in the direction it points is that judge's most preferred
automobile, since the length of this projection, $|v| \cos\theta$, differs from the predicted preference,
$|u|\,|v| \cos\theta$, only by
|u|, which is constant within each judge.
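As a numeric illustration of these formulas, the DATA step below computes the predicted preference and the projection for one car point and one judge vector. The coordinates are made-up values, not taken from the analysis, and all variable names are hypothetical.

```sas
* Sketch with made-up coordinates, not values from the analysis;
data _null_;
   v1 = 1.5;  v2 = -0.5;   /* hypothetical car point v    */
   u1 = 0.6;  u2 = 0.8;    /* hypothetical judge vector u */
   pref     = u1*v1 + u2*v2;          /* predicted preference u'v    */
   len_u    = sqrt(u1**2 + u2**2);    /* length of u                 */
   proj_len = pref / len_u;           /* signed projection length    */
   p1 = u1 * pref / (u1**2 + u2**2);  /* projection u((u'v)/(u'u)),  */
   p2 = u2 * pref / (u1**2 + u2**2);  /* first and second coordinate */
   put pref= len_u= proj_len= p1= p2=;
run;
```

Repeating the calculation for several car points with the same judge vector ranks the cars identically by pref and by proj_len, because the two quantities differ only by the constant factor len_u.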
To interpret the biplot, look for directions through the plot that
show a continuous change in some attribute of the automobiles, or
look for regions in the plot that contain clusters of automobile
points and determine what attributes the automobiles have in common.
Those points that are tightly clustered in a
region of the plot represent automobiles that have
the same preference patterns across the judges.
Those vectors that point in roughly the same direction
represent judges who tend to have similar preference patterns.
The following statement constructs the biplot and produces
Output 53.1.4:
title3 'Biplot of Automobiles and Judges';
%plotit(data=results, datatype=mdpref 2);
The DATATYPE=MDPREF 2 option indicates that the coordinates come from
an MDPREF analysis, so the macro represents the scores as points
and the structure as vectors, with the vectors stretched by a factor of
two to make a better graphical display.
Output 53.1.4: Preference Ratings for Automobiles Manufactured in 1980
In the biplot, American automobiles are located
on the left of the space, while European and
Japanese automobiles are located on the right.
At the top of the space are expensive American
automobiles (Cadillac Eldorado, Lincoln Continental)
while at the bottom are inexpensive ones (Pinto, Chevette).
The first principal component differentiates American
from imported automobiles, and the second arranges
automobiles by price and other associated characteristics.
The two expensive American automobiles form a cluster,
the sporty automobile (Firebird) is by itself, the
Volvo DL is by itself, and the remaining imported autos
form a cluster, as do the remaining American autos.
It seems there are five prototypical automobiles in this set
of 17, in terms of preference patterns among the 25 judges.
Most of the judges prefer the imported
automobiles, especially the Volvo.
There is also a fairly large minority that prefer the
expensive cars, whether or not they are American (those with
vectors that point towards one o'clock), or simply prefer expensive
American automobiles (vectors that point towards eleven o'clock).
There are two people who prefer anything except expensive
American cars (five o'clock vectors), and one who
prefers inexpensive American cars (seven o'clock vector).
Several vectors point toward the upper-right corner of the plot,
toward a region with no cars. This is the region between the European
and Japanese cars on the right and the luxury cars on the top.
This suggests that there is a market for luxury Japanese and European cars.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.