Example 25.2: Bivariate Density Estimates and Posterior Probabilities
In this example, four more discriminant analyses of the iris
data are run, this time with two quantitative variables: petal
width and petal length. The example produces Output 25.2.1
through Output 25.2.5.
A scatter plot shows the joint sample distribution. See
Appendix B, "Using the %PLOTIT Macro," for more information on the %PLOTIT macro.
%plotit(data=iris, plotvars=PetalWidth PetalLength,
labelvar=_blank_, symvar=symbol, typevar=symbol,
symsize=0.35, symlen=4, exttypes=symbol, ls=100);
Output 25.2.1: Joint Sample Distribution of Petal Width
and Petal Length in Three Species
Another data set is created for plotting, containing
a grid of points suitable for contour plots.
The large number of points in the grid makes
the following analyses very time-consuming.
If you attempt to duplicate these examples, begin
with a small number of points in the grid.
data plotdata;
   do PetalLength=-2 to 72 by 0.25;
      h + 1;                 * Number of horizontal cells;
      do PetalWidth=-5 to 32 by 0.25;
         n + 1;              * Total number of cells;
         output;
      end;
   end;
   * Make variables to contain H and V grid sizes;
   call symput('hnobs', compress(put(h    , best12.)));
   call symput('vnobs', compress(put(n / h, best12.)));
   drop n h;
run;
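Following the advice to begin with a small grid, a first trial run could use a much coarser spacing. The sketch below is a variant of the same DATA step under stated assumptions: the spacing of 2, held in a hypothetical macro variable STEP, is purely illustrative, and the step overwrites PLOTDATA and the HNOBS and VNOBS macro variables.

```sas
%let step = 2;   /* hypothetical grid spacing; the full example uses 0.25 */

data plotdata;
   do PetalLength = -2 to 72 by &step;
      h + 1;                             /* number of horizontal cells */
      do PetalWidth = -5 to 32 by &step;
         n + 1;                          /* total number of cells */
         output;
      end;
   end;
   /* store the grid dimensions for the later %PLOTIT calls */
   call symput('hnobs', compress(put(h    , best12.)));
   call symput('vnobs', compress(put(n / h, best12.)));
   drop n h;
run;

/* write the grid dimensions to the log as a quick check */
%put NOTE: grid is &hnobs by &vnobs cells;
```

With a spacing of 2 the grid has 38 x 19 = 722 points, compared with 297 x 149 = 44,253 points at the spacing of 0.25 used in this example, so the analyses run far faster at the cost of much coarser contours.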
A macro CONTOUR is defined to make contour plots
of density estimates and posterior probabilities.
Classification results are also plotted on the same grid.
%macro contour;
   /* Density grid: keep the largest group-specific density at each point */
   data contour(keep=PetalWidth PetalLength symbol density);
      set plotd(in=d) iris;
      if d then density = max(setosa,versicolor,virginica);
   run;
   title3 'Plot of Estimated Densities';
   %plotit(data=contour, plotvars=PetalWidth PetalLength,
           labelvar=_blank_, symvar=symbol, typevar=symbol,
           symlen=4, exttypes=symbol contour, ls=100,
           paint=density white black, rgbtypes=contour,
           hnobs=&hnobs, vnobs=&vnobs, excolors=white,
           rgbround=-16 1 1 1, extend=close, options=noclip,
           types =Setosa Versicolor Virginica '',
           symtype=symbol symbol symbol contour,
           symsize=0.6 0.6 0.6 1,
           symfont=swiss swiss swiss solid)
   /* Posterior grid: keep the largest posterior probability at each point */
   data posterior(keep=PetalWidth PetalLength symbol
                       prob _into_);
      set plotp(in=d) iris;
      if d then prob = max(setosa,versicolor,virginica);
   run;
   title3 'Plot of Posterior Probabilities '
          '(Black to White is Low to High Probability)';
   %plotit(data=posterior, plotvars=PetalWidth PetalLength,
           labelvar=_blank_, symvar=symbol, typevar=symbol,
           symlen=4, exttypes=symbol contour, ls=100,
           paint=prob black white 0.3 0.999, rgbtypes=contour,
           hnobs=&hnobs, vnobs=&vnobs, excolors=white,
           rgbround=-16 1 1 1, extend=close, options=noclip,
           types =Setosa Versicolor Virginica '',
           symtype=symbol symbol symbol contour,
           symsize=0.6 0.6 0.6 1,
           symfont=swiss swiss swiss solid)
   /* Classification regions: paint each grid point by its _INTO_ class */
   title3 'Plot of Classification Results';
   %plotit(data=posterior, plotvars=PetalWidth PetalLength,
           labelvar=_blank_, symvar=symbol, typevar=symbol,
           symlen=4, exttypes=symbol contour, ls=100,
           paint=_into_ CXCCCCCC CXDDDDDD white,
           rgbtypes=contour, hnobs=&hnobs, vnobs=&vnobs,
           excolors=white,
           extend=close, options=noclip,
           types =Setosa Versicolor Virginica '',
           symtype=symbol symbol symbol contour,
           symsize=0.6 0.6 0.6 1,
           symfont=swiss swiss swiss solid)
%mend;
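The two quantities that %CONTOUR paints are related by Bayes' theorem. The following is a sketch in notation that is assumed here, not taken from this example: x is a grid point (PetalWidth, PetalLength), f_t(x) is the estimated density of species t at x (as stored in the TESTOUTD= data set), and q_t is the prior probability of species t (1/3 for each species in these analyses).

```latex
% Assumed notation: x = grid point, f_t = group-specific density estimate,
% q_t = prior probability of species t.
\begin{align*}
p(t \mid x) &= \frac{q_t \, f_t(x)}{\sum_{u} q_u \, f_u(x)} \\
\text{DENSITY at } x &= \max_t f_t(x), \qquad
\text{PROB at } x = \max_t p(t \mid x)
\end{align*}
```

Because the priors are equal here, the species with the largest density at a grid point is also the species with the largest posterior probability, which is the class recorded in _INTO_.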
A normal-theory analysis (METHOD=NORMAL) assuming
equal covariance matrices (POOL=YES) illustrates
the linearity of the classification boundaries.
These statements produce Output 25.2.2:
proc discrim data=iris method=normal pool=yes
             testdata=plotdata testout=plotp testoutd=plotd
             short noclassify crosslisterr;
   class Species;
   var Petal:;
   title2 'Using Normal Density Estimates with Equal Variance';
run;
%contour
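The linearity of the boundaries under POOL=YES can be seen from the log ratio of two posterior probabilities. This is a sketch in assumed notation (m_t for the mean vector of species t, S_p for the pooled covariance matrix, q_t for the prior probability); the quadratic term in x is the same for every species and cancels.

```latex
% Assumed notation: m_t = class mean vector, S_p = pooled covariance matrix,
% q_t = prior probability of class t.
\[
\ln\frac{p(j \mid x)}{p(k \mid x)}
  = (m_j - m_k)' S_p^{-1} x
  - \tfrac{1}{2}\left( m_j' S_p^{-1} m_j - m_k' S_p^{-1} m_k \right)
  + \ln\frac{q_j}{q_k}
\]
```

The boundary between species j and k is the set of points where this expression is zero. Because the expression is linear in x, that set is a straight line in the (PetalWidth, PetalLength) plane; with equal priors the last term vanishes.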
Output 25.2.2: Normal Density Estimates with Equal Variance
Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Equal Variance

Observations   150   DF Total             149
Variables        2   DF Within Classes    147
Classes          3   DF Between Classes     2

Class Level Information

Species      Variable Name   Frequency    Weight   Proportion   Prior Probability
Setosa       Setosa                 50   50.0000     0.333333            0.333333
Versicolor   Versicolor             50   50.0000     0.333333            0.333333
Virginica    Virginica              50   50.0000     0.333333            0.333333

Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Equal Variance
The DISCRIM Procedure
Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Linear Discriminant Function

Posterior Probability of Membership in Species

Obs   From Species   Classified into Species   Setosa   Versicolor   Virginica
  5   Virginica      Versicolor *              0.0000       0.8453      0.1547
  9   Versicolor     Virginica  *              0.0000       0.2130      0.7870
 25   Virginica      Versicolor *              0.0000       0.8322      0.1678
 57   Virginica      Versicolor *              0.0000       0.8057      0.1943
 91   Virginica      Versicolor *              0.0000       0.8903      0.1097
148   Versicolor     Virginica  *              0.0000       0.3118      0.6882

* Misclassified observation

Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Equal Variance
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Linear Discriminant Function

Number of Observations and Percent Classified into Species

From Species       Setosa   Versicolor    Virginica        Total
Setosa                 50            0            0           50
                   100.00         0.00         0.00       100.00
Versicolor              0           48            2           50
                     0.00        96.00         4.00       100.00
Virginica               0            4           46           50
                     0.00         8.00        92.00       100.00
Total                  50           52           48          150
                    33.33        34.67        32.00       100.00
Priors            0.33333      0.33333      0.33333

Error Count Estimates for Species

                   Setosa   Versicolor    Virginica        Total
Rate               0.0000       0.0400       0.0800       0.0400
Priors             0.3333       0.3333       0.3333

Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Equal Variance
The DISCRIM Procedure
Classification Summary for Test Data: WORK.PLOTDATA
Classification Summary using Linear Discriminant Function

Number of Observations and Percent Classified into Species

                   Setosa   Versicolor    Virginica        Total
Total               14507        16888        12858        44253
                    32.78        38.16        29.06       100.00
Priors            0.33333      0.33333      0.33333

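As a check on the Error Count Estimates table in Output 25.2.2, the total rate can be reproduced as the prior-weighted average of the per-species cross-validation error rates. The symbols q_t (prior) and e_t (error rate for species t) are assumed notation.

```latex
% Assumed notation: q_t = prior probability, e_t = cross-validation error rate.
\begin{align*}
e_{\text{Setosa}} &= \tfrac{0}{50} = 0.0000, \quad
e_{\text{Versicolor}} = \tfrac{2}{50} = 0.0400, \quad
e_{\text{Virginica}} = \tfrac{4}{50} = 0.0800 \\
\text{Total} &= \sum_t q_t \, e_t
  = \tfrac{1}{3}\,(0.0000 + 0.0400 + 0.0800) = 0.0400
\end{align*}
```

The same arithmetic applies to the later outputs, for example (0.0000 + 0.0400 + 0.0600) / 3 = 0.0333.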
A normal-theory analysis assuming unequal covariance matrices
(POOL=NO) illustrates quadratic classification boundaries.
These statements produce Output 25.2.3:
proc discrim data=iris method=normal pool=no
             testdata=plotdata testout=plotp testoutd=plotd
             short noclassify crosslisterr;
   class Species;
   var Petal:;
   title2 'Using Normal Density Estimates with Unequal Variance';
run;
%contour
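With POOL=NO each species keeps its own covariance matrix, so the quadratic term in x no longer cancels from the log posterior ratio. The following sketch uses assumed notation (m_t and S_t for the mean vector and covariance matrix of species t, q_t for the prior probability).

```latex
% Assumed notation: m_t = class mean vector, S_t = within-class covariance
% matrix, q_t = prior probability of class t.
\[
\ln\frac{p(j \mid x)}{p(k \mid x)}
  = -\tfrac{1}{2}(x - m_j)' S_j^{-1} (x - m_j)
  + \tfrac{1}{2}(x - m_k)' S_k^{-1} (x - m_k)
  - \tfrac{1}{2}\ln\frac{|S_j|}{|S_k|}
  + \ln\frac{q_j}{q_k}
\]
```

Setting this expression to zero gives a second-degree equation in x, so each boundary is a conic section rather than a straight line.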
Output 25.2.3: Normal Density Estimates with Unequal Variance
Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Unequal Variance

Observations   150   DF Total             149
Variables        2   DF Within Classes    147
Classes          3   DF Between Classes     2

Class Level Information

Species      Variable Name   Frequency    Weight   Proportion   Prior Probability
Setosa       Setosa                 50   50.0000     0.333333            0.333333
Versicolor   Versicolor             50   50.0000     0.333333            0.333333
Virginica    Virginica              50   50.0000     0.333333            0.333333

Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Unequal Variance
The DISCRIM Procedure
Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Quadratic Discriminant Function

Posterior Probability of Membership in Species

Obs   From Species   Classified into Species   Setosa   Versicolor   Virginica
  5   Virginica      Versicolor *              0.0000       0.7288      0.2712
  9   Versicolor     Virginica  *              0.0000       0.0903      0.9097
 25   Virginica      Versicolor *              0.0000       0.5196      0.4804
 91   Virginica      Versicolor *              0.0000       0.8335      0.1665
148   Versicolor     Virginica  *              0.0000       0.4675      0.5325

* Misclassified observation

Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Unequal Variance
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Quadratic Discriminant Function

Number of Observations and Percent Classified into Species

From Species       Setosa   Versicolor    Virginica        Total
Setosa                 50            0            0           50
                   100.00         0.00         0.00       100.00
Versicolor              0           48            2           50
                     0.00        96.00         4.00       100.00
Virginica               0            3           47           50
                     0.00         6.00        94.00       100.00
Total                  50           51           49          150
                    33.33        34.00        32.67       100.00
Priors            0.33333      0.33333      0.33333

Error Count Estimates for Species

                   Setosa   Versicolor    Virginica        Total
Rate               0.0000       0.0400       0.0600       0.0333
Priors             0.3333       0.3333       0.3333

Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Unequal Variance
The DISCRIM Procedure
Classification Summary for Test Data: WORK.PLOTDATA
Classification Summary using Quadratic Discriminant Function

Number of Observations and Percent Classified into Species

                   Setosa   Versicolor    Virginica        Total
Total                5461         5354        33438        44253
                    12.34        12.10        75.56       100.00
Priors            0.33333      0.33333      0.33333

A nonparametric analysis (METHOD=NPAR) follows, using normal kernels
(KERNEL=NORMAL) and equal bandwidths (POOL=YES) in each class. The
value of the radius parameter r that, assuming normality, minimizes
an approximate mean integrated square error is 0.50 (see
the "Nonparametric Methods" section).
These statements produce Output 25.2.4:
proc discrim data=iris method=npar kernel=normal
             r=.5 pool=yes
             testdata=plotdata testout=plotp
             testoutd=plotd
             short noclassify crosslisterr;
   class Species;
   var Petal:;
   title2 'Using Kernel Density Estimates with Equal Bandwidth';
run;
%contour
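The value r=.5 can be checked by hand. The sketch below assumes that the optimal-radius formula in the "Nonparametric Methods" section has the form shown, with the constant A(K_t) = 4/(2p+1) for a normal kernel; consult that section for the authoritative statement. Here p = 2 variables and n_t = 50 observations per species.

```latex
% Assumed form of the optimal-radius formula; p = number of variables,
% n_t = number of training observations in class t.
\begin{align*}
r &= \left( \frac{A(K_t)}{n_t} \right)^{1/(p+4)},
  \qquad A(K_t) = \frac{4}{2p+1} \quad \text{(normal kernel)} \\
r &= \left( \frac{4/5}{50} \right)^{1/6} = (0.016)^{1/6} \approx 0.50
\end{align*}
```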
Output 25.2.4: Kernel Density Estimates with Equal Bandwidth
Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Equal Bandwidth

Observations   150   DF Total             149
Variables        2   DF Within Classes    147
Classes          3   DF Between Classes     2

Class Level Information

Species      Variable Name   Frequency    Weight   Proportion   Prior Probability
Setosa       Setosa                 50   50.0000     0.333333            0.333333
Versicolor   Versicolor             50   50.0000     0.333333            0.333333
Virginica    Virginica              50   50.0000     0.333333            0.333333

Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Equal Bandwidth
The DISCRIM Procedure
Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Normal Kernel Density

Posterior Probability of Membership in Species

Obs   From Species   Classified into Species   Setosa   Versicolor   Virginica
  5   Virginica      Versicolor *              0.0000       0.7474      0.2526
  9   Versicolor     Virginica  *              0.0000       0.0800      0.9200
 25   Virginica      Versicolor *              0.0000       0.5863      0.4137
 91   Virginica      Versicolor *              0.0000       0.8358      0.1642
148   Versicolor     Virginica  *              0.0000       0.4123      0.5877

* Misclassified observation

Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Equal Bandwidth
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Normal Kernel Density

Number of Observations and Percent Classified into Species

From Species       Setosa   Versicolor    Virginica        Total
Setosa                 50            0            0           50
                   100.00         0.00         0.00       100.00
Versicolor              0           48            2           50
                     0.00        96.00         4.00       100.00
Virginica               0            3           47           50
                     0.00         6.00        94.00       100.00
Total                  50           51           49          150
                    33.33        34.00        32.67       100.00
Priors            0.33333      0.33333      0.33333

Error Count Estimates for Species

                   Setosa   Versicolor    Virginica        Total
Rate               0.0000       0.0400       0.0600       0.0333
Priors             0.3333       0.3333       0.3333

Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Equal Bandwidth
The DISCRIM Procedure
Classification Summary for Test Data: WORK.PLOTDATA
Classification Summary using Normal Kernel Density

Number of Observations and Percent Classified into Species

                   Setosa   Versicolor    Virginica        Total
Total               12631         9941        21681        44253
                    28.54        22.46        48.99       100.00
Priors            0.33333      0.33333      0.33333

Another nonparametric analysis is run
with unequal bandwidths (POOL=NO).
These statements produce Output 25.2.5:
proc discrim data=iris method=npar kernel=normal
             r=.5 pool=no
             testdata=plotdata testout=plotp
             testoutd=plotd
             short noclassify crosslisterr;
   class Species;
   var Petal:;
   title2 'Using Kernel Density Estimates with Unequal Bandwidth';
run;
%contour
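The difference between the two nonparametric analyses lies in the matrix that shapes the kernel. As a sketch in assumed notation (n_t training observations y in species t, p variables, radius r, and a kernel-shape matrix V_t), a normal-kernel density estimate can be written as follows.

```latex
% Assumed notation: y ranges over the n_t training observations in class t,
% V_t is the matrix that shapes the kernel, r is the radius (R= option).
\begin{align*}
f_t(x) &= \frac{1}{n_t} \sum_{y} K_t(x - y), \qquad
K_t(z) = \frac{1}{(2\pi)^{p/2} \, r^{p} \, |V_t|^{1/2}}
         \exp\!\left( -\frac{z' V_t^{-1} z}{2 r^{2}} \right) \\
V_t &= S_p \ \text{(pooled, POOL=YES)}, \qquad
V_t = S_t \ \text{(within-class, POOL=NO)}
\end{align*}
```

With POOL=NO the kernels for a tightly clustered species such as Setosa are correspondingly narrower than those for the more dispersed species, which is what "unequal bandwidths" means here.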
Output 25.2.5: Kernel Density Estimates with Unequal Bandwidth
Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Unequal Bandwidth

Observations   150   DF Total             149
Variables        2   DF Within Classes    147
Classes          3   DF Between Classes     2

Class Level Information

Species      Variable Name   Frequency    Weight   Proportion   Prior Probability
Setosa       Setosa                 50   50.0000     0.333333            0.333333
Versicolor   Versicolor             50   50.0000     0.333333            0.333333
Virginica    Virginica              50   50.0000     0.333333            0.333333

Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Unequal Bandwidth
The DISCRIM Procedure
Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Normal Kernel Density

Posterior Probability of Membership in Species

Obs   From Species   Classified into Species   Setosa   Versicolor   Virginica
  5   Virginica      Versicolor *              0.0000       0.7826      0.2174
  9   Versicolor     Virginica  *              0.0000       0.0506      0.9494
 91   Virginica      Versicolor *              0.0000       0.8802      0.1198
148   Versicolor     Virginica  *              0.0000       0.3726      0.6274

* Misclassified observation

Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Unequal Bandwidth
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Normal Kernel Density

Number of Observations and Percent Classified into Species

From Species       Setosa   Versicolor    Virginica        Total
Setosa                 50            0            0           50
                   100.00         0.00         0.00       100.00
Versicolor              0           48            2           50
                     0.00        96.00         4.00       100.00
Virginica               0            2           48           50
                     0.00         4.00        96.00       100.00
Total                  50           50           50          150
                    33.33        33.33        33.33       100.00
Priors            0.33333      0.33333      0.33333

Error Count Estimates for Species

                   Setosa   Versicolor    Virginica        Total
Rate               0.0000       0.0400       0.0400       0.0267
Priors             0.3333       0.3333       0.3333

Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Unequal Bandwidth
The DISCRIM Procedure
Classification Summary for Test Data: WORK.PLOTDATA
Classification Summary using Normal Kernel Density

Number of Observations and Percent Classified into Species

                   Setosa   Versicolor    Virginica        Total
Total                5447         5984        32822        44253
                    12.31        13.52        74.17       100.00
Priors            0.33333      0.33333      0.33333

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.