The DISCRIM Procedure

Example 25.2: Bivariate Density Estimates and Posterior Probabilities

In this example, four more discriminant analyses of iris data are run with two quantitative variables: petal width and petal length. The example produces Output 25.2.1 through Output 25.2.5. A scatter plot shows the joint sample distribution. See Appendix B, "Using the %PLOTIT Macro," for more information on the %PLOTIT macro.

   %plotit(data=iris, plotvars=PetalWidth PetalLength,
           labelvar=_blank_, symvar=symbol, typevar=symbol,
           symsize=0.35, symlen=4, exttypes=symbol, ls=100);

Output 25.2.1: Joint Sample Distribution of Petal Width and Petal Length in Three Species

Next, a data set containing a grid of points suitable for contour plots is created for plotting. The grid contains 44,253 points, which makes the following analyses very time-consuming. If you attempt to duplicate these examples, begin with a much coarser grid that has far fewer points.

   data plotdata;
      do PetalLength=-2 to 72 by 0.25;
         h + 1;    * Number of horizontal cells;
         do PetalWidth=-5 to 32 by 0.25;
            n + 1; * Total number of cells;
            output;
         end;
      end;
      * Make variables to contain H and V grid sizes;
      call symput('hnobs', compress(put(h    , best12.)));
      call symput('vnobs', compress(put(n / h, best12.))); 
      drop n h;
   run;
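With the 0.25 step used above, PetalLength takes (72 - (-2))/0.25 + 1 = 297 values and PetalWidth takes (32 - (-5))/0.25 + 1 = 149 values, so PLOTDATA holds 297 x 149 = 44,253 grid points. This is the test-data total that appears in each classification summary below. A simple way to start small is to make the step size a macro variable. The following sketch assumes a hypothetical macro variable STEP that is not part of the original example; a coarser step gives fewer points and blockier contours, and HNOBS and VNOBS are recomputed automatically.

```sas
/* Sketch: the PLOTDATA grid with an adjustable step size.          */
/* STEP is a hypothetical macro variable; this example uses 0.25.   */
%let step = 1;

data plotdata;
   do PetalLength=-2 to 72 by &step;
      h + 1;    /* number of horizontal cells */
      do PetalWidth=-5 to 32 by &step;
         n + 1; /* total number of cells */
         output;
      end;
   end;
   /* Grid sizes passed to the HNOBS= and VNOBS= arguments of the plotting macro */
   call symput('hnobs', compress(put(h    , best12.)));
   call symput('vnobs', compress(put(n / h, best12.)));
   drop n h;
run;

/* With STEP=1 the log shows hnobs=75 vnobs=38 (2,850 grid points). */
%put hnobs=&hnobs vnobs=&vnobs;
```

Once the plots look right, set STEP back to 0.25 to reproduce the grid used in this example.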

A macro CONTOUR is defined to make contour plots of the density estimates (read from the TESTOUTD= data set PLOTD) and the posterior probabilities (read from the TESTOUT= data set PLOTP). The classification results are also plotted on the same grid.

   %macro contour;
      data contour(keep=PetalWidth PetalLength symbol density);
         set plotd(in=d) iris;
         if d then density = max(setosa,versicolor,virginica);
      run;
      
      title3 'Plot of Estimated Densities';
      %plotit(data=contour, plotvars=PetalWidth PetalLength,
              labelvar=_blank_, symvar=symbol, typevar=symbol,
              symlen=4, exttypes=symbol contour, ls=100,
              paint=density white black, rgbtypes=contour,
              hnobs=&hnobs, vnobs=&vnobs, excolors=white,
              rgbround=-16 1 1 1,  extend=close, options=noclip,
              types  =Setosa Versicolor Virginica  '',
              symtype=symbol symbol     symbol     contour,
              symsize=0.6    0.6        0.6        1,
              symfont=swiss  swiss      swiss      solid)

      data posterior(keep=PetalWidth PetalLength symbol 
           prob _into_);
         set plotp(in=d) iris;
         if d then prob = max(setosa,versicolor,virginica);
      run;

      title3 'Plot of Posterior Probabilities '
             '(Black to White is Low to High Probability)';
      %plotit(data=posterior, plotvars=PetalWidth PetalLength,
              labelvar=_blank_, symvar=symbol, typevar=symbol,
              symlen=4, exttypes=symbol contour, ls=100,
              paint=prob black white 0.3 0.999, rgbtypes=contour,
              hnobs=&hnobs, vnobs=&vnobs,  excolors=white,
              rgbround=-16 1 1 1, extend=close, options=noclip,
              types  =Setosa Versicolor Virginica  '',
              symtype=symbol symbol     symbol     contour,
              symsize=0.6    0.6        0.6        1,
              symfont=swiss  swiss      swiss      solid)

      title3 'Plot of Classification Results';
      %plotit(data=posterior, plotvars=PetalWidth PetalLength,
              labelvar=_blank_, symvar=symbol, typevar=symbol,
              symlen=4, exttypes=symbol contour, ls=100,
              paint=_into_ CXCCCCCC CXDDDDDD white,
              rgbtypes=contour, hnobs=&hnobs, vnobs=&vnobs,
              excolors=white,
              extend=close, options=noclip,
              types  =Setosa Versicolor Virginica  '',
              symtype=symbol symbol     symbol     contour,
              symsize=0.6    0.6        0.6        1,
              symfont=swiss  swiss      swiss      solid)

   %mend;

A normal-theory analysis (METHOD=NORMAL) assuming equal covariance matrices (POOL=YES) illustrates the linearity of the classification boundaries. These statements produce Output 25.2.2:
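Why the boundaries are linear can be seen from the generalized squared distance. The following is a sketch in standard normal-theory notation, which we assume matches the definitions used by PROC DISCRIM: x-bar_t is the mean of class t, S_p is the pooled within-class covariance matrix, and q_t is the prior probability of class t.

```latex
\begin{aligned}
D_t^2(\mathbf{x}) &= (\mathbf{x}-\bar{\mathbf{x}}_t)'\,\mathbf{S}_p^{-1}\,(\mathbf{x}-\bar{\mathbf{x}}_t) - 2\ln q_t \\
                  &= \mathbf{x}'\mathbf{S}_p^{-1}\mathbf{x}
                     - 2\,\bar{\mathbf{x}}_t'\mathbf{S}_p^{-1}\mathbf{x}
                     + \bar{\mathbf{x}}_t'\mathbf{S}_p^{-1}\bar{\mathbf{x}}_t - 2\ln q_t \\
p(t \mid \mathbf{x}) &= \frac{\exp\{-0.5\,D_t^2(\mathbf{x})\}}{\sum_u \exp\{-0.5\,D_u^2(\mathbf{x})\}}
\end{aligned}
```

The quadratic term x' S_p^{-1} x is the same for every class, so it cancels when two classes are compared. The boundary D_s^2(x) = D_t^2(x) between classes s and t is therefore linear in x.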

   proc discrim data=iris method=normal pool=yes 
                testdata=plotdata testout=plotp testoutd=plotd 
                short noclassify crosslisterr;
      class Species;
      var Petal:;
      title2 'Using Normal Density Estimates with Equal Variance';
   run;
   %contour
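After PROC DISCRIM has written the TESTOUTD= data set PLOTD and the TESTOUT= data set PLOTP, you can inspect them directly. This sketch assumes, as the CONTOUR macro does, that both data sets contain one variable per species (density estimates in PLOTD, posterior probabilities in PLOTP) and that PLOTP contains the classification variable _INTO_. Tabulating _INTO_ should reproduce the counts in the Classification Summary for Test Data.

```sas
/* Density estimates for the first few grid points (TESTOUTD= data set) */
proc print data=plotd(obs=5);
   var PetalLength PetalWidth Setosa Versicolor Virginica;
run;

/* Number of grid points assigned to each species (TESTOUT= data set) */
proc freq data=plotp;
   tables _into_;
run;
```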

Output 25.2.2: Normal Density Estimates with Equal Variance

Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Equal Variance

The DISCRIM Procedure

   Observations    150    DF Total            149
   Variables         2    DF Within Classes   147
   Classes           3    DF Between Classes    2

                            Class Level Information

                 Variable                                             Prior
   Species       Name         Frequency     Weight  Proportion  Probability

   Setosa        Setosa              50    50.0000    0.333333     0.333333
   Versicolor    Versicolor          50    50.0000    0.333333     0.333333
   Virginica     Virginica           50    50.0000    0.333333     0.333333


   Classification Results for Calibration Data: WORK.IRIS
   Cross-validation Results using Linear Discriminant Function

                          Posterior Probability of Membership in Species

          From          Classified
   Obs    Species       into Species      Setosa  Versicolor   Virginica

     5    Virginica     Versicolor   *    0.0000      0.8453      0.1547
     9    Versicolor    Virginica    *    0.0000      0.2130      0.7870
    25    Virginica     Versicolor   *    0.0000      0.8322      0.1678
    57    Virginica     Versicolor   *    0.0000      0.8057      0.1943
    91    Virginica     Versicolor   *    0.0000      0.8903      0.1097
   148    Versicolor    Virginica    *    0.0000      0.3118      0.6882

   * Misclassified observation


   Classification Summary for Calibration Data: WORK.IRIS
   Cross-validation Summary using Linear Discriminant Function

   Number of Observations and Percent Classified into Species

   From Species        Setosa  Versicolor   Virginica       Total

   Setosa                  50           0           0          50
                       100.00        0.00        0.00      100.00

   Versicolor               0          48           2          50
                         0.00       96.00        4.00      100.00

   Virginica                0           4          46          50
                         0.00        8.00       92.00      100.00

   Total                   50          52          48         150
                        33.33       34.67       32.00      100.00

   Priors             0.33333     0.33333     0.33333

   Error Count Estimates for Species

                       Setosa  Versicolor   Virginica       Total
   Rate                0.0000      0.0400      0.0800      0.0400
   Priors              0.3333      0.3333      0.3333


   Classification Summary for Test Data: WORK.PLOTDATA
   Classification Summary using Linear Discriminant Function

   Number of Observations and Percent Classified into Species

                       Setosa  Versicolor   Virginica       Total

   Total                14507       16888       12858       44253
                        32.78       38.16       29.06      100.00

   Priors             0.33333     0.33333     0.33333


[Figure: Plot of Estimated Densities]

[Figure: Plot of Posterior Probabilities (Black to White is Low to High Probability)]

[Figure: Plot of Classification Results]

A normal-theory analysis assuming unequal covariance matrices (POOL=NO) illustrates quadratic classification boundaries. These statements produce Output 25.2.3:
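With POOL=NO, each class uses its own covariance matrix S_t. In the same standard normal-theory notation as before (again assumed to match the procedure's definitions), the generalized squared distance gains a log-determinant term:

```latex
D_t^2(\mathbf{x}) = (\mathbf{x}-\bar{\mathbf{x}}_t)'\,\mathbf{S}_t^{-1}\,(\mathbf{x}-\bar{\mathbf{x}}_t) + \ln\lvert\mathbf{S}_t\rvert - 2\ln q_t
```

Because S_t differs from class to class, the term x' S_t^{-1} x no longer cancels when two classes are compared, so the boundary D_s^2(x) = D_t^2(x) is quadratic in x. The posterior probabilities are still obtained from exp{-0.5 D_t^2(x)}, normalized over the classes.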

   proc discrim data=iris method=normal pool=no 
                testdata=plotdata testout=plotp testoutd=plotd
                short noclassify crosslisterr;
      class Species;
      var Petal:;
      title2 'Using Normal Density Estimates with Unequal Variance';
   run;
   %contour

Output 25.2.3: Normal Density Estimates with Unequal Variance

Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Unequal Variance

The DISCRIM Procedure

   Observations    150    DF Total            149
   Variables         2    DF Within Classes   147
   Classes           3    DF Between Classes    2

                            Class Level Information

                 Variable                                             Prior
   Species       Name         Frequency     Weight  Proportion  Probability

   Setosa        Setosa              50    50.0000    0.333333     0.333333
   Versicolor    Versicolor          50    50.0000    0.333333     0.333333
   Virginica     Virginica           50    50.0000    0.333333     0.333333


   Classification Results for Calibration Data: WORK.IRIS
   Cross-validation Results using Quadratic Discriminant Function

                          Posterior Probability of Membership in Species

          From          Classified
   Obs    Species       into Species      Setosa  Versicolor   Virginica

     5    Virginica     Versicolor   *    0.0000      0.7288      0.2712
     9    Versicolor    Virginica    *    0.0000      0.0903      0.9097
    25    Virginica     Versicolor   *    0.0000      0.5196      0.4804
    91    Virginica     Versicolor   *    0.0000      0.8335      0.1665
   148    Versicolor    Virginica    *    0.0000      0.4675      0.5325

   * Misclassified observation


   Classification Summary for Calibration Data: WORK.IRIS
   Cross-validation Summary using Quadratic Discriminant Function

   Number of Observations and Percent Classified into Species

   From Species        Setosa  Versicolor   Virginica       Total

   Setosa                  50           0           0          50
                       100.00        0.00        0.00      100.00

   Versicolor               0          48           2          50
                         0.00       96.00        4.00      100.00

   Virginica                0           3          47          50
                         0.00        6.00       94.00      100.00

   Total                   50          51          49         150
                        33.33       34.00       32.67      100.00

   Priors             0.33333     0.33333     0.33333

   Error Count Estimates for Species

                       Setosa  Versicolor   Virginica       Total
   Rate                0.0000      0.0400      0.0600      0.0333
   Priors              0.3333      0.3333      0.3333


   Classification Summary for Test Data: WORK.PLOTDATA
   Classification Summary using Quadratic Discriminant Function

   Number of Observations and Percent Classified into Species

                       Setosa  Versicolor   Virginica       Total

   Total                 5461        5354       33438       44253
                        12.34       12.10       75.56      100.00

   Priors             0.33333     0.33333     0.33333


[Figure: Plot of Estimated Densities]

[Figure: Plot of Posterior Probabilities (Black to White is Low to High Probability)]

[Figure: Plot of Classification Results]
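Rather than choosing between POOL=YES and POOL=NO by inspection, you can let PROC DISCRIM test the homogeneity of the within-class covariance matrices with POOL=TEST. As we understand the option, the within-class matrices (quadratic function) are used when the test is significant at the SLPOOL= level and the pooled matrix (linear function) is used otherwise. This is a sketch that assumes the IRIS data set created earlier in this chapter; the level 0.05 is an illustrative choice, not a recommendation.

```sas
/* Sketch: test homogeneity of the within-class covariance matrices, */
/* then classify with the linear or quadratic function accordingly.  */
proc discrim data=iris method=normal pool=test slpool=0.05
             short noclassify crossvalidate;
   class Species;
   var Petal:;
   title2 'Testing Homogeneity of Within-Class Covariance Matrices';
run;
```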

A nonparametric analysis (METHOD=NPAR) follows, using normal kernels (KERNEL=NORMAL) and equal bandwidths (POOL=YES) in each class. The value of the radius parameter r that, assuming normality, minimizes an approximate mean integrated square error is 0.50 (see the "Nonparametric Methods" section). These statements produce Output 25.2.4:
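The value 0.50 can be reproduced with a little arithmetic. The following DATA step is a sketch that assumes the approximately optimal radius for a normal kernel has the form r = (A/n)**(1/(p+4)) with A = 4/(2p+1), where p is the number of variables and n is the number of observations in a class; consult the "Nonparametric Methods" section for the exact expression. With p = 2 petal measurements and n = 50 observations per species, r = (0.8/50)**(1/6) = 0.016**(1/6), or about 0.502. The macro variable ROPT is introduced here only for illustration.

```sas
data _null_;
   p = 2;                 /* number of quantitative variables (Petal:)         */
   n = 50;                /* observations per species                          */
   a = 4 / (2*p + 1);     /* assumed kernel constant for KERNEL=NORMAL         */
   r = (a / n) ** (1 / (p + 4));
   call symput('ropt', put(r, 6.4));   /* hypothetical macro variable ROPT    */
   put 'Approximate MISE-optimal radius: ' r= 6.4;
run;
```

You could then specify r=&ropt instead of r=.5 in the PROC DISCRIM statements.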

   proc discrim data=iris method=npar kernel=normal
                r=.5 pool=yes
                testdata=plotdata testout=plotp testoutd=plotd
                short noclassify crosslisterr;
      class Species;
      var Petal:;
      title2 'Using Kernel Density Estimates with Equal Bandwidth';
   run;
   %contour

Output 25.2.4: Kernel Density Estimates with Equal Bandwidth

Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Equal Bandwidth

The DISCRIM Procedure

   Observations    150    DF Total            149
   Variables         2    DF Within Classes   147
   Classes           3    DF Between Classes    2

                            Class Level Information

                 Variable                                             Prior
   Species       Name         Frequency     Weight  Proportion  Probability

   Setosa        Setosa              50    50.0000    0.333333     0.333333
   Versicolor    Versicolor          50    50.0000    0.333333     0.333333
   Virginica     Virginica           50    50.0000    0.333333     0.333333


   Classification Results for Calibration Data: WORK.IRIS
   Cross-validation Results using Normal Kernel Density

                          Posterior Probability of Membership in Species

          From          Classified
   Obs    Species       into Species      Setosa  Versicolor   Virginica

     5    Virginica     Versicolor   *    0.0000      0.7474      0.2526
     9    Versicolor    Virginica    *    0.0000      0.0800      0.9200
    25    Virginica     Versicolor   *    0.0000      0.5863      0.4137
    91    Virginica     Versicolor   *    0.0000      0.8358      0.1642
   148    Versicolor    Virginica    *    0.0000      0.4123      0.5877

   * Misclassified observation


   Classification Summary for Calibration Data: WORK.IRIS
   Cross-validation Summary using Normal Kernel Density

   Number of Observations and Percent Classified into Species

   From Species        Setosa  Versicolor   Virginica       Total

   Setosa                  50           0           0          50
                       100.00        0.00        0.00      100.00

   Versicolor               0          48           2          50
                         0.00       96.00        4.00      100.00

   Virginica                0           3          47          50
                         0.00        6.00       94.00      100.00

   Total                   50          51          49         150
                        33.33       34.00       32.67      100.00

   Priors             0.33333     0.33333     0.33333

   Error Count Estimates for Species

                       Setosa  Versicolor   Virginica       Total
   Rate                0.0000      0.0400      0.0600      0.0333
   Priors              0.3333      0.3333      0.3333


   Classification Summary for Test Data: WORK.PLOTDATA
   Classification Summary using Normal Kernel Density

   Number of Observations and Percent Classified into Species

                       Setosa  Versicolor   Virginica       Total

   Total                12631        9941       21681       44253
                        28.54       22.46       48.99      100.00

   Priors             0.33333     0.33333     0.33333


[Figure: Plot of Estimated Densities]

[Figure: Plot of Posterior Probabilities (Black to White is Low to High Probability)]

[Figure: Plot of Classification Results]

Another nonparametric analysis is run with unequal bandwidths (POOL=NO). These statements produce Output 25.2.5:
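Because r=0.50 is optimal only under the normality assumption, it can be worth checking how sensitive the cross-validation error rates are to the radius. The following macro is a hypothetical helper (TRYRADIUS is not part of SAS) that reruns the unequal-bandwidth analysis for a list of radii. It omits the test grid so that each run is fast, and it assumes the IRIS data set created earlier in this chapter. The radii 0.3, 0.5, and 0.7 are arbitrary illustrative values.

```sas
/* Hypothetical helper: rerun the POOL=NO kernel analysis for several */
/* radii so that the cross-validation error count estimates can be    */
/* compared.                                                          */
%macro tryradius(rlist);
   %local i r;
   %let i = 1;
   /* Blank is the only delimiter, so values such as 0.5 stay intact. */
   %let r = %scan(&rlist, &i, %str( ));
   %do %while(%length(&r) > 0);
      proc discrim data=iris method=npar kernel=normal r=&r pool=no
                   short noclassify crossvalidate;
         class Species;
         var Petal:;
         title2 "Kernel Density Estimates with Unequal Bandwidth, r=&r";
      run;
      %let i = %eval(&i + 1);
      %let r = %scan(&rlist, &i, %str( ));
   %end;
%mend tryradius;

%tryradius(0.3 0.5 0.7)
```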

   proc discrim data=iris method=npar kernel=normal
                r=.5 pool=no
                testdata=plotdata testout=plotp testoutd=plotd
                short noclassify crosslisterr;
      class Species;
      var Petal:;
      title2 'Using Kernel Density Estimates with Unequal Bandwidth';
   run;
   %contour

Output 25.2.5: Kernel Density Estimates with Unequal Bandwidth

Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Unequal Bandwidth

The DISCRIM Procedure

   Observations    150    DF Total            149
   Variables         2    DF Within Classes   147
   Classes           3    DF Between Classes    2

                            Class Level Information

                 Variable                                             Prior
   Species       Name         Frequency     Weight  Proportion  Probability

   Setosa        Setosa              50    50.0000    0.333333     0.333333
   Versicolor    Versicolor          50    50.0000    0.333333     0.333333
   Virginica     Virginica           50    50.0000    0.333333     0.333333


   Classification Results for Calibration Data: WORK.IRIS
   Cross-validation Results using Normal Kernel Density

                          Posterior Probability of Membership in Species

          From          Classified
   Obs    Species       into Species      Setosa  Versicolor   Virginica

     5    Virginica     Versicolor   *    0.0000      0.7826      0.2174
     9    Versicolor    Virginica    *    0.0000      0.0506      0.9494
    91    Virginica     Versicolor   *    0.0000      0.8802      0.1198
   148    Versicolor    Virginica    *    0.0000      0.3726      0.6274

   * Misclassified observation


   Classification Summary for Calibration Data: WORK.IRIS
   Cross-validation Summary using Normal Kernel Density

   Number of Observations and Percent Classified into Species

   From Species        Setosa  Versicolor   Virginica       Total

   Setosa                  50           0           0          50
                       100.00        0.00        0.00      100.00

   Versicolor               0          48           2          50
                         0.00       96.00        4.00      100.00

   Virginica                0           2          48          50
                         0.00        4.00       96.00      100.00

   Total                   50          50          50         150
                        33.33       33.33       33.33      100.00

   Priors             0.33333     0.33333     0.33333

   Error Count Estimates for Species

                       Setosa  Versicolor   Virginica       Total
   Rate                0.0000      0.0400      0.0400      0.0267
   Priors              0.3333      0.3333      0.3333


   Classification Summary for Test Data: WORK.PLOTDATA
   Classification Summary using Normal Kernel Density

   Number of Observations and Percent Classified into Species

                       Setosa  Versicolor   Virginica       Total

   Total                 5447        5984       32822       44253
                        12.31       13.52       74.17      100.00

   Priors             0.33333     0.33333     0.33333


[Figure: Plot of Estimated Densities]

[Figure: Plot of Posterior Probabilities (Black to White is Low to High Probability)]

[Figure: Plot of Classification Results]


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.