The PLS Procedure

Example 51.3: Choosing a PLS Model by Test Set Validation

The following example demonstrates issues in spectrometric calibration. The data (Umetrics 1995) consist of spectrographic readings on 33 samples containing known concentrations of two amino acids, tyrosine and tryptophan. The spectra are measured at 30 frequencies across the overall range of frequencies. For example, Figure 51.11 shows the observed spectra for three samples, one with only tryptophan, one with only tyrosine, and one with a mixture of the two, all at a total concentration of 10^-6.

Figure 51.11: Spectra for Three Samples of Tyrosine and Tryptophan

Of the 33 samples, 18 are used as a training set and 15 as a test set. The data originally appear in McAvoy et al. (1989).

These data were created in a lab, with the concentrations fixed in order to provide a wide range of applicability for the model. You want to use a linear function of the logarithms of the spectra to predict the logarithms of tyrosine and tryptophan concentration, as well as the logarithm of the total concentration. Because zeros can occur in both the responses and the predictors, slightly different transformations are used; for the responses, a zero concentration is assigned a log value of -8 rather than log10(0), which is undefined. The following statements create SAS data sets containing the training and test data, named ftrain and ftest, respectively:

   data ftrain;                                                   
      input obsnam $ tot tyr f1-f30 @@;                           
      try = tot - tyr;                                            
      if (tyr) then tyr_log = log10(tyr); else tyr_log = -8;      
      if (try) then try_log = log10(try); else try_log = -8;      
      tot_log = log10(tot);                                       
      datalines;                                                  
   17mix35 0.00003 0                                              
    -6.215 -5.809 -5.114 -3.963 -2.897 -2.269 -1.675 -1.235       
    -0.900 -0.659 -0.497 -0.395 -0.335 -0.315 -0.333 -0.377       
    -0.453 -0.549 -0.658 -0.797 -0.878 -0.954 -1.060 -1.266       
    -1.520 -1.804 -2.044 -2.269 -2.496 -2.714                     
   19mix35 0.00003 3E-7                                           
    -5.516 -5.294 -4.823 -3.858 -2.827 -2.249 -1.683 -1.218       
    -0.907 -0.658 -0.501 -0.400 -0.345 -0.323 -0.342 -0.387       
    -0.461 -0.554 -0.665 -0.803 -0.887 -0.960 -1.072 -1.272       
    -1.541 -1.814 -2.058 -2.289 -2.496 -2.712                     
   21mix35 0.00003 7.5E-7                                         
    -5.519 -5.294 -4.501 -3.863 -2.827 -2.280 -1.716 -1.262       
    -0.939 -0.694 -0.536 -0.444 -0.384 -0.369 -0.377 -0.421       
    -0.495 -0.596 -0.706 -0.824 -0.917 -0.988 -1.103 -1.294       
    -1.565 -1.841 -2.084 -2.320 -2.521 -2.729                     
   23mix35 0.00003 1.5E-6                                         
    -5.294 -4.705 -4.262 -3.605 -2.726 -2.239 -1.681 -1.250       
    -0.925 -0.697 -0.534 -0.437 -0.381 -0.359 -0.369 -0.426       
    -0.499 -0.591 -0.701 -0.843 -0.925 -0.989 -1.109 -1.310       
    -1.579 -1.852 -2.090 -2.316 -2.521 -2.743                     
   25mix35 0.00003 3E-6                                           
    -4.600 -4.069 -3.764 -3.262 -2.598 -2.191 -1.680 -1.273       
    -0.958 -0.729 -0.573 -0.470 -0.422 -0.407 -0.422 -0.468       
    -0.538 -0.639 -0.753 -0.887 -0.968 -1.037 -1.147 -1.357       
    -1.619 -1.886 -2.141 -2.359 -2.585 -2.792                     
   27mix35 0.00003 7.5E-6                                         
    -3.812 -3.376 -3.026 -2.726 -2.249 -1.919 -1.541 -1.198       
    -0.951 -0.764 -0.639 -0.570 -0.528 -0.525 -0.550 -0.606       
    -0.689 -0.781 -0.909 -1.031 -1.126 -1.191 -1.303 -1.503       
    -1.784 -2.058 -2.297 -2.507 -2.727 -2.970                     
   29mix35 0.00003 0.000015                                       
    -3.053 -2.641 -2.382 -2.194 -1.977 -1.913 -1.728 -1.516       
    -1.317 -1.158 -1.029 -0.963 -0.919 -0.915 -0.933 -0.981       
    -1.055 -1.157 -1.271 -1.409 -1.505 -1.546 -1.675 -1.880       
    -2.140 -2.415 -2.655 -2.879 -3.075 -3.319                     
   28mix35 0.00003 0.0000225                                      
    -2.626 -2.248 -2.004 -1.839 -1.742 -1.791 -1.786 -1.772       
    -1.728 -1.666 -1.619 -1.591 -1.575 -1.580 -1.619 -1.671       
    -1.754 -1.857 -1.982 -2.114 -2.210 -2.258 -2.379 -2.570       
    -2.858 -3.117 -3.347 -3.568 -3.764 -4.012                     
   26mix35 0.00003 0.000027                                       
    -2.370 -1.990 -1.754 -1.624 -1.560 -1.655 -1.772 -1.899       
    -1.982 -2.074 -2.157 -2.211 -2.267 -2.317 -2.369 -2.460       
    -2.545 -2.668 -2.807 -2.951 -3.030 -3.075 -3.214 -3.376       
    -3.685 -3.907 -4.129 -4.335 -4.501 -4.599                     
   24mix35 0.00003 0.0000285                                      
    -2.326 -1.952 -1.702 -1.583 -1.507 -1.629 -1.771 -1.945       
    -2.115 -2.297 -2.448 -2.585 -2.696 -2.808 -2.913 -3.030       
    -3.163 -3.265 -3.376 -3.534 -3.642 -3.721 -3.858 -4.012       
    -4.262 -4.501 -4.704 -4.822 -4.956 -5.292                     
   22mix35 0.00003 0.00002925                                     
    -2.277 -1.912 -1.677 -1.556 -1.487 -1.630 -1.791 -1.969       
    -2.203 -2.437 -2.655 -2.844 -3.032 -3.214 -3.378 -3.503       
    -3.646 -3.812 -3.958 -4.129 -4.193 -4.262 -4.415 -4.501       
    -4.823 -5.111 -5.113 -5.294 -5.290 -5.294                     
   20mix35 0.00003 0.0000297                                      
    -2.266 -1.912 -1.688 -1.546 -1.500 -1.640 -1.801 -2.011       
    -2.277 -2.545 -2.823 -3.094 -3.376 -3.572 -3.812 -4.012       
    -4.262 -4.415 -4.501 -4.705 -4.823 -4.823 -4.956 -5.111       
    -5.111 -5.516 -5.524 -5.806 -5.806 -5.806                     
   18mix35 0.00003 0.00003                                        
    -2.258 -1.900 -1.666 -1.524 -1.479 -1.621 -1.803 -2.043       
    -2.308 -2.626 -2.895 -3.214 -3.568 -3.907 -4.193 -4.423       
    -4.825 -5.111 -5.111 -5.516 -5.516 -5.516 -5.516 -5.806       
    -5.806 -5.806 -5.806 -5.806 -6.210 -6.215                     
   trp2    0.0001 0                                               
    -5.922 -5.435 -4.366 -3.149 -2.124 -1.392 -0.780 -0.336       
    -0.002  0.233  0.391  0.490  0.540  0.563  0.541  0.488       
     0.414  0.313  0.203  0.063 -0.028 -0.097 -0.215 -0.411       
    -0.678 -0.953 -1.208 -1.418 -1.651 -1.855                     
   mix5    0.0001 0.00001                                         
    -3.932 -3.411 -2.964 -2.462 -1.836 -1.308 -0.796 -0.390       
    -0.076  0.147  0.294  0.394  0.446  0.460  0.443  0.389       
     0.314  0.220  0.099 -0.033 -0.128 -0.197 -0.308 -0.506       
    -0.785 -1.050 -1.313 -1.529 -1.745 -1.970                     
   mix4    0.0001 0.000025                                        
    -2.996 -2.479 -2.099 -1.803 -1.459 -1.126 -0.761 -0.424       
    -0.144  0.060  0.195  0.288  0.337  0.354  0.330  0.274       
     0.206  0.105 -0.009 -0.148 -0.242 -0.306 -0.424 -0.626       
    -0.892 -1.172 -1.425 -1.633 -1.877 -2.071                     
   mix3    0.0001 0.00005                                         
    -2.128 -1.661 -1.344 -1.160 -0.996 -0.877 -0.696 -0.495       
    -0.313 -0.165 -0.042  0.032  0.069  0.079  0.050 -0.006       
    -0.082 -0.179 -0.295 -0.436 -0.523 -0.584 -0.706 -0.898       
    -1.178 -1.446 -1.696 -1.922 -2.128 -2.350                     
   mix6    0.0001 0.00009                                         
    -1.140 -0.757 -0.497 -0.362 -0.329 -0.412 -0.513 -0.647       
    -0.772 -0.877 -0.958 -1.040 -1.104 -1.162 -1.233 -1.317       
    -1.425 -1.543 -1.661 -1.804 -1.877 -1.959 -2.034 -2.249       
    -2.502 -2.732 -2.964 -3.142 -3.313 -3.576                     
   ;                                                              
                                                                  
   data ftest;                                                    
      input obsnam $ tot tyr f1-f30 @@;                           
      try = tot - tyr;                                            
      if (tyr) then tyr_log = log10(tyr); else tyr_log = -8;      
      if (try) then try_log = log10(try); else try_log = -8;      
      tot_log = log10(tot);                                       
      datalines;                                                  
   43trp6  1E-6 0                                                 
    -5.915 -5.918 -6.908 -5.428 -4.117 -5.103 -4.660 -4.351       
    -4.023 -3.849 -3.634 -3.634 -3.572 -3.513 -3.634 -3.572       
    -3.772 -3.772 -3.844 -3.932 -4.017 -4.023 -4.117 -4.227       
    -4.492 -4.660 -4.855 -5.428 -5.103 -5.428                     
   59mix6  1E-6 1E-7                                              
    -5.903 -5.903 -5.903 -5.082 -4.213 -5.083 -4.838 -4.639       
    -4.474 -4.213 -4.001 -4.098 -4.001 -4.001 -3.907 -4.001       
    -4.098 -4.098 -4.206 -4.098 -4.213 -4.213 -4.335 -4.474       
    -4.639 -4.838 -4.837 -5.085 -5.410 -5.410                     
   51mix6  1E-6 2.5E-7                                            
    -5.907 -5.907 -5.415 -4.843 -4.213 -4.843 -4.843 -4.483       
    -4.343 -4.006 -4.006 -3.912 -3.830 -3.830 -3.755 -3.912       
    -4.006 -4.001 -4.213 -4.213 -4.335 -4.483 -4.483 -4.642       
    -4.841 -5.088 -5.088 -5.415 -5.415 -5.415                     
   49mix6  1E-6 5E-7                                              
    -5.419 -5.091 -5.091 -4.648 -4.006 -4.846 -4.648 -4.483       
    -4.343 -4.220 -4.220 -4.220 -4.110 -4.110 -4.110 -4.220       
    -4.220 -4.343 -4.483 -4.483 -4.650 -4.650 -4.846 -4.846       
    -5.093 -5.091 -5.419 -5.417 -5.417 -5.907                     
   53mix6  1E-6 7.5E-7                                            
    -5.083 -4.837 -4.837 -4.474 -3.826 -4.474 -4.639 -4.838       
    -4.837 -4.639 -4.639 -4.641 -4.641 -4.639 -4.639 -4.837       
    -4.838 -4.838 -5.083 -5.082 -5.083 -5.410 -5.410 -5.408       
    -5.408 -5.900 -5.410 -5.903 -5.900 -6.908                     
   57mix6  1E-6 9E-7                                              
    -5.082 -4.836 -4.639 -4.474 -3.826 -4.636 -4.638 -4.638       
    -4.837 -5.082 -5.082 -5.408 -5.082 -5.080 -5.408 -5.408       
    -5.408 -5.408 -5.408 -5.408 -5.408 -5.900 -5.900 -5.900       
    -5.900 -5.900 -5.900 -5.900 -6.908 -6.908                     
   41tyro6 1E-6 1E-6                                              
    -5.104 -4.662 -4.662 -4.358 -3.705 -4.501 -4.662 -4.859       
    -5.104 -5.431 -5.433 -5.918 -5.918 -5.918 -5.431 -5.918       
    -5.918 -5.918 -5.918 -5.918 -5.918 -5.918 -5.918 -6.908       
    -5.918 -5.918 -6.908 -6.908 -5.918 -5.918                     
   28trp5  0.00001 0                                              
    -5.937 -5.937 -5.937 -4.526 -3.544 -3.170 -2.573 -2.115       
    -1.792 -1.564 -1.400 -1.304 -1.244 -1.213 -1.240 -1.292       
    -1.373 -1.453 -1.571 -1.697 -1.801 -1.873 -2.008 -2.198       
    -2.469 -2.706 -2.990 -3.209 -3.384 -3.601                     
   37mix5  0.00001 1E-6                                           
    -5.109 -4.865 -4.501 -4.029 -3.319 -3.070 -2.569 -2.207       
    -1.895 -1.684 -1.516 -1.423 -1.367 -1.348 -1.374 -1.415       
    -1.503 -1.596 -1.718 -1.839 -1.927 -1.997 -2.118 -2.333       
    -2.567 -2.874 -3.106 -3.313 -3.579 -3.781                     
   33mix5  0.00001 2.5E-6                                         
    -4.366 -4.129 -3.781 -3.467 -3.037 -2.939 -2.593 -2.268       
    -1.988 -1.791 -1.649 -1.565 -1.520 -1.509 -1.524 -1.580       
    -1.665 -1.758 -1.882 -2.037 -2.090 -2.162 -2.284 -2.465       
    -2.761 -3.037 -3.270 -3.520 -3.709 -3.937                     
   31mix5  0.00001 5E-6                                           
    -3.790 -3.373 -3.119 -2.915 -2.671 -2.718 -2.555 -2.398       
    -2.229 -2.085 -1.971 -1.902 -1.860 -1.837 -1.881 -1.949       
    -2.009 -2.127 -2.230 -2.381 -2.455 -2.513 -2.624 -2.827       
    -3.117 -3.373 -3.586 -3.785 -4.040 -4.366                     
   35mix5  0.00001 7.5E-6                                         
    -3.321 -2.970 -2.765 -2.594 -2.446 -2.548 -2.616 -2.617       
    -2.572 -2.550 -2.508 -2.487 -2.488 -2.487 -2.529 -2.593       
    -2.688 -2.792 -2.908 -3.037 -3.149 -3.189 -3.273 -3.467       
    -3.781 -4.029 -4.241 -4.501 -4.669 -4.865                     
   39mix5  0.00001 9E-6                                           
    -3.142 -2.812 -2.564 -2.404 -2.281 -2.502 -2.589 -2.706       
    -2.842 -2.964 -3.068 -3.103 -3.182 -3.268 -3.361 -3.411       
    -3.517 -3.576 -3.705 -3.849 -3.932 -3.932 -4.029 -4.234       
    -4.501 -4.664 -4.860 -5.104 -5.431 -5.433                     
   26tyro5 0.00001 0.00001                                        
    -3.037 -2.696 -2.464 -2.321 -2.239 -2.444 -2.602 -2.823       
    -3.144 -3.396 -3.742 -4.063 -4.398 -4.699 -4.893 -5.138       
    -5.140 -5.461 -5.463 -5.945 -5.461 -5.138 -5.140 -5.138       
    -5.138 -5.463 -5.461 -5.461 -5.461 -5.461                     
   tyro2   0.0001 0.0001                                          
    -1.081 -0.710 -0.470 -0.337 -0.327 -0.433 -0.602 -0.841       
    -1.119 -1.423 -1.750 -2.121 -2.449 -2.818 -3.110 -3.467       
    -3.781 -4.029 -4.241 -4.366 -4.501 -4.366 -4.501 -4.501       
    -4.668 -4.668 -4.865 -4.865 -5.109 -5.111                     
   ;
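
As a quick check on the transformations, you might print the derived variables for the first few training observations. The following statements are a minimal sketch and are not part of the original example; they assume only that the ftrain data set has been created as shown above. Any observation with a zero tyrosine or tryptophan concentration should show -8 for the corresponding log variable.

   /* Sketch: inspect the derived response variables */
   proc print data=ftrain(obs=5);
      var obsnam tot tyr try tyr_log try_log tot_log;
   run;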

The following statements fit a PLS model with 10 factors.

   proc pls data=ftrain nfac=10;
      model tot_log tyr_log try_log = f1-f30;
   run;

The table shown in Output 51.3.1 indicates that only three or four factors are required to explain almost all of the variation in both the predictors and the responses.

Output 51.3.1: Amount of Training Set Variation Explained

                     The PLS Procedure

         Percent Variation Accounted for by Partial
                   Least Squares Factors

   Number of
   Extracted      Model Effects       Dependent Variables
     Factors    Current      Total     Current      Total
           1    81.1654    81.1654     48.3385    48.3385
           2    16.8113    97.9768     32.5465    80.8851
           3     1.7639    99.7407     11.4438    92.3289
           4     0.1951    99.9357      3.8363    96.1652
           5     0.0276    99.9634      1.6880    97.8532
           6     0.0132    99.9765      0.7247    98.5779
           7     0.0052    99.9817      0.2926    98.8705
           8     0.0053    99.9870      0.1252    98.9956
           9     0.0049    99.9918      0.1067    99.1023
          10     0.0034    99.9952      0.1684    99.2707


In order to choose the optimal number of PLS factors, you can explore how well models based on the training data with different numbers of factors fit the test data. To do so, use the CV=TESTSET option, with an argument pointing to the test data set ftest, as in the following statements:

   proc pls data=ftrain nfac=10 cv=testset(ftest)
                                cvtest(stat=press seed=12345);
      model tot_log tyr_log try_log = f1-f30;
   run;

The results of the test set validation are shown in Output 51.3.2. They indicate that, although five PLS factors give the minimum predicted residual sum of squares, the residuals for four factors are not significantly different from those for five (Prob > PRESS = 0.38). Thus, the smaller model is preferred.

Output 51.3.2: Test Set Validation for the Number of PLS Factors

                 The PLS Procedure

              Test Set Validation for
              the Number of Extracted
                      Factors

   Number of
   Extracted    Root Mean       Prob >
     Factors        PRESS        PRESS
           0     3.056797       <.0001
           1     2.630561       <.0001
           2      1.00706       0.0070
           3     0.664603       0.0020
           4     0.521578       0.3800
           5     0.500034       1.0000
           6     0.513561       0.5100
           7     0.501431       0.6870
           8     1.055791       0.1530
           9     1.435085       0.1010
          10     1.720389       0.0320

   Minimum root mean PRESS                      0.5000
   Minimizing number of factors                      5
   Smallest number of factors with p > 0.1           4

                     The PLS Procedure

         Percent Variation Accounted for by Partial
                   Least Squares Factors

   Number of
   Extracted      Model Effects       Dependent Variables
     Factors    Current      Total     Current      Total
           1    81.1654    81.1654     48.3385    48.3385
           2    16.8113    97.9768     32.5465    80.8851
           3     1.7639    99.7407     11.4438    92.3289
           4     0.1951    99.9357      3.8363    96.1652
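
The selection rule summarized at the bottom of the validation table can be reproduced by hand. The following statements are an illustrative sketch only: the values are transcribed from Output 51.3.2, the data set name cvres and the variable names nfac, rootpress, and prob are hypothetical, and the entries reported as <.0001 are entered as 0.0001 as a stand-in.

   /* Sketch: values transcribed from Output 51.3.2 */
   data cvres;
      input nfac rootpress prob;
      datalines;
   0  3.056797 0.0001
   1  2.630561 0.0001
   2  1.00706  0.0070
   3  0.664603 0.0020
   4  0.521578 0.3800
   5  0.500034 1.0000
   6  0.513561 0.5100
   7  0.501431 0.6870
   8  1.055791 0.1530
   9  1.435085 0.1010
   10 1.720389 0.0320
   ;

   proc sql;
      /* number of factors that minimizes root mean PRESS */
      title "Minimizing number of factors";
      select nfac, rootpress from cvres
         having rootpress = min(rootpress);
      /* smallest model not significantly worse than the minimum */
      title "Smallest number of factors with p > 0.1";
      select min(nfac) as nfac from cvres
         where prob > 0.1;
   quit;
   title;

With these values, the first query selects five factors and the second selects four, which agrees with the summary lines in Output 51.3.2.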


The factor loadings show how the PLS factors are constructed from the centered and scaled predictors. For spectral calibration, it is useful to plot the loadings against the frequency. In many cases, the physical meanings that can be attached to factor loadings help to validate the scientific interpretation of the PLS model. You can use the following statements to plot the loadings for the four PLS factors against frequency:

   ods listing close;
   ods output XLoadings=xloadings;
   proc pls data=ftrain nfac=4 details method=pls;
      model tot_log tyr_log try_log = f1-f30;
   run;
   ods listing;
   proc transpose data=xloadings(drop=NumberOfFactors)
                  out =xloadings;
   run;

   data xloadings; set xloadings;
      n = _n_;
      rename col1=Factor1 col2=Factor2
             col3=Factor3 col4=Factor4;
   run;
   goptions border;
   axis1 label=("Loading"  ) major=(number=5) minor=none;
   axis2 label=("Frequency")                  minor=none;
   symbol1 v=none i=join c=red    l=1;
   symbol2 v=none i=join c=green  l=1 /*l= 3*/;
   symbol3 v=none i=join c=blue   l=1 /*l=34*/;
   symbol4 v=none i=join c=yellow l=1 /*l=46*/;
   legend1 label=none cborder=black;
   proc gplot data=xloadings;
      plot (Factor1 Factor2 Factor3 Factor4)*n
         / overlay legend=legend1 vaxis=axis1
           haxis=axis2 vref=0 lvref=2 frame cframe=ligr;
   run; quit;

The resulting plot is shown in Output 51.3.3.

Output 51.3.3: Predictor Loadings Across Frequencies

Notice that all four factors treat the frequencies below about 7 or 8 differently from those above. For example, the first factor is very nearly a simple contrast between the averages of the two sets of frequencies, and the second factor appears to be a weighted sum of only the frequencies in the first set.
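
One way to examine this pattern numerically is to average the loadings within each set of frequencies. The following statements are a sketch under stated assumptions: they rely on the xloadings data set built by the preceding statements (with variables n and Factor1-Factor4), the cutoff of 7 is an arbitrary choice within the "about 7 or 8" range, and the names loadgrp and band are hypothetical.

   /* Sketch: split the frequencies at an assumed cutoff of 7 */
   data loadgrp; set xloadings;
      length band $ 4;
      if n <= 7 then band = "low";
      else           band = "high";
   run;

   /* Mean loading of each factor within each set of frequencies */
   proc means data=loadgrp mean maxdec=3;
      class band;
      var Factor1-Factor4;
   run;

If the first factor is close to a contrast between the two sets, its two group means should have opposite signs.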


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.