Example 51.2: Examining Outliers

The PLS Procedure

This example is a continuation of Example 51.1.

A PLS model effectively models both the predictors and the responses. To check for outliers, you should therefore look at the Euclidean distance from each point to the PLS model in both the standardized predictors and the standardized responses. No point should be dramatically farther from the model than the rest. If a group of points lies farther from the model than the rest, the points might have something in common, in which case they should be analyzed separately. The following statements compute and plot these distances to the reduced model, which drops the variables L1, L2, P2, P4, S5, L5, and P5:

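   /* Fit the reduced two-factor PLS model and save, for each observation, */
   /* the sums of squared residuals for the standardized predictors        */
   /* (STDXSSE) and the standardized responses (STDYSSE).                  */
   proc pls data=ptrain nfac=2 noprint;
      model log_RAI = S1    P1
                      S2
                      S3 L3 P3
                      S4 L4   ;
      output out=stdres stdxsse=stdxsse
                        stdysse=stdysse;
   run;

   /* Convert the sums of squares to Euclidean distances. */
   data stdres; set stdres;
      xdist = sqrt(stdxsse);
      ydist = sqrt(stdysse);
   run;

   /* Needle plots of each distance against n, which is assumed to be */
   /* the observation number carried over from the ptrain data set.   */
   symbol1 i=needles v=dot c=blue;
   proc gplot data=stdres;
      plot xdist*n=1 / cframe=ligr;
   run;
   proc gplot data=stdres;
      plot ydist*n=1 / cframe=ligr;
   run;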
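
In formula terms, the two distances computed by the preceding statements can be sketched as follows. The notation is introduced here for illustration and is not part of the original example: e^X_{ij} is assumed to denote the residual of standardized predictor j for observation i after two factors are extracted, e^Y_{ik} the corresponding residual of standardized response k, p the number of predictors in the model, and q the number of responses.

   \[
   \mathtt{xdist}_i = \sqrt{\mathtt{stdxsse}_i}
                    = \sqrt{\sum_{j=1}^{p} \left( e^{X}_{ij} \right)^{2}},
   \qquad
   \mathtt{ydist}_i = \sqrt{\mathtt{stdysse}_i}
                    = \sqrt{\sum_{k=1}^{q} \left( e^{Y}_{ik} \right)^{2}}
   \]

For the reduced model, p = 8 (S1, P1, S2, S3, L3, P3, S4, and L4) and q = 1 (log_RAI), so under these assumptions ydist for an observation is simply the absolute value of its standardized-response residual.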

The plots are shown in Output 51.2.1 and Output 51.2.2.

Output 51.2.1: Distances from the X-variables to the Model (Training Set)

Output 51.2.2: Distances from the Y-variables to the Model (Training Set)

There appear to be no profound outliers in either the predictor space or the response space.
