The PRINQUAL Procedure

Getting Started

In the following example, PROC PRINQUAL uses the MTV (maximum total variance) method. Suppose that the problem is to linearize a curve through three-dimensional space. Let

$\begin{array}{rcl} {\rm X}_1 & = & {\rm X}^3 \\ {\rm X}_2 & = & {\rm X}_1 - {\rm X}^5 \\ {\rm X}_3 & = & {\rm X}_2 - {\rm X}^6 \end{array}$

where X = -1.00, -0.98, -0.96, ... , 1.00.

These three variables define a curve in three-dimensional space. The GPLOT procedure is used to display two-dimensional views of this curve. These data are completely described by three linear components, but they define a single curve, which could be described as a single nonlinear component.
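As a rough check of the claim about linear components, the following sketch runs an ordinary principal component analysis on the untransformed variables. It assumes that the data set X created by the DATA step later in this section already exists. The COVARIANCE option is used here only to mirror the PROC PRINQUAL step in this example. It is an illustration, not part of the original program.

   * Sketch: linear principal components of the untransformed data;
   * Assumes data set X (variables X1-X3) has already been created;
   proc princomp data=X covariance;
      var X1-X3;
   run;

The eigenvalue table that PROC PRINCOMP displays shows how the total variance is divided among the three linear components.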

PROC PRINQUAL is used to attempt to straighten the curve into a one-dimensional line with a continuous transformation of each variable. The N=1 option in the PROC PRINQUAL statement requests one principal component. The TRANSFORM statement requests a cubic spline transformation with nine knots.

Splines are curves that are usually required to be continuous and smooth. They are usually defined as piecewise polynomials of degree n whose function values and first n-1 derivatives agree at the points where the pieces join. The abscissa values of the join points are called knots. The term "spline" is also used for polynomials (splines with no knots) and for piecewise polynomials with more than one discontinuous derivative. Splines with no knots are generally smoother than splines with knots, which are generally smoother than splines with multiple discontinuous derivatives. Splines with few knots are generally smoother than splines with many knots; however, increasing the number of knots usually improves the fit of the spline function to the data, because knots give the curve freedom to bend to follow the data more closely. Refer to Smith (1979) for an excellent introduction to splines. For another example of using splines, see Example 65.1 in Chapter 65, "The TRANSREG Procedure."
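As an illustration of this definition, one standard way to write a spline of degree n with knots k_1 < ... < k_m is the truncated power form shown below. The coefficient symbols b_j and c_i are notation introduced here for the illustration, and this form is not necessarily the basis that PROC PRINQUAL uses internally.

$s(x) = \sum_{j=0}^{n} b_j x^j + \sum_{i=1}^{m} c_i (x - k_i)_+^n, \qquad (u)_+ = \max(u, 0)$

Each term $(x - k_i)_+^n$ has n-1 continuous derivatives at its knot, so the pieces agree in value and in their first n-1 derivatives wherever they join. With no knots (m = 0), the expression reduces to an ordinary polynomial. A cubic spline (n = 3) with nine knots, as requested in this example, has 4 + 9 = 13 coefficients in this form.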

One component accounts for 71 percent of the variance of the untransformed data. After 50 iterations, one component accounts for over 98 percent of the variance of the transformed data (see Figure 53.2). The algorithm did not converge in 50 iterations, so more iterations may be needed for this problem.
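If more iterations are wanted, one possibility is to raise the MAXITER= value, as in the following sketch. The value 200 is an arbitrary assumption chosen for illustration, and there is no guarantee that the algorithm converges within that limit. The step assumes that the data set X has already been created.

   * Sketch: same analysis with a larger iteration limit;
   * MAXITER=200 is an assumed value, not a recommendation;
   proc prinqual data=X n=1 maxiter=200 covariance;
      title 'Iteratively Derive Variable Transformations';
      transform spline(X1-X3 / nknots=9);
   run;

The last rows of the resulting iteration history show whether the criterion change continues to shrink.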

PROC PRINQUAL creates an output data set (which is not displayed) that contains both the original and transformed variables. The original variables have the names X1, X2, and X3. Transformed variables are named TX1, TX2, and TX3. All observations in the output data set have _TYPE_='SCORE', since the CORRELATIONS option is not specified in the PROC PRINQUAL statement. The GPLOT procedure uses this output data set and displays the nonlinear transformations of all three variables and the nearly one-dimensional scatter plot (see Figure 53.3 and Figure 53.4).
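To inspect the output data set directly, it can be given an explicit name with the OUT= option, as in the following sketch. The name Results is hypothetical and does not appear in the original program, which relies on the default output data set name. The step assumes that the data set X has already been created.

   * Sketch: name the output data set so that it can be examined;
   * Results is a hypothetical data set name;
   proc prinqual data=X n=1 maxiter=50 covariance out=Results;
      transform spline(X1-X3 / nknots=9);
   run;

   * List the first few observations of the output data set;
   proc print data=Results(obs=5);
   run;

Because no VAR statement is specified, PROC PRINT lists every variable in the data set, so the original variables and the transformed variables can be compared side by side.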

PROC PRINQUAL tries to project each variable on the first principal component. Notice that the curve in this example is closer to a circle than to a function from some views (see the plot of X3 vs. X2 in Figure 53.1) and that the first component does not run approximately from one end point of the curve to the other (see Figure 53.4). Since the curve has these characteristics, PROC PRINQUAL linearizes the scatter plot by collapsing the scatter around the principal axis, not by straightening the curve into a single line. PROC PRINQUAL would straighten simpler curves.

The following statements produce Figure 53.1 through Figure 53.4:

   * Generate a Three-Dimensional Curve;
   data X;
      do X = -1 to 1 by 0.02;
         X1 =      X ** 3;
         X2 = X1 - X ** 5;
         X3 = X2 - X ** 6;
         output;
      end;
      drop X;
   run;

   goptions goutmode=replace nodisplay;
   %let opts = haxis=axis2 vaxis=axis1 frame cframe=ligr;
   /* Depending on your goptions, these plot options may work better:
      %let opts = haxis=axis2 vaxis=axis1 frame;                    */

   proc gplot data=X;
      title;
      axis1 minor=none label=(angle=90 rotate=0)
            order=(-1 to 1);
      axis2 minor=none order=(-1 to 1);
      plot X1*X2 / &opts name='prqin1';
      plot X3*X2 / &opts name='prqin2' vreverse;
      plot X1*X3 / &opts name='prqin3';
      symbol1 color=blue;
   run; quit;

   goptions display;
   proc greplay nofs tc=sashelp.templt template=l2r2;
      igout gseg;
      treplay 1:prqin1 2:prqin2 3:prqin3;
   run; quit;

   * Try to Straighten the Curve;
   proc prinqual data=X n=1 maxiter=50 covariance;
      title 'Iteratively Derive Variable Transformations';
      transform spline(X1-X3 / nknots=9);
   run;

   * Plot the Transformations;
   goptions nodisplay;
   proc gplot;
      title;
      axis1 minor=none label=(angle=90 rotate=0);
      axis2 minor=none;
      plot TX1*X1 / &opts name='prqin4';
      plot TX2*X2 / &opts name='prqin5';
      plot TX3*X3 / &opts name='prqin6';
      symbol1 color=blue;
   run; quit;

   goptions display;
   proc greplay nofs tc=sashelp.templt template=l2r2;
      igout gseg;
      treplay 1:prqin4 2:prqin6 3:prqin5;
   run; quit;

   * Plot the Straightened Scatter Plot;
   goptions nodisplay;
   proc gplot;
      axis1 minor=none label=(angle=90 rotate=0)
            order=(-1 to 1);
      axis2 minor=none order=(-1 to 1);
      plot TX1*TX2 / &opts name='prqin7';
      plot TX3*TX2 / &opts name='prqin8' vreverse;
      plot TX1*TX3 / &opts name='prqin9';
      symbol1 color=blue;
   run; quit;

   goptions display;
   proc greplay nofs tc=sashelp.templt template=l2r2;
      igout gseg;
      treplay 1:prqin7 2:prqin8 3:prqin9;
   run; quit;

Figure 53.1: Three-Dimensional Curve Example Output

Iteratively Derive Variable Transformations

The PRINQUAL Procedure

PRINQUAL MTV Algorithm Iteration History
Iteration Number	Average Change	Maximum Change	Proportion of Variance	Criterion Change	Note
1	0.16253	1.33045	0.71369
2	0.07871	0.94549	0.79035	0.07667
3	0.06518	0.80219	0.86334	0.07299
4	0.05322	0.57928	0.91379	0.05045
5	0.04154	0.38404	0.94204	0.02825
6	0.03181	0.24391	0.95640	0.01436
7	0.02461	0.15397	0.96349	0.00709
8	0.01982	0.10205	0.96704	0.00355
9	0.01662	0.07393	0.96894	0.00189
10	0.01439	0.06232	0.97005	0.00112
11	0.01288	0.05436	0.97081	0.00075
12	0.01189	0.04911	0.97139	0.00058
13	0.01119	0.04531	0.97188	0.00049
14	0.01068	0.04276	0.97232	0.00044
15	0.01027	0.04115	0.97273	0.00041
16	0.00993	0.04039	0.97313	0.00040
17	0.00965	0.04249	0.97351	0.00038
18	0.00940	0.04400	0.97388	0.00037
19	0.00919	0.04509	0.97423	0.00036
20	0.00900	0.04587	0.97458	0.00034
21	0.00883	0.04643	0.97491	0.00033
22	0.00867	0.04681	0.97523	0.00032
23	0.00852	0.04705	0.97555	0.00031
24	0.00839	0.04719	0.97585	0.00031
25	0.00827	0.04724	0.97615	0.00030
26	0.00816	0.04722	0.97644	0.00029
27	0.00805	0.04713	0.97672	0.00028
28	0.00795	0.04699	0.97700	0.00027
29	0.00785	0.04680	0.97726	0.00027
30	0.00776	0.04656	0.97752	0.00026
31	0.00768	0.04629	0.97777	0.00025
32	0.00760	0.04598	0.97802	0.00025
33	0.00752	0.04564	0.97826	0.00024
34	0.00745	0.04528	0.97849	0.00023
35	0.00739	0.04489	0.97872	0.00023
36	0.00733	0.04448	0.97894	0.00022
37	0.00729	0.04405	0.97915	0.00022
38	0.00724	0.04361	0.97936	0.00021
39	0.00720	0.04315	0.97957	0.00021
40	0.00716	0.04268	0.97977	0.00020
41	0.00713	0.04219	0.97997	0.00020
42	0.00709	0.04170	0.98016	0.00019
43	0.00706	0.04120	0.98035	0.00019
44	0.00703	0.04070	0.98054	0.00019
45	0.00699	0.04019	0.98072	0.00018
46	0.00696	0.03967	0.98090	0.00018
47	0.00693	0.03916	0.98107	0.00017
48	0.00690	0.03864	0.98124	0.00017
49	0.00687	0.03812	0.98141	0.00017
50	0.00684	0.03760	0.98158	0.00017	Not Converged

ERROR: Failed to converge.

Figure 53.2: PROC PRINQUAL MTV Iteration History

Figure 53.3: Variable Transformation Plots

Figure 53.4: Plots of the Nearly One-Dimensional Curve
