The TRANSREG Procedure

Example 65.4: Transformation Regression of Exhaust Emissions Data

In this example, the MORALS algorithm is applied to data from an experiment in which nitrogen oxide emissions from a single-cylinder engine are measured for various combinations of fuel, compression ratio, and equivalence ratio. The data are provided by Brinkman (1981).

The equivalence ratio and nitrogen oxide variables are continuous and numeric, so spline transformations of these variables are requested. Each spline is degree three with nine knots (one at each decile) in order to allow PROC TRANSREG a great deal of freedom in finding transformations. The compression ratio variable has only five discrete values, so an optimal scoring is requested. The character variable Fuel is nominal, so it is designated as a classification variable. No monotonicity constraints are placed on any of the transformations. Observations with missing values are excluded with the NOMISS a-option.

The squared multiple correlation for the initial model is less than 0.25. PROC TRANSREG increases the R² to over 0.95 by transforming the variables. The transformation plots show how each variable is transformed. The transformation of compression ratio (TCpRatio) is nearly linear. The transformation of equivalence ratio (TEqRatio) is nearly parabolic, so the optimal transformation of equivalence ratio is nearly uncorrelated with the original scoring. This suggests that the large increase in R² is due to this transformation. The transformation of nitrogen oxide (TNOx) is something like a log transformation.

These results suggest the parametric model

\begin{eqnarray*}
\log(\mathrm{NOx}) & = & b_0 + b_1 \times \mathrm{EqRatio} + b_2 \times \mathrm{EqRatio}^2 + b_3 \times \mathrm{CpRatio} \\
 & & + \sum_j b_j \, \mathrm{class}_j(\mathrm{Fuel}) + \mathrm{error}
\end{eqnarray*}

You can perform this analysis with PROC TRANSREG using the following MODEL statement:

    model log(NOx)= psp(EqRatio / deg=2) identity(CpRatio)
                    class(Fuel / zero=first);

The LOG transformation computes the natural log. The PSPLINE expansion expands EqRatio into a linear term, EqRatio, and a squared term, EqRatio². A linear transformation of CpRatio and a dummy variable expansion of Fuel are requested, with the first level of Fuel as the reference level. These should provide a good parametric operationalization of the optimal transformations. The final model has an R² of 0.91, which is smaller than before because the model uses fewer degrees of freedom, but still quite good.

The following statements produce Output 65.4.1 through Output 65.4.3:

   title 'Gasoline Example';

   data Gas;
      input Fuel :$8. CpRatio EqRatio NOx @@;
      label Fuel    = 'Fuel'
            CpRatio = 'Compression Ratio (CR)'
            EqRatio = 'Equivalence Ratio (PHI)'
            NOx     = 'Nitrogen Oxide (NOx)';
      datalines;
   Ethanol  12.0 0.907 3.741 Ethanol  12.0 0.761 2.295
   Ethanol  12.0 1.108 1.498 Ethanol  12.0 1.016 2.881
   Ethanol  12.0 1.189 0.760 Ethanol   9.0 1.001 3.120
   Ethanol   9.0 1.231 0.638 Ethanol   9.0 1.123 1.170
   Ethanol  12.0 1.042 2.358 Ethanol  12.0 1.215 0.606
   Ethanol  12.0 0.930 3.669 Ethanol  12.0 1.152 1.000
   Ethanol  15.0 1.138 0.981 Ethanol  18.0 0.601 1.192
   Ethanol   7.5 0.696 0.926 Ethanol  12.0 0.686 1.590
   Ethanol  12.0 1.072 1.806 Ethanol  15.0 1.074 1.962
   Ethanol  15.0 0.934 4.028 Ethanol   9.0 0.808 3.148
   Ethanol   9.0 1.071 1.836 Ethanol   7.5 1.009 2.845
   Ethanol   7.5 1.142 1.013 Ethanol  18.0 1.229 0.414
   Ethanol  18.0 1.175 0.812 Ethanol  15.0 0.568 0.374
   Ethanol  15.0 0.977 3.623 Ethanol   7.5 0.767 1.869
   Ethanol   7.5 1.006 2.836 Ethanol   9.0 0.893 3.567
   Ethanol  15.0 1.152 0.866 Ethanol  15.0 0.693 1.369
   Ethanol  15.0 1.232 0.542 Ethanol  15.0 1.036 2.739
   Ethanol  15.0 1.125 1.200 Ethanol   9.0 1.081 1.719
   Ethanol   9.0 0.868 3.423 Ethanol   7.5 0.762 1.634
   Ethanol   7.5 1.144 1.021 Ethanol   7.5 1.045 2.157
   Ethanol  18.0 0.797 3.361 Ethanol  18.0 1.115 1.390
   Ethanol  18.0 1.070 1.947 Ethanol  18.0 1.219 0.962
   Ethanol   9.0 0.637 0.571 Ethanol   9.0 0.733 2.219
   Ethanol   9.0 0.715 1.419 Ethanol   9.0 0.872 3.519
   Ethanol   7.5 0.765 1.732 Ethanol   7.5 0.878 3.206
   Ethanol   7.5 0.811 2.471 Ethanol  15.0 0.676 1.777
   Ethanol  18.0 1.045 2.571 Ethanol  18.0 0.968 3.952
   Ethanol  15.0 0.846 3.931 Ethanol  15.0 0.684 1.587
   Ethanol   7.5 0.729 1.397 Ethanol   7.5 0.911 3.536
   Ethanol   7.5 0.808 2.202 Ethanol   7.5 1.168 0.756
   Indolene  7.5 0.831 4.818 Indolene  7.5 1.045 2.849
   Indolene  7.5 1.021 3.275 Indolene  7.5 0.970 4.691
   Indolene  7.5 0.825 4.255 Indolene  7.5 0.891 5.064
   Indolene  7.5 0.710 2.118 Indolene  7.5 0.801 4.602
   Indolene  7.5 1.074 2.286 Indolene  7.5 1.148 0.970
   Indolene  7.5 1.000 3.965 Indolene  7.5 0.928 5.344
   Indolene  7.5 0.767 3.834 Ethanol   7.5 0.749 1.620
   Ethanol   7.5 0.892 3.656 Ethanol   7.5 1.002 2.964
   82rongas  7.5 0.873 6.021 82rongas  7.5 0.987 4.467
   82rongas  7.5 1.030 3.046 82rongas  7.5 1.101 1.596
   82rongas  7.5 1.173 0.835 82rongas  7.5 0.931 5.498
   82rongas  7.5 0.822 5.470 82rongas  7.5 0.749 4.084
   82rongas  7.5 0.625 0.716 94%Eth    7.5 0.818 2.382
   94%Eth    7.5 1.128 1.004 94%Eth    7.5 1.191 0.623
   94%Eth    7.5 1.132 1.030 94%Eth    7.5 0.993 2.593
   94%Eth    7.5 0.866 2.699 94%Eth    7.5 0.910 3.177
   94%Eth   12.0 1.139 1.151 94%Eth   12.0 1.267 0.474
   94%Eth   12.0 1.017 2.814 94%Eth   12.0 0.954 3.308
   94%Eth   12.0 0.861 3.031 94%Eth   12.0 1.034 2.537
   94%Eth   12.0 0.781 2.403 94%Eth   12.0 1.058 2.412
   94%Eth   12.0 0.884 2.452 94%Eth   12.0 0.766 1.857
   94%Eth    7.5 1.193 0.657 94%Eth    7.5 0.885 2.969
   94%Eth    7.5 0.915 2.670 Ethanol  18.0 0.812 3.760
   Ethanol  18.0 1.230 0.672 Ethanol  18.0 0.804 3.677
   Ethanol  18.0 0.712  .    Ethanol  12.0 0.813 3.517
   Ethanol  12.0 1.002 3.290 Ethanol   9.0 0.696 1.139
   Ethanol   9.0 1.199 0.727 Ethanol   9.0 1.030 2.581
   Ethanol  15.0 0.602 0.923 Ethanol  15.0 0.694 1.527
   Ethanol  15.0 0.816 3.388 Ethanol  15.0 0.896  .
   Ethanol  15.0 1.037 2.085 Ethanol  15.0 1.181 0.966
   Ethanol   7.5 0.899 3.488 Ethanol   7.5 1.227 0.754
   Indolene  7.5 0.701 1.990 Indolene  7.5 0.807 5.199
   Indolene  7.5 0.902 5.283 Indolene  7.5 0.997 3.752
   Indolene  7.5 1.224 0.537 Indolene  7.5 1.089 1.640
   Ethanol   9.0 1.180 0.797 Ethanol   7.5 0.795 2.064
   Ethanol  18.0 0.990 3.732 Ethanol  18.0 1.201 0.586
   Methanol  7.5 0.975 2.941 Methanol  7.5 1.089 1.467
   Methanol  7.5 1.150 0.934 Methanol  7.5 1.212 0.722
   Methanol  7.5 0.859 2.397 Methanol  7.5 0.751 1.461
   Methanol  7.5 0.720 1.235 Methanol  7.5 1.090 1.347
   Methanol  7.5 0.616 0.344 Gasohol   7.5 0.712 2.209
   Gasohol   7.5 0.771 4.497 Gasohol   7.5 0.959 4.958
   Gasohol   7.5 1.042 2.723 Gasohol   7.5 1.125 1.244
   Gasohol   7.5 1.097 1.562 Gasohol   7.5 0.984 4.468
   Gasohol   7.5 0.928 5.307 Gasohol   7.5 0.889 5.425
   Gasohol   7.5 0.827 5.330 Gasohol   7.5 0.674 1.448
   Gasohol   7.5 1.031 3.164 Methanol  7.5 0.871 3.113
   Methanol  7.5 1.026 2.551 Methanol  7.5 0.598 0.204
   Indolene  7.5 0.973 5.055 Indolene  7.5 0.980 4.937
   Indolene  7.5 0.665 1.561 Ethanol   7.5 0.629 0.561
   Ethanol   9.0 0.608 0.563 Ethanol  12.0 0.584 0.678
   Ethanol  15.0 0.562 0.370 Ethanol  18.0 0.535 0.530
   94%Eth    7.5 0.674 0.900 Gasohol   7.5 0.645 1.207
   Ethanol  18.0 0.655 1.900 94%Eth    7.5 1.022 2.787
   94%Eth    7.5 0.790 2.645 94%Eth    7.5 0.720 1.475
   94%Eth    7.5 1.075 2.147
   ;

   *---Fit the Nonparametric Model---;
   proc transreg data=Gas dummy test nomiss;
      model spline(NOx / nknots=9)=spline(EqRatio / nknots=9)
                         opscore(CpRatio) class(Fuel / zero=first);
      title2 'Iteratively Estimate NOx, CPRATIO and EQRATIO';
      output out=Results;
   run;

   *---Plot the Results---;
   goptions goutmode=replace nodisplay;
   %let opts = haxis=axis2 vaxis=axis1 frame cframe=ligr;
   /* Depending on your goptions, these plot options may work better:
      %let opts = haxis=axis2 vaxis=axis1 frame;                      */

   proc gplot data=Results;
      title;
      axis1 minor=none label=(angle=90 rotate=0);
      axis2 minor=none;
      symbol1 color=blue v=dot i=none;
      plot TCpRatio*CpRatio / &opts name='tregex1';
      plot TEqRatio*EqRatio / &opts name='tregex2';
      plot TNOx*NOx         / &opts name='tregex3';
   run; quit;

   goptions display;
   proc greplay nofs tc=sashelp.templt template=l2r2;
      igout gseg;
      treplay 1:tregex1 2:tregex3 3:tregex2;
   run; quit;

   *-Fit the Parametric Model Suggested by the Nonparametric Analysis-;
   proc transreg data=Gas dummy ss2 short nomiss;
      title 'Gasoline Example';
      title2 'Now fit log(NOx) = b0 + b1*EqRatio + b2*EqRatio**2 +';
      title3 'b3*CpRatio + Sum b(j)*Fuel(j) + Error';
      model log(NOx)= pspline(EqRatio / deg=2) identity(CpRatio)
                      class(Fuel / zero=first);
      output out=Results2;
   run;
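
As a cross-check on the parametric model, the same expansion can be written out by hand. The following statements are a minimal sketch and are not part of the original example: they assume the Gas data set created above, and the names GasLog, LogNOx, and EqRatio2 are hypothetical. PROC GLM is used here only as a stand-in for an ordinary least-squares fit. By default it uses the last Fuel level rather than the first as the reference level, so the intercept and the Fuel coefficients are parameterized differently from those reported by PROC TRANSREG, while the R-square and the EqRatio, EqRatio2, and CpRatio coefficients should agree.

```sas
*---Build the log response and the squared term by hand---;
data GasLog;
   set Gas;
   /* Natural log of the response; LogNOx stays missing when NOx is missing */
   if NOx > 0 then LogNOx = log(NOx);
   EqRatio2 = EqRatio**2;
run;

*---Ordinary least squares with Fuel as a classification variable---;
proc glm data=GasLog;
   class Fuel;
   model LogNOx = EqRatio EqRatio2 CpRatio Fuel / solution;
run; quit;
```

Observations with a missing LogNOx are dropped from the fit, which plays the same role as the NOMISS a-option in the PROC TRANSREG step.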

Output 65.4.1: Transformation Regression Example: The Nonparametric Model

Gasoline Example
Iteratively Estimate NOx, CPRATIO and EQRATIO

The TRANSREG Procedure

TRANSREG MORALS Algorithm Iteration History for Spline(NOx)

Iteration    Average    Maximum                Criterion
   Number     Change     Change    R-Square       Change    Note
        0    0.48074    3.86778     0.24597
        1    0.00000    0.00000     0.95865      0.71267    Converged

Algorithm converged.

The TRANSREG Procedure Hypothesis Tests for Spline(NOx)
Nitrogen Oxide (NOx)

Univariate ANOVA Table Based on the Usual Degrees of Freedom

                           Sum of        Mean
Source              DF    Squares      Square    F Value    Liberal p
Model               21   326.0946    15.52831     162.27    >= <.0001
Error              147    14.0674     0.09570
Corrected Total    168   340.1619

The above statistics are not adjusted for the fact that the
dependent variable was transformed and so are generally liberal.

Root MSE          0.30935    R-Square    0.9586
Dependent Mean    2.34593    Adj R-Sq    0.9527
Coeff Var        13.18661

Adjusted Multivariate ANOVA Table Based on the Usual Degrees of Freedom

Dependent Variable Scoring Parameters=12  S=12  M=4  N=67

Statistic                    Value    F Value    Num DF    Den DF    p
Wilks' Lambda             0.041355       2.05       252      1455    <= <.0001
Pillai's Trace            0.958645       0.61       252      1764    <= 1.0000
Hotelling-Lawley Trace    23.18089      12.35       252    945.01    <= <.0001
Roy's Greatest Root       23.18089     162.27        21       147    >= <.0001

The Wilks' Lambda, Pillai's Trace, and Hotelling-Lawley Trace statistics are a conservative adjustment of the normal statistics. Roy's Greatest Root is liberal. These statistics are normally defined in terms of the squared canonical correlations which are the eigenvalues of the matrix H*inv(H+E). Here the R-Square is used for the first eigenvalue and all other eigenvalues are set to zero since only one linear combination is used. Degrees of freedom are computed assuming all linear combinations contribute to the Lambda and Trace statistics, so the F tests for those statistics are conservative. The p values for the liberal and conservative statistics provide approximate lower and upper bounds on p. A liberal test statistic with conservative degrees of freedom and a conservative test statistic with liberal degrees of freedom yield at best an approximate p value, which is indicated by a "~" before the p value.
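
The values in the adjusted multivariate table can be reproduced from the R-square alone. The following worked check is a sketch that uses only numbers printed in Output 65.4.1; the symbol \lambda_i for the eigenvalues is notation introduced here. Setting \lambda_1 = R^2 = 0.958645 and all other eigenvalues to zero gives:

```latex
\begin{align*}
R^2 &= \frac{SS_{\mathrm{Model}}}{SS_{\mathrm{Total}}} = \frac{326.0946}{340.1619} \approx 0.9586 \\
\text{Wilks' Lambda} &= \prod_i (1 - \lambda_i) = 1 - R^2 = 1 - 0.958645 = 0.041355 \\
\text{Pillai's Trace} &= \sum_i \lambda_i = R^2 = 0.958645 \\
\text{Hotelling-Lawley Trace} &= \sum_i \frac{\lambda_i}{1 - \lambda_i} = \frac{0.958645}{0.041355} \approx 23.18 \\
\text{Roy's Greatest Root} &= \frac{\lambda_1}{1 - \lambda_1} = \frac{0.958645}{0.041355} \approx 23.18
\end{align*}
```

Because only one eigenvalue is nonzero, the Hotelling-Lawley Trace and Roy's Greatest Root coincide, and the F value for Roy's Greatest Root matches the univariate F value of 162.27.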

Output 65.4.2: Transformation Regression Example: The Parametric Model

Gasoline Example
Now fit log(NOx) = b0 + b1*EqRatio + b2*EqRatio**2 +
b3*CpRatio + Sum b(j)*Fuel(j) + Error

The TRANSREG Procedure

Log(NOx)
Algorithm converged.

The TRANSREG Procedure Hypothesis Tests for Log(NOx)
Nitrogen Oxide (NOx)

Univariate ANOVA Table Based on the Usual Degrees of Freedom

                           Sum of        Mean
Source              DF    Squares      Square    F Value    Pr > F
Model                8   79.33838    9.917298     213.09    <.0001
Error              160    7.44659    0.046541
Corrected Total    168   86.78498

Root MSE          0.21573    R-Square    0.9142
Dependent Mean    0.63130    Adj R-Sq    0.9099
Coeff Var        34.17294

Univariate Regression Table Based on the Usual Degrees of Freedom

                                        Type II
                                         Sum of       Mean
Variable              DF  Coefficient   Squares     Square   F Value   Pr > F   Label
Intercept              1   -14.586532   49.9469    49.9469   1073.18   <.0001   Intercept
Pspline.EqRatio_1      1    35.102914   62.7478    62.7478   1348.22   <.0001   Equivalence Ratio (PHI) 1
Pspline.EqRatio_2      1   -19.386468   64.6430    64.6430   1388.94   <.0001   Equivalence Ratio (PHI) 2
Identity(CpRatio)      1     0.032058    1.4445     1.4445     31.04   <.0001   Compression Ratio (CR)
Class.Fuel94_Eth       1    -0.449583    1.3158     1.3158     28.27   <.0001   Fuel 94%Eth
Class.FuelEthanol      1    -0.414242    1.2560     1.2560     26.99   <.0001   Fuel Ethanol
Class.FuelGasohol      1    -0.016719    0.0015     0.0015      0.03   0.8584   Fuel Gasohol
Class.FuelIndolene     1     0.001572    0.0000     0.0000      0.00   0.9853   Fuel Indolene
Class.FuelMethanol     1    -0.580133    1.7219     1.7219     37.00   <.0001   Fuel Methanol
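
To read the fitted coefficients on the original scale, a little arithmetic helps. The following DATA step is a minimal sketch and is not part of the original example; the variable names are hypothetical, and the coefficient values are copied from the regression table above. It computes the equivalence ratio at which the fitted quadratic peaks, -b1/(2*b2), and the multiplicative effect exp(b) of three fuels on NOx relative to the reference fuel (the first level, 82rongas), holding the other variables fixed.

```sas
data _null_;
   /* Coefficients copied from the Univariate Regression Table */
   b1 = 35.102914;    /* Pspline.EqRatio_1 */
   b2 = -19.386468;   /* Pspline.EqRatio_2 */

   /* b1*x + b2*x**2 has its maximum at -b1/(2*b2) because b2 < 0 */
   PeakEqRatio = -b1 / (2 * b2);
   put PeakEqRatio= 8.4;

   /* exp(coefficient) is the fitted NOx ratio relative to the reference fuel */
   Ratio94Eth    = exp(-0.449583);
   RatioEthanol  = exp(-0.414242);
   RatioMethanol = exp(-0.580133);
   put Ratio94Eth= 8.4 RatioEthanol= 8.4 RatioMethanol= 8.4;
run;
```

The peak falls near an equivalence ratio of 0.905, and the fitted NOx for 94%Eth, Ethanol, and Methanol is roughly 64%, 66%, and 56% of the reference level. The Gasohol and Indolene coefficients are close to zero and not significant, so those fuels are not distinguishable from the reference fuel in this model.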

Output 65.4.3: Plots of Compression Ratio, Equivalence Ratio, and Nitrogen Oxide
[Figure: three transformation plots showing TCpRatio versus CpRatio, TNOx versus NOx, and TEqRatio versus EqRatio]


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.