The TPSPLINE Procedure

Example 64.3: Multiple Minima of the GCV Function

The following data represent the deposition of sulfate (SO4) at 179 sites in the 48 contiguous states of the United States in 1990. Each observation records the latitude and longitude of the site as well as the SO4 deposition at the site, measured in grams per square meter (g/m²).

You can use PROC TPSPLINE to fit a surface that reflects the general trend and that reveals underlying features of the data.

   data so4;
      input latitude longitude so4 @@;
      datalines;
      32.45833  87.24222 1.403 34.28778  85.96889 2.103
      33.07139 109.86472 0.299 36.07167 112.15500 0.304
      31.95056 112.80000 0.263 33.60500  92.09722 1.950
      34.17944  93.09861 2.168 36.08389  92.58694 1.578
                .
                .
                .
       162 additional observations
                .
                .
                .
      45.82278  91.87444 0.984 41.34028 106.19083 0.335
      42.73389 108.85000 0.236 42.49472 108.82917 0.313
      42.92889 109.78667 0.182 43.22278 109.99111 0.161
      43.87333 104.19222 0.306 44.91722 110.42028 0.210
      45.07611  72.67556 2.646
      ;

   data pred; 
      do latitude = 25 to 47 by 1;
         do longitude = 68 to 124 by 1;
            output;
         end;
      end;
   run;

The preceding statements create the SAS data set so4 and the data set pred, a regular grid of latitudes from 25 to 47 and longitudes from 68 to 124 (in steps of 1 degree) on which predictions are made. The following statements fit a surface for SO4 deposition. The ODS OUTPUT statement creates a data set called GCV that contains the GCV values for LOGNLAMBDA from -6 to 1 in steps of 0.1.

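   proc tpspline data=so4;
      /* save the GCV function evaluated over the LOGNLAMBDA grid */
      ods output GCVFunction=gcv;
      model so4 = (latitude longitude) /lognlambda=(-6 to 1 by 0.1);
      /* evaluate the fitted surface on the grid in data set pred */
      score data=pred out=prediction1;
   run;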

Partial output from these statements is displayed in Output 64.3.1.

Output 64.3.1: Partial Output from PROC TPSPLINE for Data Set SO4

The TPSPLINE Procedure
Dependent Variable: so4

Summary of Input Data Set
Number of Non-Missing Observations 179
Number of Missing Observations 0
Unique Smoothing Design Points 179

Summary of Final Model
Number of Regression Variables 0
Number of Smoothing Variables 2
Order of Derivative in the Penalty 2
Dimension of Polynomial Space 3

Summary Statistics of Final Estimation
log10(n*Lambda) 0.277005
Smoothing Penalty 2.458790
Residual SS 12.444975
Tr(I-A) 140.274968
Model DF 38.725032
Standard Deviation 0.297856

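The values in this table are tied together by the usual identities for a linear smoother whose fitted values are A(λ)y, where A(λ) is the n × n hat matrix. Assuming PROC TPSPLINE follows these standard definitions (the symbols A(λ), d, and m are introduced here for illustration and are not part of the output), the reported statistics for n = 179 can be reproduced by hand, with d = 2 smoothing variables and penalty order m = 2:

```latex
\begin{aligned}
\text{Dimension of Polynomial Space} &= \binom{m + d - 1}{d} = \binom{3}{2} = 3
  \quad (\text{spanned by } 1,\ \text{latitude},\ \text{longitude}),\\[4pt]
\text{Model DF} &= \operatorname{tr}A(\lambda) = n - \operatorname{tr}\bigl(I - A(\lambda)\bigr)
  = 179 - 140.274968 = 38.725032,\\[4pt]
\text{Standard Deviation} &= \sqrt{\frac{\text{Residual SS}}{\operatorname{tr}\bigl(I - A(\lambda)\bigr)}}
  = \sqrt{\frac{12.444975}{140.274968}} \approx 0.297856.
\end{aligned}
```

The same identities reproduce the statistics of the refit in Output 64.3.3: 179 - 7.208638 = 171.791362, and the square root of 0.043790/7.208638 is approximately 0.077940.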

The following statements produce Output 64.3.2:

   symbol1 interpol=join value=none;
   title "GCV Function";

   /* AXIS1 and AXIS2 are assumed to be defined by AXIS statements not shown here */
   proc gplot data=gcv;
      plot gcv*lognlambda/frame cframe=ligr
                          vaxis=axis1 haxis=axis2;
   run;

Output 64.3.2 displays the GCV function plotted against log10(n*lambda). The GCV function has two local minima. PROC TPSPLINE locates the minimum at 0.277005; the figure also shows a second local minimum near -2.56. Note that the TPSPLINE procedure may not always find the global minimum, although it did in this case.
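The reported summary statistics can be used to check which of the two minima is lower. Assuming the standard generalized cross-validation criterion (the hat-matrix notation A(λ) is introduced here for illustration), the residual vector is (I - A(λ))y, so the criterion reduces to an expression in Residual SS and Tr(I-A). Plugging in the values from Output 64.3.1 and from the refit at -2.56 in Output 64.3.3 gives, after rounding:

```latex
\begin{aligned}
\operatorname{GCV}(\lambda) &= \frac{\frac{1}{n}\bigl\lVert (I - A(\lambda))\,y \bigr\rVert^2}
  {\bigl[\frac{1}{n}\operatorname{tr}\bigl(I - A(\lambda)\bigr)\bigr]^2}
  = \frac{n \cdot \text{RSS}(\lambda)}{\bigl[\operatorname{tr}\bigl(I - A(\lambda)\bigr)\bigr]^2},\\[6pt]
\text{at } \log_{10}(n\lambda) = 0.277005: &\quad
  \frac{179 \times 12.444975}{140.274968^2} \approx \frac{2227.65}{19677.07} \approx 0.1132,\\[6pt]
\text{at } \log_{10}(n\lambda) = -2.56: &\quad
  \frac{179 \times 0.043790}{7.208638^2} \approx \frac{7.8384}{51.9645} \approx 0.1508.
\end{aligned}
```

Under this assumed definition, the minimum at 0.277005 has the smaller GCV value, which is consistent with it being the global minimum shown in the plot.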

Output 64.3.2: GCV Function of SO4 Data Set

The following statements refit the model at the second local minimum by specifying the option LOGNLAMBDA0=-2.56. The output is displayed in Output 64.3.3.

   proc tpspline data=so4;
      model so4 = (latitude longitude) /lognlambda0=-2.56;
      score data=pred out=prediction2;
   run;

Output 64.3.3: Output from PROC TPSPLINE for Data Set SO4 with LOGNLAMBDA=-2.56

The TPSPLINE Procedure
Dependent Variable: so4

Summary of Input Data Set
Number of Non-Missing Observations 179
Number of Missing Observations 0
Unique Smoothing Design Points 179

Summary of Final Model
Number of Regression Variables 0
Number of Smoothing Variables 2
Order of Derivative in the Penalty 2
Dimension of Polynomial Space 3

Summary Statistics of Final Estimation
log10(n*Lambda) -2.560000
Smoothing Penalty 177.214368
Residual SS 0.043790
Tr(I-A) 7.208638
Model DF 171.791362
Standard Deviation 0.077940


The smoothing penalty is much larger in Output 64.3.3 (177.214368) than in Output 64.3.1 (2.458790). The estimate in Output 64.3.1 uses a larger lambda value and is therefore smoother than the estimate that uses LOGNLAMBDA=-2.56 (Output 64.3.3).

The estimate based on LOGNLAMBDA=-2.56 has many more model degrees of freedom (171.79 versus 38.73) and a much smaller standard deviation (0.0779 versus 0.2979).

However, a smaller standard deviation in nonparametric regression does not necessarily mean that the estimate is good: a small λ value always produces an estimate closer to the data and, therefore, a smaller standard deviation.
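The reason is visible in the criterion that a thin-plate smoothing spline minimizes. The following is the textbook form for d = 2 smoothing variables (here x1 = latitude and x2 = longitude) and penalty order m = 2, matching the "Summary of Final Model" tables; it is written as an assumption about the exact scaling PROC TPSPLINE uses internally:

```latex
S_\lambda(f) = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_{1i}, x_{2i})\bigr)^2 + \lambda\, J_2(f),
\qquad
J_2(f) = \int\!\!\int \Bigl[\Bigl(\frac{\partial^2 f}{\partial x_1^2}\Bigr)^2
  + 2\Bigl(\frac{\partial^2 f}{\partial x_1\,\partial x_2}\Bigr)^2
  + \Bigl(\frac{\partial^2 f}{\partial x_2^2}\Bigr)^2\Bigr]\,dx_1\,dx_2 .
```

Shrinking λ lowers the weight on the roughness term J_2(f), so the minimizer trades smoothness for fidelity to the data; as λ approaches 0, the fit approaches an interpolant of the 179 unique design points. The two fits in this example follow that pattern: the residual SS drops from 12.444975 to 0.043790 while the smoothing penalty rises from 2.458790 to 177.214368.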

The following statements produce two contour plots of the estimates using the GCONTOUR procedure. In the final step, the plots are placed into a single graphic with the GREPLAY procedure.

   title "TPSPLINE fit with lognlambda=0.277";
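   /* LEGEND1, AXIS1, and AXIS2 are assumed to be defined by LEGEND and AXIS statements not shown here */
   proc gcontour data=prediction1 gout=grafcat;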
      plot latitude*longitude = P_so4/ 
                    name="tpscon1" legend=legend1
                    vaxis=axis1 haxis=axis2 
                    cframe=ligr hreverse;
   run;

   title "TPSPLINE fit with lognlambda=-2.56";
   proc gcontour data=prediction2 gout=grafcat;
      plot latitude*longitude = P_so4/ 
                    name="tpscon2" legend=legend1
                    vaxis=axis1 haxis=axis2 
                    cframe=ligr hreverse;
   run;

   title;
   proc greplay igout=grafcat tc=sashelp.templt template=v2 nofs;
      treplay 1:tpscon1
              2:tpscon2;
   quit;

Compare the two estimates by examining their contour plots (Output 64.3.4).

Output 64.3.4: Contour Plot of TPSPLINE Estimates with Different Lambdas

As the contour plots show, the estimate with LOGNLAMBDA=0.277 may represent the underlying trend, while the estimate with LOGNLAMBDA=-2.56 is very rough and may be modeling the noise component.


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.