TPSPLINE Call
computes thin-plate smoothing splines
- CALL TPSPLINE( fitted, coeff, adiag, gcv, x, y <, lambda> );
The TPSPLINE subroutine computes thin-plate
smoothing spline (TPSS) fits to approximate smooth
multivariate functions that are observed with noise.
The generalized cross validation (GCV) function
is used to select the smoothing parameter.
The TPSPLINE subroutine returns the following values:
- fitted
- is an n×1 vector of fitted values of the
TPSS fit evaluated at the design points x,
where n is the number of observations.
The final TPSS fit depends on the optional lambda argument.
- coeff
- is a vector of spline coefficients.
The vector contains the coefficients for the basis
functions in the null space and for the representers
of the evaluation functionals at the unique design points.
(Refer to Wahba 1990 for more detail on reproducing kernel
Hilbert spaces and representers of evaluation functionals.)
The length of the coeff vector depends on the number of unique
design points and the number of variables in the spline model.
In general, let nuobs and k be the number of unique
rows and the number of columns of x, respectively.
Then the length of coeff equals k + nuobs + 1: there are k + 1
null-space coefficients (one constant and k linear terms) and
one coefficient for each unique design point.
The coeff vector can be used as an input to TPSPLNEV to evaluate the resulting TPSS fit at new data points.
- adiag
- is an n×1 vector of diagonal
elements of the "hat" matrix.
See the "Details" section.
- gcv
- If lambda is not specified, then
gcv is the minimum value of the GCV function.
If lambda is specified, then gcv is a
vector (or a scalar if lambda is a scalar) of
GCV values evaluated at the lambda points.
This lets you study the GCV curve by plotting gcv
against lambda and identify possible local minima.
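The rule for the length of coeff is plain arithmetic. The Python sketch below (not SAS/IML) counts the unique rows of x and applies k + nuobs + 1. The helper `coeff_length` is hypothetical. The sketch also assumes that two rows count as the same design point only when they are exactly equal, which is an assumption about how TPSPLINE identifies unique rows.

```python
def coeff_length(x):
    """Expected length of coeff for an n-by-k design matrix x (list of rows).

    Hypothetical helper: k + nuobs + 1, where nuobs counts rows that are
    exactly equal only once (assumed definition of a unique design point).
    """
    k = len(x[0])
    nuobs = len({tuple(row) for row in x})  # repeated design points count once
    return k + nuobs + 1


# Illustrative design: 5 observations, 2 variables, one repeated point.
x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
print(coeff_length(x))  # 2 + 4 + 1 = 7
```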
The inputs to the TPSPLINE subroutine are as follows:
- x
- is an n×k matrix of design
points on which the TPSS is to be fit,
where k is the number of variables in the spline model.
The columns of x must be linearly independent
and must not include a constant column.
- y
- is the n×1 vector of observations.
- lambda
- is an optional q×1 vector containing
$\lambda$ values on the $\log_{10}(n\lambda)$ scale.
This option lets you control how the TPSPLINE
subroutine chooses the smoothing parameter.
If lambda is not specified (or lambda is
specified and q > 1), the GCV function is used to choose the
"best" $\lambda$, and the returned fitted values
are based on the $\lambda$ that minimizes the GCV function.
If lambda is specified and q = 1, no minimization
of the GCV function is involved, and the fitted,
coeff, and adiag values are all based
on the TPSS fit that uses this particular $\lambda$.
This lets you choose the $\lambda$ that you
think is appropriate.
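The following Python sketch shows how the lambda argument maps to the true smoothing parameter. It assumes the $\log_{10}(n\lambda)$ scale, so the true value is $\lambda = 10^{\text{lambda}}/n$. It also builds an evenly spaced grid like one you might pass with q > 1 to study the GCV curve. Both helpers, `true_lambda` and `lambda_grid`, are hypothetical.

```python
def true_lambda(log10_n_lambda, n):
    """Convert a value on the log10(n*lambda) scale to the true lambda."""
    return 10.0 ** log10_n_lambda / n


def lambda_grid(lo, hi, q):
    """Return q equally spaced values from lo to hi on the log10(n*lambda) scale."""
    if q == 1:
        return [lo]
    step = (hi - lo) / (q - 1)
    return [lo + i * step for i in range(q)]


n = 50                               # number of observations (illustrative)
grid = lambda_grid(-4.0, 1.0, 6)     # [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0]
true_values = [true_lambda(v, n) for v in grid]
print(true_values)
```

Working on the logarithmic scale keeps the grid evenly spread across several orders of magnitude of $\lambda$. That spread is usually what you want when you look for the minimum of the GCV curve.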
Aside from the values returned, the TPSPLINE subroutine
also prints other useful information, such as the number
of unique observations, the dimension of the null space,
the number of parameters in the model, a GCV estimate of
$\lambda$, the smoothing penalty, the residual sum of
squares, the trace of $(I - A(\lambda))$, an estimate of
the error variance $\sigma^2$, and the sum of squares for replication.
Note: No missing values are
allowed in the input arguments.
Also, use caution if you specify small lambda values.
Because the true $\lambda = 10^{\text{lambda}}/n$, a
very small value of lambda can make $\lambda$ smaller
than the magnitude of machine error, and the
gcv values returned for such a $\lambda$ usually cannot be trusted.
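One way to see why tiny lambda values are risky is to compare $n\lambda = 10^{\text{lambda}}$ with double-precision machine epsilon. The Python sketch below does that. The helper `ridge_term_is_resolvable` and its threshold are an illustrative heuristic that is assumed here; they are not the rule TPSPLINE applies. `kernel_scale` stands for the typical magnitude of the matrix entries that the ridge term is added to.

```python
import sys


def ridge_term_is_resolvable(log10_n_lambda, kernel_scale=1.0):
    """Heuristic (assumption, not SAS's documented rule): is n*lambda large
    enough to register when added to numbers of size kernel_scale?"""
    n_lambda = 10.0 ** log10_n_lambda
    return n_lambda > sys.float_info.epsilon * kernel_scale


print(ridge_term_is_resolvable(-2.0))    # True
print(ridge_term_is_resolvable(-20.0))   # False: 1e-20 is far below ~2.2e-16
print(1.0 + 1e-20 == 1.0)                # True: the tiny term is lost entirely
```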
Finally, be aware that TPSS is a computationally
intensive method. A large data set (that is, one with
a large number of unique design points) requires
a lot of computer memory and time.
For convenience, we illustrate the TPSS method with a
two-dimensional independent variable X = (x1, x2).
More details can be found in Wahba (1990)
or in Bates et al. (1987).
Assume that the data are from the model
$$y_i = f(x_i) + \epsilon_i,$$
where $(x_i, y_i)$, $i = 1, \ldots, n$, are the observations.
The function f is unknown, and you
assume that it is reasonably smooth.
The error terms $\epsilon_i$ are independent zero-mean random variables.
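You measure the smoothness of f by the
integral over the entire plane of the square of the
partial derivatives of f of total order 2, that is,
$$J_2(f) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[ \left(\frac{\partial^2 f}{\partial x_1^2}\right)^2 + 2\left(\frac{\partial^2 f}{\partial x_1 \partial x_2}\right)^2 + \left(\frac{\partial^2 f}{\partial x_2^2}\right)^2 \right] dx_1\, dx_2.$$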
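Using this as a smoothness penalty, the thin-plate smoothing
spline estimate $f_\lambda$ of f is the minimizer of
$$S_\lambda(f) = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2 + \lambda J_2(f).$$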
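Duchon (1976) showed that the
minimizer $f_\lambda$ can be represented as
$$f_\lambda(x) = \sum_{j=1}^{3} \theta_j \phi_j(x) + \sum_{i=1}^{n} \delta_i E_2(x - x_i),$$
where $(\phi_1(x), \phi_2(x), \phi_3(x)) = (1, x_1, x_2)$
and $E_2(s) = \frac{1}{2^3 \pi} \|s\|^2 \ln \|s\|$.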
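Let matrix K have entries $(K)_{ij} = E_2(x_i - x_j)$
and matrix T have entries $(T)_{ij} = \phi_j(x_i)$.
Then the minimization problem can be rewritten as
finding coefficients $\theta$ and $\delta$ to minimize
$$S_\lambda(\theta, \delta) = \frac{1}{n}\|y - T\theta - K\delta\|^2 + \lambda\, \delta^T K \delta \quad \text{subject to } T^T \delta = 0.$$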
The final TPSS fit can be viewed as a type
of generalized ridge regression estimator.
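To make the ridge-regression view concrete, the pure-Python sketch below minimizes $S_\lambda(\theta, \delta)$ for two variables. It solves the standard linear system from Wahba (1990): $(K + n\lambda I)\delta + T\theta = y$ together with $T^T\delta = 0$. The $n\lambda I$ added to K plays the role of the ridge term. The sketch makes several assumptions. The design points are distinct and not all collinear. `lam` is the true $\lambda$, not the $\log_{10}(n\lambda)$ value. The helpers `e2`, `solve`, and `tpss_fit` are hypothetical names, and this is not necessarily how TPSPLINE computes the fit internally.

```python
import math


def e2(s):
    """E2(s) = |s|^2 ln|s| / (8*pi), with the limiting value 0 at s = 0."""
    r2 = s[0] ** 2 + s[1] ** 2
    return 0.0 if r2 == 0.0 else r2 * 0.5 * math.log(r2) / (8.0 * math.pi)


def solve(a, b):
    """Solve a @ sol = b by Gaussian elimination with partial pivoting."""
    m = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * m
    for r in range(m - 1, -1, -1):
        acc = sum(aug[r][c] * sol[c] for c in range(r + 1, m))
        sol[r] = (aug[r][m] - acc) / aug[r][r]
    return sol


def tpss_fit(x, y, lam):
    """Fit a two-variable TPSS for a fixed (true) lambda.

    Solves (K + n*lam*I) delta + T theta = y and T' delta = 0, assuming
    the n design points in x are distinct and not all collinear.
    """
    n = len(x)
    K = [[e2((xi[0] - xj[0], xi[1] - xj[1])) for xj in x] for xi in x]
    T = [[1.0, xi[0], xi[1]] for xi in x]
    # Bordered system [[K + n*lam*I, T], [T', 0]] [delta; theta] = [y; 0].
    a = [[K[i][j] + (n * lam if i == j else 0.0) for j in range(n)] + T[i]
         for i in range(n)]
    a += [[T[i][j] for i in range(n)] + [0.0] * 3 for j in range(3)]
    sol = solve(a, list(y) + [0.0] * 3)
    delta, theta = sol[:n], sol[n:]
    fitted = [sum(T[i][j] * theta[j] for j in range(3))
              + sum(K[i][l] * delta[l] for l in range(n)) for i in range(n)]
    return theta, delta, fitted


x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.3)]
y = [0.1, 0.9, 0.2, 1.1, 0.4]
theta, delta, fitted = tpss_fit(x, y, lam=0.01)
```

The first block of equations shows that the fitted values equal $y - n\lambda\delta$. If the data lie exactly on a plane, $\delta$ is zero for every $\lambda$, because such a function is already in the null space and incurs no penalty.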
The parameter $\lambda$ is called the smoothing parameter;
it controls the balance between goodness of fit
and smoothness of the final estimate.
The smoothing parameter can be chosen by minimizing
the generalized cross validation (GCV) function.
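If you write $\hat{y} = A(\lambda)\, y$
and call $A(\lambda)$ the "hat" matrix,
the GCV function is defined as
$$V(\lambda) = \frac{\frac{1}{n}\|(I - A(\lambda))\, y\|^2}{\left[\frac{1}{n}\operatorname{tr}(I - A(\lambda))\right]^2}.$$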
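The GCV formula needs only the hat matrix and the observations. The Python sketch below evaluates $V(\lambda)$ for a given matrix A. For illustration it uses the simple "fit the mean" smoother $A = \frac{1}{n}\mathbf{1}\mathbf{1}^T$ rather than a real TPSS hat matrix. The function name `gcv_score` is hypothetical.

```python
def gcv_score(A, y):
    """V(lambda) = (1/n)||(I - A)y||^2 / [(1/n) tr(I - A)]^2 for hat matrix A."""
    n = len(y)
    resid = [y[i] - sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]
    rss = sum(r * r for r in resid)                    # ||(I - A) y||^2
    trace_term = sum(1.0 - A[i][i] for i in range(n))  # tr(I - A)
    return (rss / n) / (trace_term / n) ** 2


# Illustrative hat matrix: the "fit the mean" smoother, A = (1/n) * ones.
n = 3
A_mean = [[1.0 / n] * n for _ in range(n)]
adiag = [A_mean[i][i] for i in range(n)]   # diagonal of A, analogous to adiag
print(gcv_score(A_mean, [1.0, 2.0, 3.0]))  # about 1.5
```

For a TPSS fit the fitted values depend linearly on y, so column j of $A(\lambda)$ is the fitted vector you get when the j-th unit vector is used as the response. Its diagonal is what adiag reports.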
The values returned from this call provide
$\hat{y}$ as fitted, $(\theta, \delta)$ as coeff,
and $\operatorname{diag}(A(\lambda))$ as adiag.
To evaluate the TPSS fit at
new data points, you can use the TPSPLNEV call.
Suppose $X_{new}$, an m×k matrix, contains the m
new data points at which you want to evaluate $f_\lambda$.
Let $(T_{new})_{ij} = \phi_j(x_{new,i})$
and $(K_{new})_{ij} = E_2(x_{new,i} - x_j)$
be the ijth elements of
$T_{new}$ and $K_{new}$, respectively.
The prediction at the new data points $X_{new}$ is
$$y_{pred} = T_{new}\,\theta + K_{new}\,\delta.$$
Therefore, using the coefficients obtained from the
TPSPLINE call, you can easily evaluate $y_{pred}$.
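The prediction formula can be written directly from a coefficient vector. The Python sketch below is not TPSPLNEV; it builds each row of $T_{new}$ and $K_{new}$ for two variables and combines them with $\theta$ and $\delta$. It assumes that coeff stores the three null-space coefficients first and then one $\delta$ per unique design point, in the same order as `design`. That ordering is inferred from the description of coeff and should be checked against actual TPSPLINE output. The names `e2` and `predict` are hypothetical.

```python
import math


def e2(s):
    """E2(s) = |s|^2 ln|s| / (8*pi), with the limiting value 0 at s = 0."""
    r2 = s[0] ** 2 + s[1] ** 2
    return 0.0 if r2 == 0.0 else r2 * 0.5 * math.log(r2) / (8.0 * math.pi)


def predict(coeff, design, x_new):
    """y_pred = T_new theta + K_new delta for a two-variable TPSS fit.

    Assumes coeff = (theta_1, theta_2, theta_3, delta_1, ..., delta_nuobs),
    with delta ordered like the unique design points in `design`.
    """
    theta, delta = coeff[:3], coeff[3:]
    preds = []
    for xn in x_new:
        t_row = [1.0, xn[0], xn[1]]                                    # row of T_new
        k_row = [e2((xn[0] - xj[0], xn[1] - xj[1])) for xj in design]  # row of K_new
        preds.append(sum(t * th for t, th in zip(t_row, theta))
                     + sum(k * d for k, d in zip(k_row, delta)))
    return preds


coeff = [0.0, 0.0, 0.0, 1.0, -1.0]           # illustrative, not real TPSPLINE output
design = [(0.0, 0.0), (1.0, 0.0)]
print(predict(coeff, design, [(2.0, 0.0)]))  # [ln(2) / (2*pi)], about 0.1103
```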
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.