HISTOGRAM Statement

Kernel Density Estimates

You can use the KERNEL option to superimpose kernel density estimates on histograms. Smoothing the data distribution with a kernel density estimate can be more effective than using a histogram to examine features that might be obscured by the choice of histogram bins or sampling variation. A kernel density estimate can also be more effective than a parametric curve fit when the process distribution is multimodal. See Example 4.5.

The general form of the kernel density estimator is

\hat{f}_{\lambda}(x) = \frac{1}{n\lambda} \sum_{i=1}^{n} K_0\left(\frac{x-x_i}{\lambda}\right)

where K_0(\cdot) is a kernel function, \lambda is the bandwidth, n is the sample size, and x_i is the ith observation.
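
The estimator can be evaluated directly from its definition. The following Python sketch is illustrative only and is not how the KERNEL option is implemented; the function names and the sample values are hypothetical, and a normal kernel is assumed.

```python
import math

def normal_kernel(t):
    """Standard normal density, used here as the kernel K_0."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def kernel_density(x, data, bandwidth, kernel=normal_kernel):
    """Evaluate f_hat(x) = (1 / (n * lambda)) * sum_i K_0((x - x_i) / lambda)."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(data)
    if n == 0:
        raise ValueError("data must contain at least one observation")
    total = sum(kernel((x - xi) / bandwidth) for xi in data)
    return total / (n * bandwidth)

# Hypothetical sample with two clusters (illustration only).
sample = [1.0, 1.2, 0.9, 1.1, 4.0, 4.2, 3.9, 4.1]
density_at_cluster = kernel_density(1.0, sample, bandwidth=0.3)
density_between = kernel_density(2.5, sample, bandwidth=0.3)
```

With this sample the estimate is higher near a cluster than in the gap between the clusters, which is the kind of multimodal structure that a single parametric curve can miss.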

The KERNEL option provides three kernel functions (K0): normal, quadratic, and triangular. You can specify the function with the K= kernel-option in parentheses after the KERNEL option. Values for the K= option are NORMAL, QUADRATIC, and TRIANGULAR (with aliases of N, Q, and T, respectively). By default, a normal kernel is used. The formulas for the kernel functions are

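\begin{array}{lll}
\text{Normal} & K_0(t) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}t^2\right) & \text{for } -\infty < t < \infty \\
\text{Quadratic} & K_0(t) = \frac{3}{4}\left(1-t^2\right) & \text{for } |t| \leq 1 \\
\text{Triangular} & K_0(t) = 1-|t| & \text{for } |t| \leq 1
\end{array}
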
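
As a check on the three kernel functions, the following Python sketch defines each one and confirms numerically that it integrates to one. It assumes the quadratic kernel has the standard form (3/4)(1 - t^2) on |t| <= 1; the dictionary keys mirror the K= values, but the helper names are hypothetical and the code is not part of SAS.

```python
import math

def normal_kernel(t):
    """K_0(t) = exp(-t^2 / 2) / sqrt(2 * pi) for all real t."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def quadratic_kernel(t):
    """K_0(t) = (3/4) * (1 - t^2) for |t| <= 1, and 0 otherwise (assumed form)."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0

def triangular_kernel(t):
    """K_0(t) = 1 - |t| for |t| <= 1, and 0 otherwise."""
    return 1.0 - abs(t) if abs(t) <= 1.0 else 0.0

KERNELS = {
    "NORMAL": normal_kernel,
    "QUADRATIC": quadratic_kernel,
    "TRIANGULAR": triangular_kernel,
}

def integrate(func, lower, upper, steps=20000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))

# Each kernel is a density, so each area should be close to 1.
areas = {name: integrate(kernel, -10.0, 10.0) for name, kernel in KERNELS.items()}
```

The quadratic and triangular kernels vanish outside [-1, 1], so only observations within one bandwidth of x contribute to the estimate at x; the normal kernel gives every observation some weight.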
The value of \lambda, referred to as the bandwidth parameter, determines the degree of smoothness in the estimated density function. You specify \lambda indirectly by specifying a standardized bandwidth c with the C= kernel-option. If Q is the interquartile range and n is the sample size, then c is related to \lambda by the formula

\lambda = cQn^{-\frac{1}{5}}

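
The conversion from c to \lambda can be sketched in Python as follows. The function name is hypothetical, and the quartiles come from Python's `statistics.quantiles` with its default method, which is an assumption; the percentile definition that SAS uses for Q may give slightly different values.

```python
import statistics

def bandwidth_from_c(data, c):
    """Return lambda = c * Q * n**(-1/5), where Q is the interquartile range."""
    n = len(data)
    # Assumption: default 'exclusive' quartile method; SAS may define Q differently.
    q1, _median, q3 = statistics.quantiles(data, n=4)
    interquartile_range = q3 - q1
    return c * interquartile_range * n ** (-1.0 / 5.0)

# Hypothetical data: the integers 1 through 32, so n**(-1/5) is 0.5.
example_bandwidth = bandwidth_from_c(list(range(1, 33)), c=1.0)
```

Because \lambda is proportional to c, doubling c doubles the bandwidth and produces a smoother estimate for the same data.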
For a specific kernel function, the discrepancy between the density estimator \hat{f}_{\lambda}(x) and the true density f(x) is measured by the mean integrated square error (MISE):

\mathrm{MISE}(\lambda) = \int_{x} \left\{ E(\hat{f}_{\lambda}(x)) - f(x) \right\}^2 dx + \int_{x} \mathrm{var}(\hat{f}_{\lambda}(x))\, dx

The MISE is the sum of the integrated squared bias and the integrated variance. An approximate mean integrated square error (AMISE) is

\mathrm{AMISE}(\lambda) = \frac{1}{4}\lambda^4 \left(\int_{t} t^2 K_0(t)\, dt\right)^2 \int_{x} \left(f''(x)\right)^2 dx + \frac{1}{n\lambda} \int_{t} K_0(t)^2\, dt

A bandwidth that minimizes AMISE can be derived by treating f(x) as the normal density having parameters \mu and \sigma estimated by the sample mean and standard deviation. If you do not specify a bandwidth parameter or if you specify C=MISE, the bandwidth that minimizes AMISE is used. The value of AMISE can be used to compare different density estimates. For each estimate, the bandwidth parameter c, the kernel function type, and the value of AMISE are reported in the SAS log.
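
The minimization can be sketched in Python under the normal-reference assumption. Setting the derivative of AMISE with respect to \lambda to zero gives \lambda^5 = \int K_0(t)^2 dt / (n (\int t^2 K_0(t) dt)^2 \int (f''(x))^2 dx), and for a normal density with standard deviation \sigma the last integral equals 3 / (8 \sqrt{\pi} \sigma^5). The function names are hypothetical, the kernel constants are the standard analytic values (with the quadratic kernel taken as (3/4)(1 - t^2)), and the values that SAS reports may differ in detail.

```python
import math

# (integral of t^2 * K_0(t), integral of K_0(t)^2) for each kernel.
KERNEL_CONSTANTS = {
    "NORMAL": (1.0, 1.0 / (2.0 * math.sqrt(math.pi))),
    "QUADRATIC": (1.0 / 5.0, 3.0 / 5.0),
    "TRIANGULAR": (1.0 / 6.0, 2.0 / 3.0),
}

def normal_roughness(sigma):
    """Integral of (f''(x))^2 when f is a normal density with std. dev. sigma."""
    return 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)

def amise(bandwidth, n, sigma, kernel="NORMAL"):
    """AMISE(lambda) with f treated as a normal density (reference assumption)."""
    second_moment, kernel_roughness = KERNEL_CONSTANTS[kernel]
    bias_term = 0.25 * bandwidth ** 4 * second_moment ** 2 * normal_roughness(sigma)
    variance_term = kernel_roughness / (n * bandwidth)
    return bias_term + variance_term

def amise_optimal_bandwidth(n, sigma, kernel="NORMAL"):
    """Bandwidth at which the derivative of AMISE with respect to lambda is zero."""
    second_moment, kernel_roughness = KERNEL_CONSTANTS[kernel]
    ratio = kernel_roughness / (n * second_moment ** 2 * normal_roughness(sigma))
    return ratio ** 0.2

# Hypothetical inputs: n = 100 observations with sample standard deviation 2.
best_bandwidth = amise_optimal_bandwidth(100, 2.0)
best_amise = amise(best_bandwidth, 100, 2.0)
```

For the normal kernel this reduces to \lambda = (4/3)^{1/5} \sigma n^{-1/5}. Dividing the resulting \lambda by Qn^{-1/5} gives the corresponding standardized bandwidth c.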


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.