Chapter Contents
Chapter Contents
Previous
Previous
Next
Next
HISTOGRAM Statement

Example 4.7: Annotating a Folded Normal Curve

See FNORM2 in the SAS/QC Sample Library

This example shows how to display a fitted curve that is not supported by the HISTOGRAM statement.

The offset of an attachment point is measured (in mm) for a number of manufactured assemblies, and the measurements are saved in a data set named ASSEMBLY.

   data assembly;
      label offset = 'Offset (in mm)';
      input offset @@;
      datalines;
   11.11 13.07 11.42  3.92 11.08  5.40 11.22 14.69  6.27  9.76
    9.18  5.07  3.51 16.65 14.10  9.69 16.61  5.67  2.89  8.13
    9.97  3.28 13.03 13.78  3.13  9.53  4.58  7.94 13.51 11.43
   11.98  3.90  7.67  4.32 12.69  6.17 11.48  2.82 20.42  1.01
    3.18  6.02  6.63  1.72  2.42 11.32 16.49  1.22  9.13  3.34
    1.29  1.70  0.65  2.62  2.04 11.08 18.85 11.94  8.34  2.07
    0.31  8.91 13.62 14.94  4.83 16.84  7.09  3.37  0.49 15.19
    5.16  4.14  1.92 12.70  1.97  2.10  9.38  3.18  4.18  7.22
   15.84 10.85  2.35  1.93  9.19  1.39 11.40 12.20 16.07  9.23
    0.05  2.15  1.95  4.39  0.48 10.16  4.81  8.28  5.68 22.81
    0.23  0.38 12.71  0.06 10.11 18.38  5.53  9.36  9.32  3.63
   12.93 10.39  2.05 15.49  8.12  9.52  7.77 10.70  6.37  1.91
    8.60 22.22  1.74  5.84 12.90 13.06  5.08  2.09  6.41  1.40
   15.60  2.36  3.97  6.17  0.62  8.56  9.36 10.19  7.16  2.37
   12.91  0.95  0.89  3.82  7.86  5.33 12.92  2.64  7.92 14.06
   ;

The assembly process is in statistical control, and it is decided to fit a folded normal distribution to the offset measurements. A variable X has a folded normal distribution if X=|Y|, where Y is distributed as N(\mu,\sigma). The fitted density is

h(x) = \frac{1}{\sqrt{2\pi} \sigma}
 [ \exp ( -\frac{(x-\mu)^2}{2\sigma^2} ) +
 \exp ( -\frac{(x+\mu)^2}{2\sigma^2} )
 ] \;\;\;\;\; , \; x \geq 0

You can use SAS/IML software to compute preliminary estimates of \mu and \sigma based on a method of moments given by Elandt (1961). These estimates are computed by solving equation (19) of Elandt (1961), which is given by

f(\theta) = \frac{( \frac{2}{\sqrt{2\pi}} e^{-\theta^2/2 } -
 \theta[ 1-2\Phi(\theta) ] )^2}
 {1+\theta^2}
 = A
where \Phi(\cdot) is the standard normal distribution function, and
A = \frac { \bar{x}^2 }
 { \frac{1}n \sum_{i=1}^n x_i^2 }
Then the estimates of \sigma and \mu are given by

\hat{\sigma}_0 & = \sqrt{ \frac{ \frac{1}n \sum_{i=1}^n x_i^2}
 {1 + \hat{\theta}^2 } } \ 
\hat{\mu}_0 & = \hat{\theta} \cdot \hat{\sigma}_0

Begin by using the MEANS procedure to compute the first and second moments and using the DATA step to compute the constant A.

   proc means data=assembly noprint;
      var offset;
      output out=stat mean=m1 var=var n=n min=min;

   * Compute constant A from equation (19) of Elandt (1961) ;
   data stat;
      keep m2 a min;
      set stat;
      a  = (m1*m1);
      m2 = ((n-1)/n)*var + a;
      a  = a/m2;

Next, use the SAS/IML subroutine NLPDD to solve equation (19) by minimizing (f(\theta)-A)^2, and compute \hat{\mu}_0 and \hat{\sigma}_0.

   proc iml;
      use stat;
      read all var {m2}  into m2;
      read all var {a}   into a;
      read all var {min} into min;

      * f(t) is the function in equation (19) of Elandt (1961) ;
      start f(t) global(a);
         y = 0.39894*exp(-0.5*t*t);
         y = (2*y-(t*(1-2*probnorm(t))))**2/(1+t*t);
         y = (y-a)**2;
         return(y);
      finish;

      * Minimize (f(t)-A)**2 and estimate mu and sigma ;
      if ( min < 0 ) then do;
         print "Warning: Observations are not all nonnegative.";
         print "The folded normal is inappropriate.";
         stop;
         end;
      if ( a < 0.6374 ) then do;
         print "Warning: Estimates may be unreliable";
         end;
      opt = { 0 0 };
      con = { 1e-6 };
      x0  = { 2.0 };
      tc  = { . . . . . 1e-12 . . . . . . .};
      call nlpdd(rc,etheta0,"f",x0,opt,con,tc);
      esig0 = sqrt(m2/(1+etheta0*etheta0));
      emu0  = etheta0*esig0;

      create prelim var {emu0 esig0 etheta0};
      append;
      close prelim;
The preliminary estimates are saved in the data set PRELIM, as shown in Output 4.7.1.

Output 4.7.1: Preliminary Estimates of \mu, \sigma, and \theta
 
The Data Set PRELIM

EMU0 ESIG0 ETHETA0
6.51735 6.54953 0.99509

Now, using \hat{\mu}_0 and \hat{\sigma}_0 as initial estimates, call the NLPDD subroutine to maximize the log likelihood, l(\mu,\sigma), of the folded normal distribution, where, up to a constant,
l(\mu,\sigma) = -n \log \sigma +
 \sum_{i=1}^n \log
 [
 \exp ( -\frac{(x_i-\mu)^2}{2\sigma^2} ) +
 \exp ( -\frac{(x_i+\mu)^2}{2\sigma^2} )
 ]

      * Define the log likelihood of the folded normal ;
      start g(p) global(x);
         y = 0.0;
         do i = 1 to nrow(x);
            z = exp( (-0.5/p[2])*(x[i]-p[1])*(x[i]-p[1]) );
            z = z + exp( (-0.5/p[2])*(x[i]+p[1])*(x[i]+p[1]) );
            y = y + log(z);
            end;
         y = y - nrow(x)*log( sqrt( p[2] ) );
         return(y);
      finish;

      * Maximize the log likelihood with subroutine NLPDD ;
      use assembly;
      read all var {offset} into x;
      esig0sq = esig0*esig0;
      x0      = emu0||esig0sq;
      opt     = { 1 0 };
      con     = { . 0.0, .  .  };
      call nlpdd(rc,xr,"g",x0,opt,con);
      emu     = xr[1];
      esig    = sqrt(xr[2]);
      etheta  = emu/esig;

      create parmest var{emu esig etheta};
      append;
      close parmest;
   quit;

The data set PARMEST saves the maximum likelihood estimates \hat{\mu} and \hat{\sigma} (as well as \hat{\mu}/\hat{\sigma}), as shown in Output 4.7.2.

Output 4.7.2: Final Estimates of \mu, \sigma, and \theta
 
The Data Set PARMEST

EMU ESIG ETHETA
6.66761 6.39650 1.04239

To annotate the curve on a histogram, begin by computing the width and endpoints of the histogram intervals. The following statements save these values in an OUTFIT= data set called OUT. Note that a plot is not produced at this point.

   proc capability data=assembly noprint;
      histogram offset / outfit=out normal(noprint) noplot;
   run;

Output 4.7.3 provides a listing of the data set OUT. The width and endpoints of the histogram bars are saved as values of the variables _WIDTH_, _MIDPT1_, and _MIDPTN_. See "Output Data Sets".

Output 4.7.3: The OUTFIT= Data Set OUT
 
OUTFIT= Data Set OUT

_VAR_ _CURVE_ _LOCATN_ _SCALE_ _CHISQ_ _DF_ _PCHISQ_ _MIDPT1_ _WIDTH_ _MIDPTN_ _EXPECT_ _ESTSTD_ _ADASQ_ _ADP_ _CVMWSQ_ _CVMP_ _KSD_ _KSP_
offset NORMAL 7.62 5.24 31.17 5 0 1.5 3 22.5 7.62 5.24 1.90 0 0.28 0 0.09 0.01

The following statements create an annotate data set named ANNO, which contains the coordinates of the fitted curve:

   data anno;
      merge parmest out;
      length function color $ 8;

      function = 'point';
      color    = 'black';
      size     =  2;
      xsys     = '2';
      ysys     = '2';
      when     = 'a';
      constant = 39.894*_width_;
      left     =  _midpt1_ - 0.5*_width_;
      right    =  _midptn_ + 0.5*_width_;
      inc      = (right-left)/100;
      do x = left to right by inc;
         z1 = (x-emu)/esig;
         z2 = (x+emu)/esig;
         y  = (constant/esig)*(exp(-0.5*z1*z1)+exp(-0.5*z2*z2));
         output;
         function = 'draw';
         end;
   run;

The following statements read the ANNOTATE= data set and display the histogram and fitted curve, as shown in Output 4.7.4:

   title "Folded Normal Distribution";
   legend1 frame cframe=ligr cborder=black position=center;
   proc capability data=assembly noprint;
      spec usl=27  cusl=black  lusl=2  wusl=2;
      histogram offset / annotate  = anno
                         cfill     = red
                         cframe    = ligr
                         legend    = legend1;
   run;

Output 4.7.4: Histogram with Annotated Folded Normal Curve
caphex7d.gif (6079 bytes)

Chapter Contents
Chapter Contents
Previous
Previous
Next
Next
Top
Top

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.