The VARIOGRAM Procedure

Output Data Sets

The VARIOGRAM procedure produces three data sets: the OUTVAR=SAS-data-set, the OUTPAIR=SAS-data-set, and the OUTDIST=SAS-data-set. These data sets are described in the following sections.

OUTVAR=SAS-data-set

The OUTVAR= data set contains the standard and robust versions of the sample semivariogram, the covariance, and other information at each lag class.

The details of the computation of the variogram, the robust variogram, and the covariance are described in the section "Theoretical and Computational Details of the Semivariogram".

The OUTVAR= data set contains the following variables:

The bandwidth variable, BANDW, is not included in the data set if no bandwidth specification is given in the COMPUTE statement or in a DIRECTIONS statement.

OUTDIST=SAS-data-set

The OUTDIST= data set contains counts for a modified histogram showing the distribution of pairwise distances. The purpose of this data set is to enable you to make choices for the value of the LAGDISTANCE= option in the COMPUTE statement in subsequent runs of PROC VARIOGRAM.

For plotting and estimation purposes, it is desirable to have as many points as possible for a variogram plot. However, a rule of thumb used in computing sample semivariograms is to use at least 30 point pairs in each interval whenever possible. Hence, there is a lower limit to the value of the LAGDISTANCE= option.

Since the distribution of pairwise distances is seldom known in advance, the information contained in the OUTDIST= data set enables you to choose, in an iterative fashion, a value for the LAGDISTANCE= parameter. The value you choose is a compromise between the number of pairs making up each variogram point and the number of variogram points.

In some cases, the pattern of measured points may result in some lag or distance classes having a small number of pairs, while the remaining classes have a large number of pairs. By adjusting the value of the LAGDISTANCE= option to honor the rule of thumb (at least 30 pairs), you are "wasting" pairs in the other distance classes.

One strategy for solving this problem is to use fewer than 30 pairs for these distance classes. Then, either delete the corresponding variogram points or use them and accept the increased uncertainty. Unfortunately, the deficient distance classes are usually those close to the origin (h=0). This is the crucial portion of the experimental variogram curve for determining the form of the theoretical variogram and for detecting the presence of a nugget effect.

Another alternative is to force distance classes to contain approximately the same number of pairs. This results in distance classes of unequal widths.

While PROC VARIOGRAM does not produce such distance classes directly, the OUTPAIR= data set, described in the section "OUTPAIR=SAS-data-set", contains information on all distinct pairs of points. You can use this data set, along with the RANK procedure, to produce an experimental variogram based on equal numbers of pairs in each distance class.
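The equal-count idea can be illustrated outside SAS. The following Python sketch is not the RANK procedure and not PROC VARIOGRAM code; the function name `equal_count_variogram` and the input layout are hypothetical. It sorts all pairwise distances, splits them into groups of nearly equal size, and applies the classical semivariogram estimator (half the mean squared difference) within each group.

```python
import math
from itertools import combinations

def equal_count_variogram(points, values, n_groups):
    """Return (mean distance, semivariance, pair count) per equal-count class.

    points: list of (x, y) coordinates; values: measurements at those points.
    """
    # One record per distinct pair: (distance, squared difference of values).
    pairs = sorted(
        (math.dist(points[i], points[j]), (values[i] - values[j]) ** 2)
        for i, j in combinations(range(len(points)), 2)
    )
    n = len(pairs)
    result = []
    for g in range(n_groups):
        # Integer slice bounds give groups whose sizes differ by at most one.
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue
        mean_h = sum(d for d, _ in chunk) / len(chunk)
        gamma = sum(sq for _, sq in chunk) / (2 * len(chunk))
        result.append((mean_h, gamma, len(chunk)))
    return result

# Four collinear points with a linear trend in the measured value.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [0.0, 1.0, 2.0, 3.0]
classes = equal_count_variogram(points, values, 3)
```

Because the groups are cut by rank, pairs with tied distances can land in adjacent classes, and the classes have unequal widths, as noted above.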

To request an OUTDIST= data set, you specify the OUTDIST= data set in the PROC VARIOGRAM statement and the NOVARIOGRAM option in the COMPUTE statement. The NOVARIOGRAM option prevents any variogram or covariance computation from being performed.

Computation of the Distribution Distance Classes

The simplest way of determining the distribution of pairwise distances is to determine the maximum distance $h_{max}$ between pairs and divide this distance by some number $N$ of intervals to produce distance classes of length $\delta = \frac{h_{max}}{N}$. The distance between each pair of points $P_1$, $P_2$, denoted $|P_1P_2|$, is computed, and the pair $P_1P_2$ is counted in the $k$th distance class if $|P_1P_2| \in [(k-1)\delta, k\delta)$ for $k = 1, \ldots, N$.
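A minimal Python sketch of this simple scheme follows. It is an illustration only; the function name `simple_distance_classes` is hypothetical. One detail is an assumption of the sketch: the pair at exactly the maximum distance falls outside the half-open class $[(N-1)\delta, N\delta)$, so it is placed in the last class here rather than dropped.

```python
import math
from itertools import combinations

def simple_distance_classes(points, n_classes):
    """Count point pairs in n_classes equal-width classes of width h_max / n_classes."""
    distances = [math.dist(p, q) for p, q in combinations(points, 2)]
    h_max = max(distances)
    delta = h_max / n_classes
    counts = [0] * n_classes
    for d in distances:
        # Index k (0-based) corresponds to the class [k*delta, (k+1)*delta).
        # Assumption: a distance equal to h_max is clamped into the last class.
        k = min(int(d / delta), n_classes - 1)
        counts[k] += 1
    return delta, counts

# Corners of a 3-by-4 rectangle: pair distances are 3, 3, 4, 4, 5, 5.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
delta, counts = simple_distance_classes(points, 5)
```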

The actual computation is a slight variation of this. A bound, rather than the actual maximum distance, is computed. This bound is the length of the diagonal of a bounding rectangle for the data points. This bounding rectangle is found by using the maximum and minimum x and y coordinates, $x_{max}$, $x_{min}$, $y_{max}$, $y_{min}$, and forming the rectangle determined by the points

$(x_{max}, y_{max}), (x_{max}, y_{min}), (x_{min}, y_{min}), (x_{min}, y_{max})$

See Figure 70.16 for an illustration of the bounding rectangle.

Figure 70.16: Bounding Rectangle to Determine Maximum Pairwise Distance

The pairwise distance bound, denoted by $h_b$, is given by

$h_b^2 = (x_{max}-x_{min})^2 + (y_{max}-y_{min})^2$

Using $h_b$, the interval $(0, h_b]$ is divided into $N+1$ subintervals, where $N$ is the value of the NHCLASSES= option specified in the COMPUTE statement, or $N=10$ if the NHCLASSES= option is not specified. The basic distance unit is $h_0 = h_b/N$; the distance intervals are centered on $h_0, 2h_0, \ldots, Nh_0$, with a distance tolerance of $\pm\frac{h_0}{2}$. The extra subinterval is $(0, h_0/2)$, corresponding to the 0th lag. It is half the length of the remaining subintervals, and it often contains the smallest number of pairs. This method of partitioning the interval $(0, h_b]$ is identical to what is done when you actually compute the sample variogram.
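The partition can be sketched in Python as follows. This illustrates the description above and is not the procedure's internal code; the function name `lag_class_counts` is hypothetical. The text does not say which endpoint of each lag class is closed, so the sketch assumes that a distance exactly on a boundary belongs to the higher lag.

```python
import math
from itertools import combinations

def lag_class_counts(points, n_classes=10):
    """Count point pairs per lag class, using the bounding-rectangle diagonal as the bound."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Diagonal of the bounding rectangle: an upper bound on any pairwise distance.
    h_b = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    h0 = h_b / n_classes
    counts = [0] * (n_classes + 1)  # lags 0, 1, ..., n_classes
    for p, q in combinations(points, 2):
        d = math.dist(p, q)
        # Lag 0 holds distances below h0/2; lag k >= 1 is centered on k*h0
        # with tolerance h0/2. Boundary values go to the higher lag (assumption).
        k = min(int(d / h0 + 0.5), n_classes)
        counts[k] += 1
    return h_b, h0, counts

# Corners of a 3-by-4 rectangle: the diagonal bound is 5, so N=5 gives h0=1.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
h_b, h0, counts = lag_class_counts(points, n_classes=5)
```

Rerunning the function with different values of `n_classes` plays the same role as varying the NHCLASSES= option and inspecting the resulting counts.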

The lag classes corresponding to $h_0 = 1$ are shown in Figure 70.17.

Figure 70.17: Lag Classes Corresponding to $h_0 = 1$

By increasing or decreasing the value of the NHCLASSES= option, you can adjust the lag or distance class with the smallest count so that this count is around 30 or some other value that you judge appropriate.

Once you determine an appropriate value for the NHCLASSES= option, you can use the width of the lag classes as a candidate value for the LAGDISTANCE= option in the COMPUTE statement. The width of the lag classes is determined by the upper bound (UB) and lower bound (LB) variables.

For example, read the observation from the OUTDIST= data set corresponding to lag 1 and compute the quantity UB-LB. Use this value for the LAGDISTANCE= option in the COMPUTE statement.

Note: Do not use the 0th lag class; it is half the length of the other intervals. Use lag 1 instead.
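As a small worked example, suppose the OUTDIST= observations have been exported as a list of records. The record layout below is an assumption made for illustration: UB and LB are the variables named above, while the key `LAG` and the numeric values are hypothetical.

```python
def lag_width(rows):
    """Return UB - LB for the lag 1 record, the candidate lag distance."""
    # Lag 0 is skipped because it is only half as wide as the other classes.
    lag1 = next(r for r in rows if r["LAG"] == 1)
    return lag1["UB"] - lag1["LB"]

# Hypothetical records for a basic distance unit of 1.
rows = [
    {"LAG": 0, "LB": 0.0, "UB": 0.5},
    {"LAG": 1, "LB": 0.5, "UB": 1.5},
    {"LAG": 2, "LB": 1.5, "UB": 2.5},
]
candidate = lag_width(rows)
```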

Variables in the OUTDIST= data set

The following variables are written to the OUTDIST= data set:

OUTPAIR=SAS-data-set

The OUTPAIR= data set contains one observation for each distinct pair of points P1, P2 in the original data set, unless you specify the OUTPDISTANCE= option in the COMPUTE statement.

If you specify OUTPDISTANCE=$D_{max}$ in the COMPUTE statement, all pairs $P_1$, $P_2$ in the original data set that satisfy the relation $|P_1P_2| \le D_{max}$ are written to the OUTPAIR= data set.

Note that the OUTPAIR= data set can be very large even for a moderately sized DATA= data set.

For example, if the DATA= data set has NOBS=500, the OUTPAIR= data set has NOBS(NOBS-1)/2 = 124,750 observations if no OUTPDISTANCE= restriction is given in the COMPUTE statement.
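The arithmetic and the effect of a distance cutoff can be checked with a short Python sketch. The function names are hypothetical, and the filter only mimics the relation stated for the OUTPDISTANCE= option; it does not reproduce the layout of the OUTPAIR= data set.

```python
import math
from itertools import combinations

def n_distinct_pairs(nobs):
    """Number of distinct unordered pairs among nobs points."""
    return nobs * (nobs - 1) // 2

def pairs_within(points, d_max):
    """Return (i, j, distance) for every distinct pair no farther apart than d_max."""
    kept = []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        d = math.dist(p, q)
        if d <= d_max:
            kept.append((i, j, d))
    return kept

total = n_distinct_pairs(500)

# Corners of a 3-by-4 rectangle: the two diagonals (length 5) exceed the cutoff.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
near = pairs_within(points, 4.0)
```

The pair count grows quadratically with the number of observations, which is why a distance cutoff can reduce the output size substantially.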

The OUTPAIR= data set contains information on the distance and orientation for each point pair, and you can use it for specialized continuity measure calculations.

The OUTPAIR= data set contains the following variables:


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.