The COMPARE Procedure

Concepts

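PROC COMPARE first compares the following:

- data set attributes (set by the data set options TYPE= and LABEL=)
- variables: PROC COMPARE checks each variable in one data set to determine whether it matches a variable in the other data set
- attributes (type, length, labels, formats, and informats) of matching variables
- observations: PROC COMPARE checks each observation in one data set to determine whether it matches an observation in the other data set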

After making these comparisons, PROC COMPARE compares the values in the parts of the data sets that match. PROC COMPARE compares the data either by the position of observations or by the values of an ID variable.


A Comparison by Position of Observations
Comparison by the Positions of Observations shows two data sets. The data inside the shaded boxes show the part of the data sets that the procedure compares. Assume that variables with the same names have the same type.

Comparison by the Positions of Observations

[IMAGE]

When you use PROC COMPARE to compare data set TWO with data set ONE, the procedure compares the first observation in data set ONE with the first observation in data set TWO, the second observation in data set ONE with the second observation in data set TWO, and so on. In each observation that it compares, the procedure compares the values of IDNUM, NAME, GENDER, and GPA.

The procedure does not report on the values of the last two observations or the variable YEAR in data set TWO because there is nothing to compare them with in data set ONE.
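The positional pairing described above can be modeled in a few lines of Python. This is only an illustrative sketch, not how PROC COMPARE is implemented: the function name `compare_by_position` and the rows are hypothetical (the figure's actual data is not reproduced here), and values are compared for exact equality.

```python
def compare_by_position(base, comp):
    """Pair rows by position and compare only the variables both data sets share."""
    if not base or not comp:
        return []
    # Variables present in both data sets (YEAR exists only in TWO, so it is skipped).
    common = [name for name in base[0] if name in comp[0]]
    diffs = []
    # zip() stops at the shorter data set, so trailing rows have no partner.
    for obs, (b, c) in enumerate(zip(base, comp), start=1):
        for name in common:
            if b[name] != c[name]:
                diffs.append((obs, name, b[name], c[name]))
    return diffs


# Hypothetical data, invented for this sketch.
one = [
    {"IDNUM": 1, "NAME": "Ann", "GENDER": "f", "GPA": 3.5},
    {"IDNUM": 2, "NAME": "Bob", "GENDER": "m", "GPA": 3.1},
]
two = [
    {"IDNUM": 1, "NAME": "Ann", "GENDER": "f", "GPA": 3.5, "YEAR": 1},
    {"IDNUM": 2, "NAME": "Bob", "GENDER": "m", "GPA": 3.3, "YEAR": 2},
    {"IDNUM": 3, "NAME": "Cy", "GENDER": "m", "GPA": 2.9, "YEAR": 1},
]

position_diffs = compare_by_position(one, two)
print(position_diffs)
```

The third row of `two` and the YEAR variable never enter the comparison, which mirrors the behavior described for the last two observations and YEAR in data set TWO.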


A Comparison with an ID Variable
In a simple comparison, PROC COMPARE uses the observation number to determine which observations to compare. When you use an ID variable, PROC COMPARE uses the values of the ID variable to determine which observations to compare. ID variables should have unique values and must have the same type.

For the two data sets shown in Comparison by the Value of the ID Variable, assume that IDNUM is an ID variable and that IDNUM has the same type in both data sets. The procedure compares the observations that have the same value for IDNUM. The data inside the shaded boxes show the part of the data sets that the procedure compares.

Comparison by the Value of the ID Variable

[IMAGE]

The data sets contain three matching variables: NAME, GENDER, and GPA. They also contain five matching observations: the observations with values of 2998, 9866, 2118, 3847, and 2342 for IDNUM.

Data set TWO contains two observations (IDNUM=7565 and IDNUM=1755) for which data set ONE contains no matching observations. Similarly, no variable in data set ONE matches the variable YEAR in data set TWO.

See Comparing Observations with an ID Variable for an example that uses an ID variable.
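As a rough model of matching by ID variable, the Python sketch below pairs rows by the value of an ID column rather than by position. It is an illustration under stated assumptions, not the procedure's implementation: the name `compare_by_id` and all data values are hypothetical, ID values are assumed to be unique, and values are compared for exact equality.

```python
def compare_by_id(base, comp, id_var):
    """Match rows on id_var and compare the other variables both data sets share."""
    # Assumes unique ID values; a duplicate would silently replace the earlier row.
    base_by_id = {row[id_var]: row for row in base}
    comp_by_id = {row[id_var]: row for row in comp}
    common = [v for v in base[0] if v in comp[0] and v != id_var]

    diffs = []
    for key, b in base_by_id.items():
        c = comp_by_id.get(key)
        if c is None:
            continue  # no matching observation in the comparison data set
        for v in common:
            if b[v] != c[v]:
                diffs.append((key, v, b[v], c[v]))

    only_in_base = [k for k in base_by_id if k not in comp_by_id]
    only_in_comp = [k for k in comp_by_id if k not in base_by_id]
    return diffs, only_in_base, only_in_comp


# Hypothetical data, invented for this sketch; row order differs between the sets.
one = [
    {"IDNUM": 101, "NAME": "Ann", "GPA": 3.5},
    {"IDNUM": 102, "NAME": "Bob", "GPA": 3.1},
]
two = [
    {"IDNUM": 103, "NAME": "Cy", "GPA": 2.9, "YEAR": 1},
    {"IDNUM": 102, "NAME": "Bob", "GPA": 3.3, "YEAR": 2},
    {"IDNUM": 101, "NAME": "Ann", "GPA": 3.5, "YEAR": 1},
]

id_diffs, only_in_one, only_in_two = compare_by_id(one, two, "IDNUM")
print(id_diffs, only_in_one, only_in_two)
```

Because rows are looked up by ID, the differing row order in the two lists does not matter, and the unmatched ID is reported separately instead of being compared.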


The Equality Criterion
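The COMPARE procedure judges numeric values unequal if the magnitude of their difference, as measured according to the METHOD= option, is greater than the value of the CRITERION= option. PROC COMPARE provides four methods for applying CRITERION=:

- The EXACT method tests for exact equality.
- The ABSOLUTE method compares the absolute difference to the value specified by CRITERION=.
- The RELATIVE method compares the absolute relative difference to the value specified by CRITERION=.
- The PERCENT method compares the absolute percent difference to the value specified by CRITERION=.

For a numeric variable compared, let x be its value in the base data set and let y be its value in the comparison data set. If both x and y are nonmissing, the values are judged unequal according to the value of METHOD= and the value of CRITERION= (γ) as follows:

- METHOD=EXACT: unequal if y does not equal x.
- METHOD=ABSOLUTE: unequal if ABS(y-x) > γ.
- METHOD=RELATIVE: unequal if ABS(y-x) / ((ABS(x)+ABS(y))/2 + δ) > γ. The values are equal if x=y=0.
- METHOD=PERCENT: unequal if 100(ABS(y-x)/ABS(x)) > γ for x≠0, or if y≠x for x=0.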

If x or y is missing, then the comparison depends on the NOMISSING option. If NOMISSING is in effect, a missing value always compares equal to anything. Otherwise, a missing value is judged equal only to a missing value of the same type (that is, .=., .^=.A, .A=.A, .A^=.B, and so on).
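Setting missing values aside, the four methods can be sketched in Python as below. The formulas are assumptions based on the usual definitions of METHOD=: absolute difference for ABSOLUTE, absolute difference divided by the mean magnitude plus δ for RELATIVE, and 100 times the absolute difference divided by ABS(x) for PERCENT. The function name `judged_unequal` is hypothetical, and a nonnegative criterion is assumed.

```python
def judged_unequal(x, y, method="EXACT", criterion=0.0, delta=0.0):
    """Return True if nonmissing x (base) and y (comparison) are judged unequal."""
    if x == y:
        # Identical values are equal under every method (this covers x = y = 0).
        return False
    diff = abs(y - x)
    if method == "EXACT":
        return True
    if method == "ABSOLUTE":
        return diff > criterion
    if method == "RELATIVE":
        # delta is the optional value added to the denominator.
        return diff / ((abs(x) + abs(y)) / 2 + delta) > criterion
    if method == "PERCENT":
        # When x is 0 and y differs from it, the values are simply unequal.
        return x == 0 or 100 * diff / abs(x) > criterion
    raise ValueError(f"unknown method: {method}")


# The same pair of values under each method.
for method, criterion in [("EXACT", 0.0), ("ABSOLUTE", 2.0),
                          ("RELATIVE", 0.01), ("PERCENT", 0.5)]:
    print(method, criterion, judged_unequal(100.0, 101.0, method, criterion))
```

The loop shows that one pair of values can be equal under one method and unequal under another, depending on how the difference is measured and on the criterion.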

If the value specified for CRITERION= is negative, the actual criterion used is made equal to the absolute value of γ times a very small number ε (epsilon) that depends on the numerical precision of the computer. This number ε is defined as the smallest positive floating-point value such that, using machine arithmetic, 1-ε<1<1+ε. Round-off or truncation error in floating-point computations is typically a few orders of magnitude larger than ε. This means that CRITERION=-1000 often provides a reasonable test of the equality of computed results at the machine level of precision.
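The scaling of a negative criterion can be illustrated with a short Python sketch. It assumes that ε corresponds to the usual double-precision machine epsilon (`sys.float_info.epsilon`), which may not be exactly the constant SAS uses on a given host. The helper name `effective_criterion` is hypothetical, and the final check uses a plain absolute difference only to keep the demonstration simple.

```python
import sys


def effective_criterion(gamma, eps=sys.float_info.epsilon):
    """Scale a negative criterion by machine epsilon; leave other values unchanged."""
    if gamma < 0:
        return abs(gamma) * eps
    return gamma


tolerance = effective_criterion(-1000)   # 1000 times machine epsilon
computed = 0.1 + 0.2                     # carries a small round-off error
round_off = abs(computed - 0.3)

print(tolerance)
print(round_off)
print(round_off <= tolerance)
```

The round-off error in `0.1 + 0.2` is nonzero, so an exact test would report a difference, but it falls well inside the tolerance derived from a criterion of -1000.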

The value δ added to the denominator in the RELATIVE method is specified in parentheses after the method name: METHOD=RELATIVE(δ). If not specified in METHOD=, δ defaults to 0. The value of δ can be used to control the behavior of the error measure when both x and y are very close to 0. If δ is not given and x and y are very close to 0, any error produces a large relative error (in the limit, 2).

Specifying a value for δ avoids this extreme sensitivity of the RELATIVE method for small values. If you specify METHOD=RELATIVE(δ) CRITERION=γ when both x and y are much smaller than δ in absolute value, the comparison is as if you had specified METHOD=ABSOLUTE CRITERION=δγ. However, when either x or y is much larger than δ in absolute value, the comparison is like METHOD=RELATIVE CRITERION=γ. For moderate values of x and y, METHOD=RELATIVE(δ) CRITERION=γ is, in effect, a compromise between METHOD=ABSOLUTE CRITERION=δγ and METHOD=RELATIVE CRITERION=γ.
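The effect of the denominator term can be seen numerically. The Python sketch below assumes the relative error measure is the absolute difference divided by the mean of the two magnitudes plus the added denominator value, which is consistent with the limit of 2 mentioned above. The function name `relative_error` and the sample values are hypothetical.

```python
def relative_error(x, y, delta=0.0):
    """Assumed RELATIVE error measure; undefined when x = y = 0 and delta = 0."""
    return abs(y - x) / ((abs(x) + abs(y)) / 2 + delta)


# Near zero with no delta: a tiny difference yields the limiting value 2.
tiny_no_delta = relative_error(0.0, 1e-12)

# Near zero with delta = 1: the measure is roughly the absolute difference / delta.
tiny_with_delta = relative_error(0.0, 1e-12, delta=1.0)

# Values much larger than delta: delta barely changes the measure.
large_no_delta = relative_error(1000.0, 1001.0)
large_with_delta = relative_error(1000.0, 1001.0, delta=1.0)

print(tiny_no_delta, tiny_with_delta)
print(large_no_delta, large_with_delta)
```

With the added term set to 1, the near-zero pair is judged by what is effectively an absolute test, while the pair near 1000 is still judged by what is effectively a relative test.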

For character variables, if one value has a greater length than the other, the shorter value is padded with blanks for the comparison. Nonblank character values are judged equal only if they agree at each character. If NOMISSING is in effect, blank character values compare equal to anything.
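The padding rule for character values can be sketched in Python as follows. This is an illustrative model only; the function name `chars_equal` and its `nomissing` flag are hypothetical stand-ins for the behavior described above.

```python
def chars_equal(a, b, nomissing=False):
    """Compare two character values after padding the shorter one with blanks."""
    if nomissing and (a.strip(" ") == "" or b.strip(" ") == ""):
        # With NOMISSING in effect, a blank value compares equal to anything.
        return True
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)


print(chars_equal("Smith", "Smith   "))          # trailing blanks do not matter
print(chars_equal("Smith", "smith"))             # case differs at one character
print(chars_equal("", "Smith"))                  # blank versus nonblank
print(chars_equal("", "Smith", nomissing=True))  # blank matches under NOMISSING
```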

Definition of Difference and Percent Difference

In the reports of value comparisons and in the OUT= data set, PROC COMPARE displays difference and percent difference values for the numbers compared. These quantities are defined using the value from the base data set as the reference value. For a numeric variable compared, let x be its value in the base data set and let y be its value in the comparison data set. If x and y are both nonmissing, the difference and percent difference are defined as follows:
Difference = y - x
Percent Difference = ((y - x) / x) * 100 for x ≠ 0
Percent Difference = missing for x = 0
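A small Python sketch of these two quantities follows. It assumes the definitions Difference = y - x and Percent Difference = 100 * (y - x) / x, with the base value x as the reference, and it uses `None` to stand in for a missing result. The function name `difference_and_percent` is hypothetical.

```python
def difference_and_percent(x, y):
    """Return (difference, percent difference) for nonmissing x (base) and y."""
    diff = y - x
    # The percent difference is missing (None here) when the base value is 0.
    pct = None if x == 0 else 100 * diff / x
    return diff, pct


print(difference_and_percent(50, 55))
print(difference_and_percent(50, 45))
print(difference_and_percent(0, 3))
```

Because the base value is the reference, swapping the base and comparison data sets changes the sign of the difference and, in general, the magnitude of the percent difference.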


Formatted Values
PROC COMPARE compares unformatted values. If you have two matching variables that are formatted differently, PROC COMPARE lists the formats of the variables.



Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.