The COMPARE Procedure

PROC COMPARE Statement


Restriction: If you omit COMPARE=, you must use the WITH and VAR statements.
Restriction: PROC COMPARE reports errors differently if one or both of the compared data sets are not RADIX addressable. Compressed files created in Version 6 are not RADIX addressable; beginning with Version 7, compressed files are RADIX addressable. (The integrity of the data is not compromised; the procedure simply numbers the observations differently.)
Reminder: You can use data set options with the BASE= and COMPARE= options.

PROC COMPARE <option(s)>;
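A minimal sketch of the statement in use, assuming two hypothetical data sets, WORK.OLD and WORK.NEW, that share a hypothetical key variable CUSTNO. The ID statement matches observations by CUSTNO (without it, PROC COMPARE matches observations by position), so both data sets are sorted by CUSTNO first.

```sas
/* Hypothetical names: WORK.OLD, WORK.NEW, and the key variable CUSTNO. */
proc sort data=work.old;
   by custno;
run;

proc sort data=work.new;
   by custno;
run;

/* WORK.OLD is the base data set; WORK.NEW is the comparison data set. */
proc compare base=work.old compare=work.new;
   id custno;   /* match observations by CUSTNO instead of by position */
run;
```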

Use this option   To do this

Specify the data sets to compare
  BASE=           Specify the base data set
  COMPARE=        Specify the comparison data set

Control the output data set
  OUT=            Create an output data set
  OUTALL          Write an observation for each observation in the BASE= and COMPARE= data sets
  OUTBASE         Write an observation for each observation in the BASE= data set
  OUTCOMP         Write an observation for each observation in the COMPARE= data set
  OUTDIF          Write an observation that contains the differences for each pair of matching observations
  OUTNOEQUAL      Suppress the writing of observations when all values are equal
  OUTPERCENT      Write an observation that contains the percent differences for each pair of matching observations
  OUTSTATS=       Create an output data set that contains summary statistics

Specify how the values are compared
  CRITERION=      Specify the criterion for judging the equality of numeric values
  METHOD=         Specify the method for judging the equality of numeric values
  NOMISSBASE      Judge missing values in the base data set equal to any value
  NOMISSCOMP      Judge missing values in the comparison data set equal to any value
  NOMISSING       Judge missing values in both data sets equal to any value

Control the details in the default report
  ALLOBS          Include the values for all matching observations
  ALLSTATS        Print a table of summary statistics for all pairs of matching variables
  STATS           Print a table of summary statistics for matching numeric variables judged unequal
  ALLVARS         Include in the report the values and differences for all matching variables
  BRIEFSUMMARY    Print only a short comparison summary
  FUZZ=           Change the report for numbers less than a specified value between 0 and 1
  MAXPRINT=       Restrict the number of differences to print
  NODATE          Suppress the printing of creation and last-modified dates
  NOPRINT         Suppress all printed output
  NOSUMMARY       Suppress the summary reports
  NOVALUES        Suppress the value comparison results
  PRINTALL        Produce a complete listing of values and differences
  TRANSPOSE       Print the value differences by observation, not by variable

Control the listing of variables and observations
  LISTALL         List all variables and observations found in only one data set
  LISTBASE        List all variables and observations found only in the base data set
  LISTBASEOBS     List all observations found only in the base data set
  LISTBASEVAR     List all variables found only in the base data set
  LISTCOMP        List all variables and observations found only in the comparison data set
  LISTCOMPOBS     List all observations found only in the comparison data set
  LISTCOMPVAR     List all variables found only in the comparison data set
  LISTEQUALVAR    List variables whose values are judged equal
  LISTOBS         List all observations found in only one data set
  LISTVAR         List all variables found in only one data set

Control messages in the SAS log
  ERROR           Display an error message when differences are found
  NOTE            Display notes that describe the results of the comparison
  WARNING         Display a warning message when differences are found


Options

ALLOBS
includes in the report of value comparison results the values and, for numeric variables, the differences for all matching observations, even if they are judged equal.
Default: If you omit ALLOBS, PROC COMPARE prints values only for observations that are judged unequal.
Interaction: When used with the TRANSPOSE option, ALLOBS invokes the ALLVARS option and displays the values for all matching observations and variables.

ALLSTATS
prints a table of summary statistics for all pairs of matching variables.
See also: Table of Summary Statistics for information on the statistics produced.

ALLVARS
includes in the report of value comparison results the values and, for numeric variables, the differences for all pairs of matching variables, even if they are judged equal.
Default: If you omit ALLVARS, PROC COMPARE prints values only for variables that are judged unequal.
Interaction: When used with the TRANSPOSE option, ALLVARS displays unequal values in context with the values for other matching variables. If you omit the TRANSPOSE option, ALLVARS invokes the ALLOBS option and displays the values for all matching observations and variables.

BASE=SAS-data-set
specifies the data set to use as the base data set.
Alias: DATA=
Default: the most recently created SAS data set
Tip: You can use the WHERE= data set option with the BASE= option to limit the observations that are available for comparison.
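For example, assuming the hypothetical data sets WORK.OLD and WORK.NEW are sorted by a hypothetical key CUSTNO and both contain a hypothetical character variable REGION, the WHERE= data set option can restrict both sides to the same subset before the comparison:

```sas
/* Hypothetical names: WORK.OLD, WORK.NEW, CUSTNO, REGION.        */
/* Assumes both data sets are already sorted by CUSTNO.           */
/* The same WHERE= subset is applied to the base and comparison   */
/* data sets so that only matching regions are compared.          */
proc compare base=work.old(where=(region='East'))
             compare=work.new(where=(region='East'));
   id custno;
run;
```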

BRIEFSUMMARY
produces a short comparison summary and suppresses the four default summary reports (data set summary report, variables summary report, observation summary report, and values comparison summary report).
Alias: BRIEF
Tip: By default, a listing of value differences accompanies the summary reports. To suppress this listing, use the NOVALUES option.
Featured in: Comparing Variables That Are in the Same Data Set
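A sketch of a quick equality check on hypothetical data sets WORK.OLD and WORK.NEW keyed by a hypothetical variable CUSTNO: BRIEFSUMMARY shortens the summary, and NOVALUES drops the accompanying listing of value differences, as the tip above describes.

```sas
/* Hypothetical names: WORK.OLD, WORK.NEW, CUSTNO (both sorted by CUSTNO). */
/* BRIEFSUMMARY prints only a short summary; NOVALUES suppresses the       */
/* listing of value differences that would otherwise accompany it.         */
proc compare base=work.old compare=work.new briefsummary novalues;
   id custno;
run;
```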

COMPARE=SAS-data-set
specifies the data set to use as the comparison data set.
Aliases: COMP=, C=
Default: If you omit COMPARE=, the comparison data set is the same as the base data set, and PROC COMPARE compares variables within the data set.
Restriction: If you omit COMPARE=, you must use the WITH and VAR statements.
Tip: You can use the WHERE= data set option with COMPARE= to limit the observations that are available for comparison.
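When COMPARE= is omitted, the comparison happens within one data set. A minimal sketch, assuming a hypothetical data set WORK.SCORES with numeric variables PRETEST and POSTTEST; the VAR statement names the base variable and the WITH statement names the variable it is compared with:

```sas
/* Hypothetical names: WORK.SCORES, PRETEST, POSTTEST.               */
/* No COMPARE= option, so WORK.SCORES is both the base and the        */
/* comparison data set, and PRETEST is compared with POSTTEST.        */
proc compare base=work.scores;
   var pretest;
   with posttest;
run;
```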

CRITERION=γ
specifies the criterion for judging the equality of numeric values. Normally, the value of γ (gamma) is positive, in which case the number itself becomes the equality criterion. If you use a negative value for γ, PROC COMPARE uses an equality criterion proportional to the precision of the computer on which the SAS System is running.
Default: 0.00001
See also: The Equality Criterion for more information
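A sketch that pairs CRITERION= with METHOD=ABSOLUTE so that the criterion acts as an absolute tolerance, assuming hypothetical data sets WORK.OLD and WORK.NEW keyed by a hypothetical variable CUSTNO. The tolerance 0.01 is illustrative, and the exact rule for each method is described in The Equality Criterion.

```sas
/* Hypothetical names: WORK.OLD, WORK.NEW, CUSTNO (both sorted by CUSTNO). */
/* With METHOD=ABSOLUTE, numeric values whose absolute difference is      */
/* within the criterion (here 0.01) are judged equal.                     */
proc compare base=work.old compare=work.new
             method=absolute criterion=0.01;
   id custno;
run;
```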

ERROR
displays an error message in the SAS log when differences are found.
Interaction: This option overrides the WARNING option.

FUZZ=number
alters the values comparison results for numbers less than number. PROC COMPARE prints

Default: 0
Range: 0 - 1
Tip: A report that contains many trivial differences is easier to read in this form.
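A sketch of FUZZ= on hypothetical data sets WORK.OLD and WORK.NEW keyed by a hypothetical variable CUSTNO; the threshold 0.001 is an arbitrary illustrative value within the allowed range:

```sas
/* Hypothetical names: WORK.OLD, WORK.NEW, CUSTNO (both sorted by CUSTNO). */
/* FUZZ=0.001 alters how numbers smaller than 0.001 appear in the report,  */
/* which makes a report full of trivial differences easier to read.        */
proc compare base=work.old compare=work.new fuzz=0.001;
   id custno;
run;
```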

LISTALL
lists all variables and observations that are found in only one data set.
Alias: LIST
Interaction: Using LISTALL is equivalent to using the following four options: LISTBASEOBS, LISTCOMPOBS, LISTBASEVAR, and LISTCOMPVAR.
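A sketch that lists the unmatched variables and observations for hypothetical data sets WORK.OLD and WORK.NEW keyed by a hypothetical variable CUSTNO:

```sas
/* Hypothetical names: WORK.OLD, WORK.NEW, CUSTNO (both sorted by CUSTNO). */
/* LISTALL is equivalent to LISTBASEOBS LISTCOMPOBS LISTBASEVAR            */
/* LISTCOMPVAR: it lists everything found in only one data set.            */
proc compare base=work.old compare=work.new listall;
   id custno;
run;
```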

LISTBASE
lists all observations and variables that are found in the base data set but not in the comparison data set.
Interaction: Using LISTBASE is equivalent to using the LISTBASEOBS and LISTBASEVAR options.

LISTBASEOBS
lists all observations that are found in the base data set but not in the comparison data set.

LISTBASEVAR
lists all variables that are found in the base data set but not in the comparison data set.

LISTCOMP
lists all observations and variables that are found in the comparison data set but not in the base data set.
Interaction: Using LISTCOMP is equivalent to using the LISTCOMPOBS and LISTCOMPVAR options.

LISTCOMPOBS
lists all observations that are found in the comparison data set but not in the base data set.

LISTCOMPVAR
lists all variables that are found in the comparison data set but not in the base data set.

LISTEQUALVAR
prints a list of variables whose values are judged equal at all observations in addition to the default list of variables whose values are judged unequal.

LISTOBS
lists all observations that are found in only one data set.
Interaction: Using LISTOBS is equivalent to using the LISTBASEOBS and LISTCOMPOBS options.

LISTVAR
lists all variables that are found in only one data set.
Interaction: Using LISTVAR is equivalent to using both the LISTBASEVAR and LISTCOMPVAR options.

MAXPRINT=total | (per-variable, total)
specifies the maximum number of differences to print, where

total
is the maximum total number of differences to print. The default value is 500 unless you use the ALLOBS option (or both the ALLVARS and TRANSPOSE options), in which case the default is 32000.

per-variable
is the maximum number of differences to print for each variable within a BY group. The default value is 50 unless you use the ALLOBS option (or both the ALLVARS and TRANSPOSE options), in which case the default is 1000.

The MAXPRINT= option prevents the output from becoming extremely large when data sets differ greatly.
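For example, to cap the report at 20 differences per variable and 200 differences overall (illustrative limits; WORK.OLD, WORK.NEW, and CUSTNO are hypothetical names):

```sas
/* Hypothetical names: WORK.OLD, WORK.NEW, CUSTNO (both sorted by CUSTNO). */
/* MAXPRINT=(per-variable, total): at most 20 differences per variable    */
/* and at most 200 differences in total.                                  */
proc compare base=work.old compare=work.new maxprint=(20,200);
   id custno;
run;
```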

METHOD=ABSOLUTE | EXACT | PERCENT | RELATIVE<(δ)>
specifies the method for judging the equality of numeric values. The constant δ (delta) is a number between 0 and 1 that specifies a value to add to the denominator when calculating the equality measure. By default, δ is 0.

Unless you use the CRITERION= option, the default method is EXACT. If you use CRITERION=, the default method is RELATIVE(φ), where φ (phi) is a small number that depends on the numerical precision of the computer on which you are running the SAS System and on the value of CRITERION=.
See also: The Equality Criterion
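A sketch of the RELATIVE method with an explicit δ, assuming hypothetical data sets WORK.OLD and WORK.NEW keyed by a hypothetical variable CUSTNO. The values δ = 0.5 and CRITERION=0.001 are illustrative only:

```sas
/* Hypothetical names: WORK.OLD, WORK.NEW, CUSTNO (both sorted by CUSTNO). */
/* RELATIVE(0.5) adds 0.5 to the denominator of the equality measure;     */
/* CRITERION=0.001 is the value that measure is judged against.           */
proc compare base=work.old compare=work.new
             method=relative(0.5) criterion=0.001;
   id custno;
run;
```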

NODATE
suppresses the display in the data set summary report of the creation dates and the last modified dates of the base and comparison data sets.

NOMISSBASE
judges a missing value in the base data set equal to any value. (By default, a missing value is equal only to a missing value of the same kind: . equals ., but . does not equal .A; .A equals .A, but .A does not equal .B; and so on.)

You can use this option to determine the changes that would be made to the observations in the comparison data set if it were used as the master data set and the base data set were used as the transaction data set in a DATA step UPDATE statement. For information on the UPDATE statement, see the chapter on SAS language statements in SAS Language Reference: Dictionary.
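A sketch of that use, assuming a hypothetical master data set WORK.MASTER and a hypothetical transaction data set WORK.TRANS, both sorted by a hypothetical key CUSTNO. The transaction data set is the base data set and the master data set is the comparison data set, so NOMISSBASE ignores missing transaction values in the same spirit as the UPDATE statement:

```sas
/* Hypothetical names: WORK.MASTER, WORK.TRANS, CUSTNO (both sorted by CUSTNO). */
/* Preview the changes: base = transaction data, comparison = master data.      */
proc compare base=work.trans compare=work.master nomissbase;
   id custno;
run;

/* The corresponding DATA step that applies the transactions to the master. */
data work.updated;
   update work.master work.trans;
   by custno;
run;
```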

NOMISSCOMP
judges a missing value in the comparison data set equal to any value. (By default, a missing value is equal only to a missing value of the same kind: . equals ., but . does not equal .A; .A equals .A, but .A does not equal .B; and so on.)

You can use this option to determine the changes that would be made to the observations in the base data set if it were used as the master data set and the comparison data set were used as the transaction data set in a DATA step UPDATE statement. For information on the UPDATE statement, see the chapter on SAS language statements in SAS Language Reference: Dictionary.

NOMISSING
judges missing values in both the base and comparison data sets equal to any value. (By default, a missing value is equal only to a missing value of the same kind: . equals ., but . does not equal .A; .A equals .A, but .A does not equal .B; and so on.)
Alias: NOMISS
Interaction: Using NOMISSING is equivalent to using both NOMISSBASE and NOMISSCOMP.

NOPRINT
suppresses all printed output.
Tip: You may want to use this option when you are creating one or more output data sets.
Featured in: Comparing Values of Observations Using an Output Data Set (OUT=)
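A sketch of NOPRINT with an output data set, assuming the hypothetical names WORK.OLD, WORK.NEW, WORK.DIFFS, and CUSTNO. It also assumes your release sets the automatic macro variable SYSINFO after PROC COMPARE runs, where a value of 0 indicates that no differences were found; check this against the documentation for your release.

```sas
/* Hypothetical names: WORK.OLD, WORK.NEW, WORK.DIFFS, CUSTNO.           */
/* Assumes both data sets are sorted by CUSTNO.                          */
/* Write the differences to WORK.DIFFS without printing any report.      */
proc compare base=work.old compare=work.new out=work.diffs noprint;
   id custno;
run;

/* Assumption: SYSINFO holds the PROC COMPARE return code (0 = no differences). */
%put NOTE: PROC COMPARE return code is &sysinfo.;
```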

NOSUMMARY
suppresses the data set, variable, observation, and values comparison summary reports.
Tip: NOSUMMARY produces no output if there are no differences in the matching values.
Featured in: Comparing Variables in Different Data Sets

NOTE
displays notes in the SAS log describing the results of the comparison, whether or not differences were found.

NOVALUES
suppresses the report of the value comparison results.
Featured in: Overview

OUT=SAS-data-set
names the output data set. If SAS-data-set does not exist, PROC COMPARE creates it. SAS-data-set contains the differences between matching variables.
See also: Output Data Set (OUT=)
Featured in: Comparing Values of Observations Using an Output Data Set (OUT=)

OUTALL
writes an observation to the output data set for each observation in the base data set and for each observation in the comparison data set. The option also writes observations that contain the differences and percent differences between the values in matching observations.
Tip: Using OUTALL is equivalent to using the following four options: OUTBASE, OUTCOMP, OUTDIF, and OUTPERCENT.
See also: Output Data Set (OUT=)

OUTBASE
writes an observation to the output data set for each observation in the base data set, creating observations in which _TYPE_=BASE.
See also: Output Data Set (OUT=)
Featured in: Comparing Values of Observations Using an Output Data Set (OUT=)

OUTCOMP
writes an observation to the output data set for each observation in the comparison data set, creating observations in which _TYPE_=COMP.
See also: Output Data Set (OUT=)
Featured in: Comparing Values of Observations Using an Output Data Set (OUT=)

OUTDIF
writes an observation to the output data set for each pair of matching observations. The values in the observation include values for the differences between the values in the pair of observations. The value of _TYPE_ in each observation is DIF.
Default: The OUTDIF option is the default unless you specify the OUTBASE, OUTCOMP, or OUTPERCENT option. If you use any of these options, you must explicitly specify the OUTDIF option to create _TYPE_=DIF observations in the output data set.
See also: Output Data Set (OUT=)
Featured in: Comparing Values of Observations Using an Output Data Set (OUT=)

OUTNOEQUAL
suppresses the writing of an observation to the output data set when all values in the observation are judged equal. In addition, in observations containing values for some variables judged equal and others judged unequal, the OUTNOEQUAL option uses the special missing value ".E" to represent differences and percent differences for variables judged equal.
See also: Output Data Set (OUT=)
Featured in: Comparing Values of Observations Using an Output Data Set (OUT=)
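A sketch that combines the output options above, assuming hypothetical data sets WORK.OLD and WORK.NEW keyed by a hypothetical variable CUSTNO. Because OUTBASE and OUTCOMP are specified, OUTDIF must be given explicitly to get the _TYPE_=DIF observations, and OUTNOEQUAL drops observations whose values are all judged equal:

```sas
/* Hypothetical names: WORK.OLD, WORK.NEW, WORK.RESULT, CUSTNO.          */
/* Assumes both data sets are sorted by CUSTNO.                          */
/* Write _TYPE_=BASE, COMP, and DIF observations, but only for matches   */
/* that contain at least one unequal value; print nothing here.          */
proc compare base=work.old compare=work.new
             out=work.result outnoequal outbase outcomp outdif noprint;
   id custno;
run;

/* Display the output data set. */
proc print data=work.result noobs;
run;
```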

OUTPERCENT
writes an observation to the output data set for each pair of matching observations. The values in the observation include values for the percent differences between the values in the pair of observations. The value of _TYPE_ in each observation is PERCENT.
See also: Output Data Set (OUT=)

OUTSTATS=SAS-data-set
writes summary statistics for all pairs of matching variables to the specified SAS-data-set.
Tip: If you want to print a table of statistics in the procedure output, use the STATS, ALLSTATS, or PRINTALL option.
See also: Output Statistics Data Set (OUTSTATS=) and Table of Summary Statistics.
Featured in: Creating an Output Data Set of Statistics (OUTSTATS=)
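A sketch that saves the statistics instead of printing them and then lists the result, assuming the hypothetical names WORK.OLD, WORK.NEW, WORK.STATS, and CUSTNO:

```sas
/* Hypothetical names: WORK.OLD, WORK.NEW, WORK.STATS, CUSTNO.           */
/* Assumes both data sets are sorted by CUSTNO.                          */
/* Save summary statistics for matching variables; suppress the report.  */
proc compare base=work.old compare=work.new outstats=work.stats noprint;
   id custno;
run;

/* List the statistics data set. */
proc print data=work.stats;
run;
```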

PRINTALL
invokes the following options: ALLVARS, ALLOBS, ALLSTATS, LISTALL, and WARNING.
Featured in: Producing a Complete Report of the Differences

STATS
prints a table of summary statistics for all pairs of matching numeric variables that are judged unequal.
See also: Table of Summary Statistics for information on the statistics produced.

TRANSPOSE
prints the reports of value differences by observation instead of by variable.
Interaction: If you also use the NOVALUES option, the TRANSPOSE option lists only the names of the variables whose values compare as unequal for each observation, not the values and differences.
See also: Comparison Results for Observations (Using the TRANSPOSE Option).
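A sketch, assuming hypothetical data sets WORK.OLD and WORK.NEW keyed by a hypothetical variable CUSTNO, that reports by observation and, with NOVALUES, lists only the names of the unequal variables:

```sas
/* Hypothetical names: WORK.OLD, WORK.NEW, CUSTNO (both sorted by CUSTNO). */
/* TRANSPOSE reports by observation; with NOVALUES, only the names of the */
/* variables that compare as unequal are listed for each observation.     */
proc compare base=work.old compare=work.new transpose novalues;
   id custno;
run;
```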

WARNING
displays a warning message in the SAS log when differences are found.
Interaction: The ERROR option overrides the WARNING option.
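A sketch that flags differences in the SAS log, assuming hypothetical data sets WORK.OLD and WORK.NEW keyed by a hypothetical variable CUSTNO. Substituting ERROR for WARNING writes an error message instead:

```sas
/* Hypothetical names: WORK.OLD, WORK.NEW, CUSTNO (both sorted by CUSTNO). */
/* WARNING writes a warning message to the SAS log when differences are  */
/* found. If ERROR is also specified, ERROR takes precedence.            */
proc compare base=work.old compare=work.new warning;
   id custno;
run;
```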



Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.