| The COMPARE Procedure |
| SAS Log |
| Macro Return Codes (SYSINFO) |
Macro Return Codes is a key for interpreting the SYSINFO return code from PROC COMPARE. For each of the conditions listed, the associated value is added to the return code if the condition is true. Thus, the SYSINFO return code is the sum of the codes listed in Macro Return Codes for the applicable conditions:
| Bit | Condition | Code | Hex | Description | |
|---|---|---|---|---|---|
| 1 | DSLABEL | 1 | 0001X | Data set labels differ | |
| 2 | DSTYPE | 2 | 0002X | Data set types differ | |
| 3 | INFORMAT | 4 | 0004X | Variable has different informat | |
| 4 | FORMAT | 8 | 0008X | Variable has different format | |
| 5 | LENGTH | 16 | 0010X | Variable has different length | |
| 6 | LABEL | 32 | 0020X | Variable has different label | |
| 7 | BASEOBS | 64 | 0040X | Base data set has observation not in comparison | |
| 8 | COMPOBS | 128 | 0080X | Comparison data set has observation not in base | |
| 9 | BASEBY | 256 | 0100X | Base data set has BY group not in comparison | |
| 10 | COMPBY | 512 | 0200X | Comparison data set has BY group not in base | |
| 11 | BASEVAR | 1024 | 0400X | Base data set has variable not in comparison | |
| 12 | COMPVAR | 2048 | 0800X | Comparison data set has variable not in base | |
| 13 | VALUE | 4096 | 1000X | A value comparison was unequal | |
| 14 | TYPE | 8192 | 2000X | Conflicting variable types | |
| 15 | BYVAR | 16384 | 4000X | BY variables do not match | |
| 16 | ERROR | 32768 | 8000X | Fatal error: comparison not done | |
These codes are ordered and scaled to allow a simple check of the degree to which the data sets differ. For example, if you want to check that two data sets contain the same variables, observations, and values, but you do not care about differences in labels, formats, and so forth, use the following statements:
proc compare base=SAS-data-set
             compare=SAS-data-set;
run;
%if &sysinfo >= 64 %then
   %do;
      handle error;
   %end;
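Depending on the SAS release, %IF may be valid only inside a macro definition, so the threshold check is often wrapped in a macro. The following is a minimal sketch under that assumption. The macro name CHECK_COMPARE, the data sets WORK.BASE_DS and WORK.COMP_DS, and their values are hypothetical and serve only as illustration.

```sas
/* Hypothetical sample data; names and values are illustrative only. */
data work.base_ds;
   input student_id gr1 gr2;
   datalines;
1 85 87
2 92 92
3 78 72
;
run;

data work.comp_ds;
   input student_id gr1 gr2;
   datalines;
1 84 87
2 92 92
3 79 73
;
run;

/* Hypothetical wrapper macro around the SYSINFO threshold check. */
%macro check_compare(base=, compare=);
   proc compare base=&base compare=&compare noprint;
   run;

   /* Save SYSINFO immediately, before any other step resets it. */
   %local rc;
   %let rc = &sysinfo;

   %if &rc >= 64 %then %do;
      %put WARNING: &base and &compare differ beyond attributes (SYSINFO=&rc).;
   %end;
   %else %do;
      %put NOTE: No differences beyond attributes (SYSINFO=&rc).;
   %end;
%mend check_compare;

%check_compare(base=work.base_ds, compare=work.comp_ds)
```

The threshold of 64 works because every attribute-only condition (labels, types, informats, formats, lengths) has a code below 64, so their sum can be at most 63.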
You can examine individual bits in the SYSINFO value by using DATA step bit-testing features to check for specific conditions. For example, to check for the presence of observations in the base data set that are not in the comparison data set, use the following statements:
proc compare base=SAS-data-set
             compare=SAS-data-set;
run;
%let rc=&sysinfo;
data _null_;
   if &rc='1......'b then
      put 'Observations in Base but not in Comparison Data Set';
run;
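As an alternative to bit masks, individual conditions can also be tested with the BAND (bitwise AND) function and the decimal codes from the Macro Return Codes table. This is a sketch only; the data sets WORK.BASE_DS and WORK.COMP_DS and their contents are hypothetical.

```sas
/* Hypothetical sample data: the comparison data set has an extra
   observation and one changed value. */
data work.base_ds;
   input student_id gr1;
   datalines;
1 85
2 92
;
run;

data work.comp_ds;
   input student_id gr1;
   datalines;
1 84
2 92
3 79
;
run;

proc compare base=work.base_ds compare=work.comp_ds noprint;
run;
%let rc = &sysinfo;   /* capture before the next step resets SYSINFO */

data _null_;
   rc = &rc;
   /* BAND returns a nonzero value when the tested bit is set. */
   if band(rc, 64)    then put 'BASEOBS: base has an observation not in comparison';
   if band(rc, 128)   then put 'COMPOBS: comparison has an observation not in base';
   if band(rc, 4096)  then put 'VALUE: a value comparison was unequal';
   if band(rc, 32768) then put 'ERROR: comparison not done';
   if rc = 0          then put 'No differences reported';
run;
```

Testing each code separately keeps the check readable when several conditions can be set at once, because the return code is a sum of the individual codes.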
PROC COMPARE must run before you check SYSINFO, and you must obtain the SYSINFO value before another SAS step starts, because every SAS step resets SYSINFO.
| Procedure Output |
Partial Output shows the Data Set Summary.
COMPARE Procedure
Comparison of PROCLIB.ONE with PROCLIB.TWO
(Method=EXACT)
Data Set Summary
Dataset Created Modified NVar NObs Label
PROCLIB.ONE 11SEP97:15:11:07 11SEP97:15:11:09 5 4 First Data Set
PROCLIB.TWO 11SEP97:15:11:10 11SEP97:15:11:10 6 5 Second Data Set
The second part of the report lists matching variables with different attributes and shows how the attributes differ. (The COMPARE procedure omits variable labels if the line size is too small for them.)
Partial Output shows the Variables Summary.
Variables Summary
Number of Variables in Common: 5.
Number of Variables in PROCLIB.TWO but not in PROCLIB.ONE: 1.
Number of Variables with Conflicting Types: 1.
Number of Variables with Differing Attributes: 3.
Listing of Common Variables with Conflicting Types
Variable Dataset Type Length
student PROCLIB.ONE Num 8
PROCLIB.TWO Char 8
Listing of Common Variables with Differing Attributes
Variable Dataset Type Length Format Label
year PROCLIB.ONE Char 8 Year of Birth
PROCLIB.TWO Char 8
state PROCLIB.ONE Char 8
PROCLIB.TWO Char 8 Home State
gr1 PROCLIB.ONE Num 8 4.1
PROCLIB.TWO Num 8 5.2
Partial Output shows the Observation Summary.
Observation Summary
Observation Base Compare
First Obs 1 1
First Unequal 1 1
Last Unequal 4 4
Last Match 4 4
Last Obs . 5
Number of Observations in Common: 4.
Number of Observations in PROCLIB.TWO but not in PROCLIB.ONE: 1.
Total Number of Observations Read from PROCLIB.ONE: 4.
Total Number of Observations Read from PROCLIB.TWO: 5.
Number of Observations with Some Compared Variables Unequal: 4.
Number of Observations with All Compared Variables Equal: 0.
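In addition, for the variables for which some matching observations have unequal values, the report lists the variable name, type, length, and label, the number of unequal values (Ndif), and the maximum difference (MaxDif).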
Partial Output shows the Values Comparison Summary.
Values Comparison Summary
Number of Variables Compared with All Observations Equal: 1.
Number of Variables Compared with Some Observations Unequal: 3.
Total Number of Values which Compare Unequal: 6.
Maximum Difference: 20.
Variables with Unequal Values
Variable Type Len Compare Label Ndif MaxDif
state CHAR 8 Home State 2
gr1 NUM 8 2 1.000
gr2 NUM 8 2 20.000
Value Comparison Results for Variables
__________________________________________________________
|| Home State
|| Base Value Compare Value
Obs || state state
________ || ________ ________
||
2 || MD MA
4 || MA MD
__________________________________________________________
__________________________________________________________
|| Base Compare
Obs || gr1 gr1 Diff. % Diff
________ || _________ _________ _________ _________
||
1 || 85.0 84.00 -1.0000 -1.1765
3 || 78.0 79.00 1.0000 1.2821
__________________________________________________________
__________________________________________________________
|| Base Compare
Obs || gr2 gr2 Diff. % Diff
________ || _________ _________ _________ _________
||
3 || 72.0000 73.0000 1.0000 1.3889
4 || 94.0000 74.0000 -20.0000 -21.2766
__________________________________________________________
You can suppress the value comparison results with the NOVALUES option.
If you use both the NOVALUES and TRANSPOSE options, PROC COMPARE lists for
each observation the names of the variables with values judged unequal but
does not display the values and differences.
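The combination of NOVALUES and TRANSPOSE described above can be requested as follows. This is a minimal sketch; the data sets WORK.BASE_DS and WORK.COMP_DS and their values are hypothetical.

```sas
/* Hypothetical sample data; names and values are illustrative only. */
data work.base_ds;
   input student_id gr1 gr2;
   datalines;
1 85 87
2 92 92
3 78 72
;
run;

data work.comp_ds;
   input student_id gr1 gr2;
   datalines;
1 84 87
2 92 92
3 79 73
;
run;

/* NOVALUES suppresses the value comparison results;
   TRANSPOSE organizes the report by observation. */
proc compare base=work.base_ds compare=work.comp_ds
             novalues transpose;
run;
```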
Note: In all cases PROC COMPARE calculates the summary statistics based
on all matching observations that do not contain missing values, not just
on those containing unequal values.
Partial Output shows the following summary statistics for base data set values, comparison data set values, differences, and percent differences:
Partial Output is from the ALLSTATS option using the two data sets shown in "Overview":
Value Comparison Results for Variables
__________________________________________________________
|| Base Compare
Obs || gr1 gr1 Diff. % Diff
________ || _________ _________ _________ _________
||
1 || 85.0 84.00 -1.0000 -1.1765
3 || 78.0 79.00 1.0000 1.2821
________ || _________ _________ _________ _________
||
N || 4 4 4 4
Mean || 85.5000 85.5000 0 0.0264
Std || 5.8023 5.4467 0.8165 1.0042
Max || 92.0000 92.0000 1.0000 1.2821
Min || 78.0000 79.0000 -1.0000 -1.1765
StdErr || 2.9011 2.7234 0.4082 0.5021
t || 29.4711 31.3951 0.0000 0.0526
Prob>|t| || <.0001 <.0001 1.0000 0.9614
||
Ndif || 2 50.000%
DifMeans || 0.000% 0.000% 0
r, rsq || 0.991 0.983
__________________________________________________________
__________________________________________________________
|| Base Compare
Obs || gr2 gr2 Diff. % Diff
________ || _________ _________ _________ _________
||
3 || 72.0000 73.0000 1.0000 1.3889
4 || 94.0000 74.0000 -20.0000 -21.2766
________ || _________ _________ _________ _________
||
N || 4 4 4 4
Mean || 86.2500 81.5000 -4.7500 -4.9719
Std || 9.9457 9.4692 10.1776 10.8895
Max || 94.0000 92.0000 1.0000 1.3889
Min || 72.0000 73.0000 -20.0000 -21.2766
StdErr || 4.9728 4.7346 5.0888 5.4447
t || 17.3442 17.2136 -0.9334 -0.9132
Prob>|t| || 0.0004 0.0004 0.4195 0.4285
||
Ndif || 2 50.000%
DifMeans || -5.507% -5.828% -4.7500
r, rsq || 0.451 0.204
__________________________________________________________
Note: If you use a wide line size with PRINTALL, PROC COMPARE prints
the value comparison result for character variables next to the result for
numeric variables. In that case, PROC COMPARE calculates only NDIF for the
character variables.
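A listing with summary statistics like the ALLSTATS output shown earlier can be requested with a step along these lines. This is a sketch only; the data sets WORK.BASE_DS and WORK.COMP_DS and their values are hypothetical, not the PROCLIB data sets from the listings.

```sas
/* Hypothetical sample data; names and values are illustrative only. */
data work.base_ds;
   input student_id gr1 gr2;
   datalines;
1 85 87
2 92 92
3 78 72
;
run;

data work.comp_ds;
   input student_id gr1 gr2;
   datalines;
1 84 87
2 92 92
3 79 73
;
run;

/* ALLSTATS prints the summary statistics for each compared variable. */
proc compare base=work.base_ds compare=work.comp_ds allstats;
   var gr1 gr2;
run;
```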
_OBS_1=number-1 _OBS_2=number-2
where number-1 is the number of the observation in the base data set for which the value of the variable is shown, and number-2 is the number of the observation in the comparison data set.
Partial Output shows the differences in PROCLIB.ONE and PROCLIB.TWO by observation instead of by variable.
Comparison Results for Observations
_OBS_1=1 _OBS_2=1:
Variable Base Value Compare Diff. % Diff
gr1 85.0 84.00 -1.000000 -1.176471
_OBS_1=2 _OBS_2=2:
Variable Base Value Compare
state MD MA
_OBS_1=3 _OBS_2=3:
Variable Base Value Compare Diff. % Diff
gr1 78.0 79.00 1.000000 1.282051
gr2 72.000000 73.000000 1.000000 1.388889
_OBS_1=4 _OBS_2=4:
Variable Base Value Compare Diff. % Diff
gr2 94.000000 74.000000 -20.000000 -21.276596
state MA MD
If you use an ID statement, the identifying label has the following form:
ID-1=ID-value-1 ... ID-n=ID-value-n
where ID is the name of an ID variable and ID-value is the value of the ID variable.
Note: When you use the TRANSPOSE option, PROC COMPARE prints only the
first 12 characters of the value.
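To see the ID-based identifying label in a report organized by observation, TRANSPOSE can be combined with an ID statement. The following is a minimal sketch; the data sets WORK.BASE_DS and WORK.COMP_DS and the ID variable STUDENT_ID are hypothetical.

```sas
/* Hypothetical sample data, already in order of the ID variable. */
data work.base_ds;
   input student_id gr1 gr2;
   datalines;
1 85 87
2 92 92
3 78 72
;
run;

data work.comp_ds;
   input student_id gr1 gr2;
   datalines;
1 84 87
2 92 92
3 79 73
;
run;

/* Observations are matched on STUDENT_ID instead of by position. */
proc compare base=work.base_ds compare=work.comp_ds transpose;
   id student_id;
run;
```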
| Output Data Set (OUT=) |
In addition, the data set contains two variables created by PROC COMPARE to identify the source of the values for the matching variables: _TYPE_ and _OBS_.
_TYPE_ is labeled Type of Observation. Its four possible values are BASE, COMPARE, DIF, and PERCENT.
For observations with _TYPE_ equal to BASE, _OBS_ is the number of the observation in the base data set from which the values of the VAR variables were copied. Similarly, for observations with _TYPE_ equal to COMPARE, _OBS_ is the number of the observation in the comparison data set from which the values of the VAR variables were copied.
For observations with _TYPE_ equal to DIF or PERCENT, _OBS_ is a sequence number that counts the matching observations in the BY group.
_OBS_ has the label Observation Number.
The COMPARE procedure takes variable names and attributes for the OUT= data set from the base data set except for the lengths of ID and VAR variables, for which it uses the longer length regardless of which data set that length is from. This behavior has two important repercussions:
Observations with _TYPE_ equal to BASE contain the values of the VAR variables, while observations with _TYPE_ equal to COMPARE contain the values of the WITH variables.
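An OUT= data set that contains all four kinds of observations (BASE, COMPARE, DIF, and PERCENT) can be requested with the OUTBASE, OUTCOMP, OUTDIF, and OUTPERCENT options. This is a sketch under assumed names; WORK.BASE_DS, WORK.COMP_DS, and WORK.DIFFS are hypothetical data sets.

```sas
/* Hypothetical sample data; names and values are illustrative only. */
data work.base_ds;
   input student_id gr1 gr2;
   datalines;
1 85 87
2 92 92
3 78 72
;
run;

data work.comp_ds;
   input student_id gr1 gr2;
   datalines;
1 84 87
2 92 92
3 79 73
;
run;

/* Write base values, comparison values, differences, and percent
   differences to WORK.DIFFS; NOPRINT suppresses the printed report. */
proc compare base=work.base_ds compare=work.comp_ds
             out=work.diffs outbase outcomp outdif outpercent
             noprint;
run;

/* Inspect the _TYPE_ and _OBS_ variables alongside the compared values. */
proc print data=work.diffs;
run;
```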
| Output Statistics Data Set (OUTSTATS=) |
N, MEAN, STD, MIN, MAX, STDERR, T, PROBT, NDIF, DIFMEANS, and R, RSQ.
Note: For both types of output data sets, PROC COMPARE assigns one of the following data set labels:
Comparison of base-SAS-data-set with comparison-SAS-data-set
Comparison of variables in base-SAS-data-set
See Creating an Output Data Set of Statistics (OUTSTATS=) for an example of an OUTSTATS= data set.
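As a quick illustration, an OUTSTATS= data set can be created and printed with a step like the one below. This is a minimal sketch; the data sets WORK.BASE_DS, WORK.COMP_DS, and WORK.DIFFSTAT are hypothetical.

```sas
/* Hypothetical sample data; names and values are illustrative only. */
data work.base_ds;
   input student_id gr1 gr2;
   datalines;
1 85 87
2 92 92
3 78 72
;
run;

data work.comp_ds;
   input student_id gr1 gr2;
   datalines;
1 84 87
2 92 92
3 79 73
;
run;

/* Write summary statistics for the compared variables to WORK.DIFFSTAT. */
proc compare base=work.base_ds compare=work.comp_ds
             outstats=work.diffstat noprint;
   var gr1 gr2;
run;

proc print data=work.diffstat noobs;
run;
```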
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.