Example 67.1: Comparing Group Means Using Input Data Set of Summary Statistics
The following example, taken from
Huntsberger and Billingsley (1989), compares two grazing
methods using 32 steers. Half of the steers are
allowed to graze continuously, while the other half are
subjected to controlled grazing time. The researchers
want to know whether these two grazing methods affect weight
gain differently. The following DATA step reads the data.
title 'Group Comparison Using Input Data Set of Summary Statistics';
data graze;
length GrazeType $ 10;
input GrazeType $ WtGain @@;
datalines;
controlled 45 controlled 62
controlled 96 controlled 128
controlled 120 controlled 99
controlled 28 controlled 50
controlled 109 controlled 115
controlled 39 controlled 96
controlled 87 controlled 100
controlled 76 controlled 80
continuous 94 continuous 12
continuous 26 continuous 89
continuous 88 continuous 96
continuous 85 continuous 130
continuous 75 continuous 54
continuous 112 continuous 69
continuous 104 continuous 95
continuous 53 continuous 21
;
run;
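The variable GrazeType denotes the grazing method:
'controlled' is controlled grazing and 'continuous' is continuous grazing.
The LENGTH statement gives GrazeType a length of 10 characters so that
neither value is truncated to the default list-input length of 8.
The dollar sign ($) following GrazeType in the INPUT statement makes it a
character variable, and the trailing at signs (@@) tell the INPUT
statement to hold the data line and keep reading observations from it,
because each line contains more than one observation.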
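To illustrate what the trailing @@ changes, the following Python sketch models list input with and without the line-hold specifier. It is an illustration only, not how SAS is implemented, and the names DATALINES, read_with_hold, and read_without_hold are hypothetical. With @@, values are consumed in pairs until each line is exhausted. Without it, each execution of INPUT starts a new line and the remaining values on the line are ignored.

```python
# Hypothetical model of SAS list input; this is not how SAS is implemented.
DATALINES = """\
controlled 45 controlled 62
controlled 96 controlled 128
controlled 120 controlled 99
controlled 28 controlled 50
controlled 109 controlled 115
controlled 39 controlled 96
controlled 87 controlled 100
controlled 76 controlled 80
continuous 94 continuous 12
continuous 26 continuous 89
continuous 88 continuous 96
continuous 85 continuous 130
continuous 75 continuous 54
continuous 112 continuous 69
continuous 104 continuous 95
continuous 53 continuous 21
"""


def read_with_hold(text):
    """Like INPUT ... @@: keep reading (GrazeType, WtGain) pairs until the data run out."""
    tokens = text.split()  # list input: values are separated by blanks
    if len(tokens) % 2:
        raise ValueError("each observation needs a GrazeType and a WtGain value")
    return [(tokens[i], float(tokens[i + 1])) for i in range(0, len(tokens), 2)]


def read_without_hold(text):
    """Like INPUT without @@: read one pair per line and skip the rest of the line."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens:
            rows.append((tokens[0], float(tokens[1])))
    return rows


graze = read_with_hold(DATALINES)
print(len(graze), len(read_without_hold(DATALINES)))  # 32 16
```

Without @@, only 16 of the 32 steers would be read, one from each data line.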
The data are first sorted by GrazeType, as BY-group processing requires,
and the MEANS procedure is then invoked to create a data set of summary
statistics with the following statements:
proc sort data=graze;
by GrazeType;
run;
proc means data=graze noprint;
var WtGain;
by GrazeType;
output out=newgraze;
run;
The NOPRINT option suppresses the displayed output from the MEANS
procedure; the output data set is still created. The VAR statement tells
PROC MEANS to compute summary statistics for the WtGain variable, and the
BY statement requests a separate set of summary statistics for each level
of GrazeType. The OUTPUT statement with the OUT= option tells PROC MEANS
to store the summary statistics in a data set called newgraze so that it
can be used in subsequent procedures. Because no statistics are named in
the OUTPUT statement, the data set contains the default statistics N, MIN,
MAX, MEAN, and STD. This new data set is displayed in
Output 67.1.1 by using PROC PRINT as follows:
proc print data=newgraze;
run;
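The _STAT_ variable contains the names of the statistics,
and the GrazeType variable indicates the group to which each statistic belongs.
The _TYPE_ variable is 0 because no CLASS statement is used in PROC MEANS,
and _FREQ_ gives the number of observations in each BY group.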
Output 67.1.1: Output Data Set of Summary Statistics
Group Comparison Using Input Data Set of Summary Statistics
Obs    GrazeType     _TYPE_    _FREQ_    _STAT_      WtGain
  1    continuous       0        16      N           16.000
  2    continuous       0        16      MIN         12.000
  3    continuous       0        16      MAX        130.000
  4    continuous       0        16      MEAN        75.188
  5    continuous       0        16      STD         33.812
  6    controlled       0        16      N           16.000
  7    controlled       0        16      MIN         28.000
  8    controlled       0        16      MAX        128.000
  9    controlled       0        16      MEAN        83.125
 10    controlled       0        16      STD         30.535
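As a cross-check on Output 67.1.1, the following Python sketch recomputes the five default statistics from the raw weight gains with the standard library's statistics module. STD is the sample standard deviation (divisor n - 1). The dictionary layout and the helper name default_stats are assumptions made for illustration; they are not the structure of a SAS data set.

```python
from statistics import mean, stdev

# Weight gains copied from the DATALINES in the graze DATA step.
WTGAIN = {
    "controlled": [45, 62, 96, 128, 120, 99, 28, 50, 109, 115, 39, 96, 87, 100, 76, 80],
    "continuous": [94, 12, 26, 89, 88, 96, 85, 130, 75, 54, 112, 69, 104, 95, 53, 21],
}


def default_stats(values):
    """N, MIN, MAX, MEAN, and STD (sample standard deviation, divisor n - 1)."""
    return {
        "N": len(values),
        "MIN": min(values),
        "MAX": max(values),
        "MEAN": mean(values),
        "STD": stdev(values),
    }


# Visit the groups in sorted order, as BY-group processing does after PROC SORT,
# and emit one row per statistic (the long layout of newgraze).
rows = []
for group in sorted(WTGAIN):
    for stat, value in default_stats(WTGAIN[group]).items():
        rows.append((group, stat, value))

for obs, (group, stat, value) in enumerate(rows, start=1):
    print(f"{obs:>3}  {group:<10}  {stat:<4}  {value:8.3f}")
```

Sorting the group names puts "continuous" before "controlled", which matches the observation order in Output 67.1.1.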
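The following statements invoke PROC TTEST with the newgraze
data set specified in the DATA= option. Because newgraze contains the
_STAT_ variable, PROC TTEST treats it as a data set of summary statistics
rather than as raw observations.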
proc ttest data=newgraze;
class GrazeType;
var WtGain;
run;
The CLASS statement contains the variable that distinguishes
between the groups being compared, in this case GrazeType.
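Conceptually, each CLASS level of the summary data set supplies one group's N, MEAN, and STD, and these are the only quantities the t tests depend on. The Python sketch below uses the values printed in Output 67.1.1 and a hypothetical helper named pivot_summary, which is not part of SAS. It regroups the long, one-row-per-statistic layout into one record per GrazeType level.

```python
# Rows of newgraze as printed in Output 67.1.1: (GrazeType, _STAT_, WtGain).
NEWGRAZE = [
    ("continuous", "N", 16.0), ("continuous", "MIN", 12.0),
    ("continuous", "MAX", 130.0), ("continuous", "MEAN", 75.188),
    ("continuous", "STD", 33.812),
    ("controlled", "N", 16.0), ("controlled", "MIN", 28.0),
    ("controlled", "MAX", 128.0), ("controlled", "MEAN", 83.125),
    ("controlled", "STD", 30.535),
]


def pivot_summary(rows, needed=("N", "MEAN", "STD")):
    """Group _STAT_ rows by class level; fail if a level lacks a needed statistic."""
    groups = {}
    for level, stat, value in rows:
        groups.setdefault(level, {})[stat] = value
    for level, stats in groups.items():
        missing = [s for s in needed if s not in stats]
        if missing:
            raise ValueError(f"{level}: missing statistics {missing}")
    return groups


summary = pivot_summary(NEWGRAZE)
for level, stats in summary.items():
    print(level, stats["N"], stats["MEAN"], stats["STD"])
```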
The summary statistics and confidence intervals are displayed first,
as shown in Output 67.1.2.
Output 67.1.2: Summary Statistics
Group Comparison Using Input Data Set of Summary Statistics
Statistics
                            Lower CL              Upper CL   Lower CL              Upper CL
Variable  Class         N       Mean       Mean       Mean    Std Dev    Std Dev    Std Dev    Std Err    Minimum    Maximum
WtGain    continuous   16     57.171     75.188     93.204          .     33.812          .     8.4529         12        130
WtGain    controlled   16     66.854     83.125     99.396          .     30.535          .     7.6337         28        128
WtGain    Diff (1-2)           -31.2     -7.938     15.323     25.743     32.215     43.061      11.39
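In Output 67.1.2,
the Variable column identifies the analysis variable
and the Class column identifies the group for which the statistics
are computed.
For each class, the table displays the sample size, the mean and its
95% confidence limits, the standard deviation, the standard error, and
the minimum and maximum values. Because summary statistics rather than
raw data are used as input, the confidence limits for the standard
deviation of each group are not calculated and are shown as missing (.).
The Diff (1-2) row describes the difference between the first and second
classes (continuous minus controlled): its Mean is the difference in group
means, its Std Dev is the pooled standard deviation, and its Std Err is
the standard error of the difference.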
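The values in Output 67.1.2 can be reproduced from the summary statistics alone. The following Python sketch computes 95% confidence limits for each group mean (mean ± t × s/sqrt(n)). For the Diff (1-2) row it computes the pooled standard deviation, the standard error of the difference, and chi-square-based limits for the pooled standard deviation. The standard library has no t or chi-square quantile function, so the critical values are hard-coded from standard tables. The constants, the helper name mean_limits, and the overall approach belong to this sketch, not to PROC TTEST's own code.

```python
from math import sqrt

# Printed summary statistics from newgraze: (n, mean, standard deviation).
GROUPS = {
    "continuous": (16, 75.188, 33.812),
    "controlled": (16, 83.125, 30.535),
}
# Tabulated critical values (assumed constants; valid only for these sample sizes).
T_975_DF15 = 2.131450      # t quantile, 0.975, 15 df (each group has n = 16)
T_975_DF30 = 2.042272      # t quantile, 0.975, 30 df (pooled)
CHI2_025_DF30 = 16.790772  # chi-square quantile, 0.025, 30 df
CHI2_975_DF30 = 46.979242  # chi-square quantile, 0.975, 30 df


def mean_limits(n, m, s, t_crit):
    """Standard error and two-sided confidence limits for a single mean."""
    se = s / sqrt(n)
    return se, m - t_crit * se, m + t_crit * se


for name, (n, m, s) in GROUPS.items():
    se, lo, hi = mean_limits(n, m, s, T_975_DF15)
    print(f"{name:<10} SE={se:.4f}  CL=({lo:.3f}, {hi:.3f})")

# Diff (1-2): first class minus second class, using the pooled variance.
(n1, m1, s1), (n2, m2, s2) = GROUPS["continuous"], GROUPS["controlled"]
df = n1 + n2 - 2
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
se_diff = sp * sqrt(1 / n1 + 1 / n2)
diff = m1 - m2
diff_lo, diff_hi = diff - T_975_DF30 * se_diff, diff + T_975_DF30 * se_diff
# Confidence limits for the pooled standard deviation (chi-square based).
sp_lo = sp * sqrt(df / CHI2_975_DF30)
sp_hi = sp * sqrt(df / CHI2_025_DF30)
print(f"Diff: {diff:.3f}  CL=({diff_lo:.3f}, {diff_hi:.3f})  "
      f"Std Dev={sp:.3f}  CL=({sp_lo:.3f}, {sp_hi:.3f})  SE={se_diff:.3f}")
```

Small discrepancies in the last digit come from starting with the rounded, printed summary statistics.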
Output 67.1.3: t Tests
Group Comparison Using Input Data Set of Summary Statistics
T-Tests
Variable    Method           Variances      DF    t Value    Pr > |t|
WtGain      Pooled           Equal          30      -0.70      0.4912
WtGain      Satterthwaite    Unequal      29.7      -0.70      0.4913
Equality of Variances
Variable    Method      Num DF    Den DF    F Value    Pr > F
WtGain      Folded F        15        15       1.23    0.6981
Output 67.1.3 shows the results of the tests for
equal group means and equal variances.
A t statistic for the equality of means is reported under both the
equal-variance (Pooled) and unequal-variance (Satterthwaite) assumptions.
Before deciding which test is appropriate, you should look at the test for
equality of variances; this test does not indicate a significant
difference between the two variances (F' = 1.23, p = 0.6981), so the
pooled t statistic should be used. Based on the pooled
statistic, the two grazing methods are not significantly
different (t = -0.70, p = 0.4912). Note that these tests assume
that the observations in both groups are normally distributed;
this assumption can be checked with PROC UNIVARIATE on the raw data in
the graze data set, because newgraze contains only summary statistics.
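The test statistics in Output 67.1.3 follow from the same summary values. This Python sketch computes the pooled t statistic, the Satterthwaite approximation to the degrees of freedom, and the folded F statistic, which is the larger sample variance divided by the smaller. It does not compute p-values. Those would need t and F distribution functions (for example, from scipy.stats), which the standard library does not provide. The function name t_tests is hypothetical.

```python
from math import sqrt

# (n, mean, standard deviation) per group, as printed in Output 67.1.1.
n1, m1, s1 = 16, 75.188, 33.812  # continuous (class 1)
n2, m2, s2 = 16, 83.125, 30.535  # controlled (class 2)


def t_tests(n1, m1, s1, n2, m2, s2):
    """Return pooled and Satterthwaite t statistics with their df, and the folded F test."""
    v1, v2 = s1**2, s2**2
    # Pooled: one common variance estimate with n1 + n2 - 2 df.
    df_pooled = n1 + n2 - 2
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df_pooled
    t_pooled = (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    # Satterthwaite: separate variances, approximate df.
    a, b = v1 / n1, v2 / n2
    t_satt = (m1 - m2) / sqrt(a + b)
    df_satt = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    # Folded F: larger variance over smaller; df taken from the matching groups.
    if v1 >= v2:
        f, num_df, den_df = v1 / v2, n1 - 1, n2 - 1
    else:
        f, num_df, den_df = v2 / v1, n2 - 1, n1 - 1
    return {
        "pooled": (df_pooled, t_pooled),
        "satterthwaite": (df_satt, t_satt),
        "folded_f": (num_df, den_df, f),
    }


res = t_tests(n1, m1, s1, n2, m2, s2)
print(f"Pooled        DF={res['pooled'][0]}  t={res['pooled'][1]:.2f}")
print(f"Satterthwaite DF={res['satterthwaite'][0]:.1f}  t={res['satterthwaite'][1]:.2f}")
print(f"Folded F      F={res['folded_f'][2]:.2f}")
```

Because both groups have 16 observations, the pooled and Satterthwaite standard errors coincide here. Only the degrees of freedom differ (30 versus about 29.7), which is why the two p-values are nearly identical.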
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.