STAT 330
Assignment 7: First SAS Assignment
In this handout I begin by showing you examples of one-sample and two-sample procedures using SAS. Then I describe several data sets which I expect you to analyze. You must use SAS. I want you to hand in: copies of the SAS commands which you submit, the SAS output you get, and a short (two or three sentences) summary of the practical conclusions. Uninterpreted computer output cannot get more than 25% of the possible marks. (At the same time, without the SAS input and output you won't get anything.)
One sample tests and confidence intervals
The data for this example are taken from question 42 in chapter 7 which you
should see for an explanation of the setting.
I ran the following SAS code, which is in the file n:\stat\330\asbestos.sas.
options pagesize=60 linesize=80;
data asbestos;
infile 'n:\stat\330\asbestos.dat';
input comply;
complyd=comply-200;
proc means mean std stderr t prt maxdec=2;
run;
The words mean, std, stderr, t, and prt after means in the proc means statement request the sample mean, the sample standard deviation, the standard error of the mean, the value of the t statistic for testing the hypothesis that the mean is 0, and the two sided P-value for a t-test of that null hypothesis. The expression maxdec=2 limits the printout to 2 decimal places for means and such.
The output from proc means is
The SAS System 9
12:47 Thursday, October 12, 1995
Variable        Mean     Std Dev   Std Error           T    Prob>|T|
----------------------------------------------------------------------
COMPLY        209.75       24.16        6.04       34.73      0.0001
COMPLYD         9.75       24.16        6.04        1.61      0.1273
----------------------------------------------------------------------
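Notice that the second line tests the hypothesis that the mean of COMPLY
is actually 200. The two sided P value is about 13%, indicating that
there is only very weak evidence against this null. To compute a
95% confidence interval take the sample mean plus or minus the Std Error
times the upper 2.5% critical value of the t distribution on n-1 degrees
of freedom.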
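As a minimal sketch, the usual t interval (mean plus or minus a t critical value times the standard error) can be computed in a data step from the numbers printed by proc means. The sample size n = 16 is not shown in the output; it is an assumption inferred from (Std Dev / Std Error) squared, so check it against the data. The data set name ci and the variable names are made up for illustration.

```sas
/* Sketch only: arithmetic on the printed proc means output for COMPLY. */
data ci;
  n      = 16;        /* ASSUMPTION: inferred from (24.16/6.04)**2      */
  mean   = 209.75;    /* Mean column                                    */
  stderr = 6.04;      /* Std Error column                               */
  tcrit  = tinv(0.975, n - 1);       /* upper 2.5% point of t, n-1 df   */
  lower  = mean - tcrit*stderr;      /* lower 95% confidence limit      */
  upper  = mean + tcrit*stderr;      /* upper 95% confidence limit      */
  /* Redo the test of mean = 200 by hand as a check on the COMPLYD line */
  t      = (mean - 200)/stderr;
  p      = 2*(1 - probt(abs(t), n - 1));
run;
proc print data=ci;
run;
```

Under that assumption the interval should contain 200, in agreement with the P-value of about 13% on the COMPLYD line.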
Two sample tests and confidence intervals
The data for the question about Michelson's measurements of the speed
of light from Assignment 4 are in the file n:\stat\330\michlson.dat, and I
use proc ttest to test for no change in mean.
options pagesize=60 linesize=80;
data michlson;
infile 'n:\stat\330\michlson.dat';
input set $ speed ;
proc sort data=michlson;
by set;
proc ttest cochran;
class set;
var speed;
run;
proc univariate plot normal;
by set;
run;
To get high definition graphs replace the proc univariate with
proc rank data=michlson normal=vw out=rmich;
by set;
var speed;
ranks normscr;
proc gplot data=rmich;
title3 'normal probability plot';
by set;
plot speed*normscr;
run;
The (low resolution) output is
The SAS System 1
14:31 Monday, October 16, 1995
TTEST PROCEDURE
Variable: SPEED
SET        N         Mean      Std Dev    Std Error      Minimum      Maximum
-------------------------------------------------------------------------------
First     20  909.0000000  104.9260391  23.46217561  650.0000000  1070.000000
Second    20  831.5000000   54.2193401  12.12381302  740.0000000   950.000000

Variances        T  Method             DF    Prob>|T|
--------------------------------------------------------
Unequal     2.9346  Satterthwaite    28.5      0.0065
                    Cochran          19.0      0.0085
Equal       2.9346                   38.0      0.0056

For H0: Variances are equal, F' = 3.75    DF = (19,19)    Prob>F' = 0.0060
The line labelled "Equal" gives the usual two sample t statistic for testing for
equal means using a pooled estimate of the variance. It shows the degrees of
freedom and the associated two tailed P-value. Beneath that line is a line beginning
For H0 which tests the hypothesis that the two variances are equal. The statistic
F' is the larger sample variance over the smaller and the P value is two tailed.
Notice that the two means are clearly different and that the two variances are
also clearly different. The "Unequal" line reports on tests which
try to adjust for unequal variances; Satterthwaite is the technique mentioned
in previous solution sets.
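To make the description of F' concrete, here is a small sketch that recomputes it from the two standard deviations printed by proc ttest. The data set name ftest and the variable names are hypothetical; the degrees of freedom (19,19) are copied from the output.

```sas
/* Sketch only: variance-ratio test recomputed from the printed Std Devs. */
data ftest;
  s1     = 104.9260391;                      /* Std Dev, SET=First         */
  s2     = 54.2193401;                       /* Std Dev, SET=Second        */
  fprime = max(s1, s2)**2 / min(s1, s2)**2;  /* larger variance / smaller  */
  p2tail = 2*(1 - probf(fprime, 19, 19));    /* two tailed P-value         */
run;
proc print data=ftest;
run;
```

The printed fprime and p2tail can be compared with the F' and Prob>F' values in the "For H0" line.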
You have to do your own arithmetic to get confidence intervals.
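One way to do that arithmetic is in a data step; the following is a sketch using the numbers from the proc ttest output, not a required method. Because the two samples have the same size (20 each), the pooled and unpooled standard errors of the difference coincide, so only the degrees of freedom change between the "Equal" and "Unequal" intervals. The data set name twoci and the variable names are made up for illustration.

```sas
/* Sketch only: 95% intervals for the difference in mean speed.         */
data twoci;
  diff = 909.0 - 831.5;                         /* First minus Second   */
  se   = sqrt(23.46217561**2 + 12.12381302**2); /* SE of the difference */
  do df = 38, 28.5;          /* Equal (pooled) df, then Satterthwaite df */
    tcrit = tinv(0.975, df);                    /* t critical value     */
    lower = diff - tcrit*se;
    upper = diff + tcrit*se;
    output;                                     /* one row per df       */
  end;
run;
proc print data=twoci;
run;
```

As a check, diff/se should agree with the T value of 2.9346 in the output.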
The output of proc univariate is:
The SAS System 1
10:11 Wednesday, October 25, 1995
----------------------------- SET=First -----------------------------------
Univariate Procedure
Variable=SPEED
                Moments

N                20  Sum Wgts         20
Mean            909  Sum           18180
Std Dev     104.926  Variance   11009.47
Skewness   -0.96461  Kurtosis   0.573188
USS        16734800  CSS          209180
CV         11.54302  Std Mean   23.46218
T:Mean=0   38.74321  Pr>|T|       0.0001
Num ^= 0         20  Num > 0          20
M(Sign)          10  Pr>=|M|      0.0001
Sgn Rank        105  Pr>=|S|      0.0001
W:Normal   0.920264  Pr<W         0.1059

            Quantiles(Def=5)

100% Max       1070       99%       1070
 75% Q3         980       95%       1035
 50% Med        940       90%       1000
 25% Q1         850       10%        750
  0% Min        650        5%        695
                           1%        650
Range           420
Q3-Q1           130
Mode            980

                Extremes

Lowest    Obs     Highest    Obs
   650(    14)        980(    12)
   740(     2)       1000(    11)
   760(    15)       1000(    17)
   810(    16)       1000(    18)
   850(     6)       1070(     4)

Stem Leaf                     #             Boxplot
  10 7                        1                |
  10 000                      3                |
   9 566888                   6             +-----+
   9 033                      3             *--+--*
   8 558                      3             +-----+
   8 1                        1                |
   7 6                        1                |
   7 4                        1                |
   6 5                        1                0
     ----+----+----+----+
Multiply Stem.Leaf by 10**+2
The SAS System 2
10:11 Wednesday, October 25, 1995
---------------------------------- SET=First -----------------------------------
Univariate Procedure
Variable=SPEED
Normal Probability Plot
1075+ +++++*
| *+*++*
| ** *++*+
| ** ++++
875+ **+*+++
| +*+++
| ++++*
| ++++ *
675+ +++++*
+----+----+----+----+----+----+----+----+----+----+
-2 -1 0 +1 +2
The SAS System 3
10:11 Wednesday, October 25, 1995
---------------------------------- SET=Second ----------------------------------
Univariate Procedure
Variable=SPEED
                Moments

N                20  Sum Wgts         20
Mean          831.5  Sum           16630
Std Dev    54.21934  Variance   2939.737
Skewness   0.692545  Kurtosis   0.328607
USS        13883700  CSS           55855
CV         6.520666  Std Mean   12.12381
T:Mean=0   68.58403  Pr>|T|       0.0001
Num ^= 0         20  Num > 0          20
M(Sign)          10  Pr>=|M|      0.0001
Sgn Rank        105  Pr>=|S|      0.0001
W:Normal   0.934107  Pr<W         0.1953

            Quantiles(Def=5)

100% Max        950       99%        950
 75% Q3         870       95%        945
 50% Med        810       90%        915
 25% Q1         805       10%        770
  0% Min        740        5%        750
                           1%        740
Range           210
Q3-Q1            65
Mode            810

                Extremes

Lowest    Obs     Highest    Obs
   740(    14)        870(    12)
   760(     5)        870(    20)
   780(     3)        890(     1)
   790(     7)        940(    16)
   800(    18)        950(    17)

Stem Leaf                     #             Boxplot
   9 5                        1                |
   9 4                        1                |
   8 57779                    5             +-----+
   8 011111124                9             *--+--*
   7 689                      3                |
   7 4                        1                |
     ----+----+----+----+
Multiply Stem.Leaf by 10**+2
The SAS System 4
10:11 Wednesday, October 25, 1995
---------------------------------- SET=Second ----------------------------------
Univariate Procedure
Variable=SPEED
Normal Probability Plot
975+ * ++++
| +*+++++++
| *+*++*+*+
| **++*+*++**
| +*++*+*+++
725+ +++++*+++
+----+----+----+----+----+----+----+----+----+----+
-2 -1 0 +1 +2
The SAS System 5
10:11 Wednesday, October 25, 1995
Univariate Procedure
Schematic Plots
Variable=SPEED
     |
1100 +
     |
     |            |
     |            |
1050 +            |
     |            |
     |            |
     |            |
1000 +            |
     |            |
     |         +-----+
     |         |     |
 950 +         |     |        |
     |         *-----*        |
     |         |     |        |
     |         |  +  |        |
 900 +         |     |        |
     |         |     |        |
     |         |     |     +-----+
     |         |     |     |     |
 850 +         +-----+     |     |
     |            |        |  +  |
     |            |        |     |
     |            |        *-----*
 800 +            |        +-----+
     |            |           |
     |            |           |
     |            |           |
 750 +            |           |
     |            |           |
     |
     |
 700 +
     |
     |
     |
 650 +            0
      ------------+-----------+-----------
 SET            First       Second
You will see that the normal probability plots are reasonably straight but basically horrible to look at; other packages produce better graphs easily.
Two sample paired comparisons
You do this with proc means:
options pagesize=60 linesize=80;
data michpair;
infile 'n:\stat\330\michpair.dat';
input speed1 speed2 ;
diff=speed1-speed2;
proc means mean std stderr t prt maxdec=2;
proc univariate plot normal;
var speed1 diff;
run;
The output is
The SAS System 2
14:31 Monday, October 16, 1995
Variable        Mean     Std Dev   Std Error           T    Prob>|T|
--------------------------------------------------------------------------
SPEED1        909.00      104.93       23.46       38.74      0.0001
SPEED2        831.50       54.22       12.12       68.58      0.0001
DIFF           77.50      109.78       24.55        3.16      0.0052
--------------------------------------------------------------------------
Only the third line actually matters.
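If you also want a confidence interval for the mean difference, one possible approach (a sketch, not the only way) is to have proc means save its statistics in a data set and finish the arithmetic in a data step. This assumes the michpair data set created above already exists; the names pairstat and pairci are made up for illustration.

```sas
/* Sketch only: 95% confidence interval for the mean of DIFF.          */
proc means data=michpair noprint;
  var diff;
  output out=pairstat n=n mean=mean stderr=stderr;  /* save statistics */
run;
data pairci;
  set pairstat;
  tcrit = tinv(0.975, n - 1);     /* t critical value on n-1 df        */
  lower = mean - tcrit*stderr;    /* lower 95% confidence limit        */
  upper = mean + tcrit*stderr;    /* upper 95% confidence limit        */
run;
proc print data=pairci;
  var n mean stderr lower upper;
run;
```

Working from the saved statistics rather than the printed ones avoids the rounding introduced by maxdec=2.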
options pagesize=60 linesize=80;
data glucose;
infile 'n:\stat\330\glucose.dat';
input frstpreg scndpreg ;
proc print;
run;
DUE: Friday end of Week 9