In this document I begin by showing you examples of one-sample and two-sample procedures using SAS. Then I describe several data sets which I expect you to analyze. You must use SAS. I want you to hand in: copies of the SAS commands you submit, the SAS output you get, and a short (two or three sentence) summary of the practical conclusions. Uninterpreted computer output cannot earn more than 25% of the possible marks. (At the same time, without the SAS input and output you won't get anything.)
One sample tests and confidence intervals
The data for this example are taken from question 42 in chapter 7; see that question for an explanation of the setting. I ran the following SAS code, which is in the file g:asbestos.sas (called Macintosh HD:Student Folder:asbestos.sas on the Macs). (To duplicate this analysis on a Mac you must copy the files CLASS:STAT:330:asbestos.sas and CLASS:STAT:330:asbestos.dat into the folder Macintosh HD:Student Folder or onto your own floppy disk; the Mac version of SAS cannot read files from the folder CLASS:STAT:330, where the instructor normally puts files for students to use.)
options pagesize=60 linesize=80;
data asbestos;
infile 'g:asbestos.dat';
/* on the Macs use instead: infile 'Macintosh HD:Student Folder:asbestos.dat'; */
input comply;
complyd=comply-200; /* deviation of each value from 200 */
proc means mean std stderr t prt maxdec=2;
run;
The words mean, std, stderr, t and prt after means in the proc means statement request the sample mean, the sample standard deviation, the standard error of the mean, the t statistic for testing the hypothesis that the population mean is 0, and the two-sided P-value for that t test. The option maxdec=2 limits the printout to 2 decimal places for means and similar statistics.
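For reference, the textbook formulas behind these statistics, for observations $x_1,\dots,x_n$, are sketched below. This is standard notation rather than anything quoted from the SAS documentation; $T_{n-1}$ denotes a random variable having the t distribution with $n-1$ degrees of freedom.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
\text{std err} = \frac{s}{\sqrt{n}}

t = \frac{\bar{x}}{s/\sqrt{n}}, \qquad
P = \Pr\bigl(|T_{n-1}| \ge |t|\bigr)
```

For instance, in the output for COMPLY, $t = 209.75/6.04 \approx 34.73$.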
The output from proc means is
The SAS System 9
12:47 Thursday, October 12, 1995
Variable Mean Std Dev Std Error T Prob>|T|
----------------------------------------------------------------------
COMPLY 209.75 24.16 6.04 34.73 0.0001
COMPLYD 9.75 24.16 6.04 1.61 0.1273
----------------------------------------------------------------------
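Notice that the second line tests the hypothesis that the mean of COMPLY is actually 200: since COMPLYD = COMPLY - 200, a mean of 0 for COMPLYD is the same as a mean of 200 for COMPLY. The two-sided P-value is about 13%, indicating only very weak evidence against this null hypothesis. To compute a 95% confidence interval take $\bar{x} \pm t_{0.025,n-1}\, s/\sqrt{n}$, where $s/\sqrt{n}$ is the Std Error column and $t_{0.025,n-1}$ is the upper 2.5% point of the t distribution with $n-1$ degrees of freedom. I don't know if I can get SAS to actually do this little piece of arithmetic easily.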
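Working this through by hand, as a sketch: the output does not print the sample size, but it can be recovered as $(s/\text{std err})^2$, and $t_{0.025,15} \approx 2.131$ is taken from a t table.

```latex
n = \left(\frac{24.16}{6.04}\right)^2 = 16, \qquad t_{0.025,15} \approx 2.131

209.75 \pm 2.131 \times 6.04 = 209.75 \pm 12.87 = (196.88,\ 222.62)
```

The interval contains 200, which agrees with the P-value of 0.1273 being larger than 0.05.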
Two sample tests and confidence intervals
The data for the question about Michelson's measurements of the speed of light from Assignment 4 are in the file g:michlson.dat, which is called CLASS:STAT:330:michlson.dat on the Macs (copy it to Macintosh HD:Student Folder as described above). I use proc ttest to test for no change in mean.
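options pagesize=60 linesize=80;
data michlson;
infile 'g:michlson.dat';
/* on the Macs use instead: infile 'Macintosh HD:Student Folder:michlson.dat'; */
input set $ speed;
/* sort by group: the BY statement in proc univariate needs sorted data */
proc sort data=michlson;
by set;
/* two-sample t tests; cochran adds the Cochran-Cox approximation */
proc ttest cochran;
class set;
proc univariate plot normal;
by set;
run;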
The output is
The SAS System 1
14:31 Monday, October 16, 1995
TTEST PROCEDURE
Variable: SPEED
SET N Mean Std Dev Std Error Minimum Maximum
-------------------------------------------------------------------------------
First 20 909.0000000 104.9260391 23.46217561 650.0000000 1070.000000
Second 20 831.5000000 54.2193401 12.12381302 740.0000000 950.000000
Variances T Method DF Prob>|T|
--------------------------------------------------------
Unequal 2.9346 Satterthwaite 28.5 0.0065
Cochran 19.0 0.0085
Equal 2.9346 38.0 0.0056
For H0: Variances are equal, F' = 3.75 DF = (19,19) Prob>F' = 0.0060
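The F' statistic is the ratio of the larger to the smaller sample variance; using the standard deviations in the table above it can be checked by hand:

```latex
F' = \frac{s_1^2}{s_2^2} = \frac{104.926^2}{54.219^2} \approx \frac{11009.47}{2939.74} \approx 3.75,
\qquad \text{df} = (n_1 - 1,\ n_2 - 1) = (19,\ 19)
```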
Notice that the two means are clearly different (every version of the t test gives a P-value below 0.01) and that the two variances are also clearly different (F' = 3.75, P = 0.0060). The ``Unequal'' lines report on tests which try to adjust for unequal variances; Satterthwaite is the technique mentioned in previous solution sets.
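A sketch of where the ``Unequal'' numbers come from, using the textbook Satterthwaite formulas with the sample variances $s_1^2 = 11009.47$ and $s_2^2 = 2939.74$ (squares of the Std Dev column) and $n_1 = n_2 = 20$:

```latex
\text{SE} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
          = \sqrt{550.47 + 146.99} = \sqrt{697.46} \approx 26.41

t = \frac{\bar{x}_1 - \bar{x}_2}{\text{SE}} = \frac{909.0 - 831.5}{26.41} \approx 2.935

\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}
           {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
    = \frac{697.46^2}{(550.47^2 + 146.99^2)/19} \approx 28.5
```

Because $n_1 = n_2$, the pooled standard error $s_p\sqrt{1/n_1 + 1/n_2}$ equals this SE, which is why the Equal and Unequal lines share the same $t$ and differ only in their degrees of freedom.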
You have to do your own arithmetic to get confidence intervals.
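For example, a 95% interval for the difference in means based on the Satterthwaite line is $\bar{x}_1 - \bar{x}_2 \pm t_{0.025,\nu}\,\text{SE}$, where $\text{SE} = \sqrt{s_1^2/n_1 + s_2^2/n_2} \approx 26.41$ must be computed by hand from the Std Dev column and $t_{0.025,28.5} \approx 2.047$ is interpolated between the table values for 28 and 29 degrees of freedom:

```latex
77.5 \pm 2.047 \times 26.41 = 77.5 \pm 54.1 = (23.4,\ 131.6)
```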
The output of proc univariate is:
The SAS System 1
10:11 Wednesday, October 25, 1995
---------------------------------- SET=First -----------------------------------
Univariate Procedure
Variable=SPEED
Moments
N 20 Sum Wgts 20
Mean 909 Sum 18180
Std Dev 104.926 Variance 11009.47
Skewness -0.96461 Kurtosis 0.573188
USS 16734800 CSS 209180
CV 11.54302 Std Mean 23.46218
T:Mean=0 38.74321 Pr>|T| 0.0001
Num ^= 0 20 Num > 0 20
M(Sign) 10 Pr>=|M| 0.0001
Sgn Rank 105 Pr>=|S| 0.0001
W:Normal 0.920264 Pr<W 0.1059
Quantiles(Def=5)
100% Max 1070 99% 1070
75% Q3 980 95% 1035
50% Med 940 90% 1000
25% Q1 850 10% 750
0% Min 650 5% 695
1% 650
Range 420
Q3-Q1 130
Mode 980
Extremes
Lowest Obs Highest Obs
650( 14) 980( 12)
740( 2) 1000( 11)
760( 15) 1000( 17)
810( 16) 1000( 18)
850( 6) 1070( 4)
Stem Leaf # Boxplot
10 7 1 |
10 000 3 |
9 566888 6 +-----+
9 033 3 *--+--*
8 558 3 +-----+
8 1 1 |
7 6 1 |
7 4 1 |
6 5 1 0
----+----+----+----+
Multiply Stem.Leaf by 10**+2
The SAS System 2
10:11 Wednesday, October 25, 1995
---------------------------------- SET=First -----------------------------------
Univariate Procedure
Variable=SPEED
[Normal Probability Plot, SET=First: SPEED on the vertical axis, from about 675 to 1075, against normal quantiles from -2 to +2 on the horizontal axis.]
The SAS System 3
10:11 Wednesday, October 25, 1995
---------------------------------- SET=Second ----------------------------------
Univariate Procedure
Variable=SPEED
Moments
N 20 Sum Wgts 20
Mean 831.5 Sum 16630
Std Dev 54.21934 Variance 2939.737
Skewness 0.692545 Kurtosis 0.328607
USS 13883700 CSS 55855
CV 6.520666 Std Mean 12.12381
T:Mean=0 68.58403 Pr>|T| 0.0001
Num ^= 0 20 Num > 0 20
M(Sign) 10 Pr>=|M| 0.0001
Sgn Rank 105 Pr>=|S| 0.0001
W:Normal 0.934107 Pr<W 0.1953
Quantiles(Def=5)
100% Max 950 99% 950
75% Q3 870 95% 945
50% Med 810 90% 915
25% Q1 805 10% 770
0% Min 740 5% 750
1% 740
Range 210
Q3-Q1 65
Mode 810
Extremes
Lowest Obs Highest Obs
740( 14) 870( 12)
760( 5) 870( 20)
780( 3) 890( 1)
790( 7) 940( 16)
800( 18) 950( 17)
Stem Leaf # Boxplot
9 5 1 |
9 4 1 |
8 57779 5 +-----+
8 011111124 9 *--+--*
7 689 3 |
7 4 1 |
----+----+----+----+
Multiply Stem.Leaf by 10**+2
The SAS System 4
10:11 Wednesday, October 25, 1995
---------------------------------- SET=Second ----------------------------------
Univariate Procedure
Variable=SPEED
[Normal Probability Plot, SET=Second: SPEED on the vertical axis, from about 725 to 975, against normal quantiles from -2 to +2 on the horizontal axis.]
The SAS System 5
10:11 Wednesday, October 25, 1995
Univariate Procedure
Schematic Plots
Variable=SPEED
[Side-by-side schematic (box) plots of SPEED for SET=First and SET=Second, vertical axis from 650 to 1100; the value 650 in SET=First is plotted as an outlier (0).]
You will see that the normal probability plots are reasonably straight but basically horrible to look at; other packages produce better graphs easily.
Two sample paired comparisons
You do this with proc means, applied to the differences computed in a data step:
options pagesize=60 linesize=80;
data michpair;
infile 'g:michpair.dat';
/* on the Macs use instead: infile 'Macintosh HD:Student Folder:michpair.dat'; */
input speed1 speed2 ;
diff=speed1-speed2; /* paired difference for each observation */
proc means mean std stderr t prt maxdec=2;
proc univariate plot normal;
var speed1 diff;
run;
The output is
The SAS System 2
14:31 Monday, October 16, 1995
Variable Mean Std Dev Std Error T Prob>|T|
--------------------------------------------------------------------------
SPEED1 909.00 104.93 23.46 38.74 0.0001
SPEED2 831.50 54.22 12.12 68.58 0.0001
DIFF 77.50 109.78 24.55 3.16 0.0052
--------------------------------------------------------------------------
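Only the third line actually matters: it tests whether the mean of DIFF is 0, and P = 0.0052 is strong evidence that the mean difference is not 0.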
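The corresponding 95% confidence interval for the mean difference, as a hand calculation: there are $n = (109.78/24.55)^2 \approx 20$ pairs, hence 19 degrees of freedom and $t_{0.025,19} \approx 2.093$ from a t table.

```latex
\bar{d} \pm t_{0.025,19}\,\frac{s_d}{\sqrt{n}}
  = 77.50 \pm 2.093 \times 24.55 = 77.50 \pm 51.38 = (26.12,\ 128.88)
```

The interval excludes 0, in line with the small P-value.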
Your Assignment
options pagesize=60 linesize=80;
data glucose;
infile 'g:glucose.dat';
/* on the Macs use instead: infile 'Macintosh HD:Student Folder:glucose.dat'; */
input frstpreg scndpreg ;
proc print;
run;
DUE: Wednesday 6 November 1996