STAT 330

Assignment 7: First SAS Assignment

In this handout I begin by showing you examples of one-sample and two-sample procedures using SAS. I then describe several data sets which I expect you to analyze. You must use SAS. I want you to hand in: copies of the SAS commands you submit, the SAS output you get, and a short (two or three sentence) summary of the practical conclusions. Uninterpreted computer output cannot get more than 25% of the possible marks. (At the same time, without the SAS input and output you won't get anything.)

One sample tests and confidence intervals

The data for this example are taken from question 42 in chapter 7, which you should see for an explanation of the setting. I ran the following SAS code, which is in the file n:\stat\330\asbestos.sas.

```
options pagesize=60 linesize=80;
data asbestos;
  infile 'n:\stat\330\asbestos.dat';
  input comply;
  complyd=comply-200;   /* a test of mean 0 for complyd tests mean 200 for comply */
proc means mean std stderr t prt maxdec=2;
run;
```

The keywords mean, std, stderr, t, and prt after means in the proc means statement request the sample mean, the sample standard deviation, the standard error of the mean, the value of the t statistic for testing the hypothesis of 0 mean, and the two-sided P-value for a t-test of that null hypothesis. The option maxdec=2 limits the printout to 2 decimal places.

The output from proc means is

```
                        The SAS System                                9
12:47 Thursday, October 12, 1995

Variable        Mean       Std Dev     Std Error          T  Prob>|T|
----------------------------------------------------------------------
COMPLY        209.75        24.16         6.04         34.73    0.0001
COMPLYD         9.75        24.16         6.04          1.61    0.1273
----------------------------------------------------------------------
```
Notice that the second line tests the hypothesis that the mean of COMPLY is actually 200, because COMPLYD is COMPLY minus 200. The two-sided P-value is about 13%, indicating that there is only very weak evidence against this null. To compute a 95% confidence interval take $\bar{x} \pm t_{0.025,\,n-1}\, s/\sqrt{n}$, that is, the mean plus or minus the t critical value times the standard error. I don't know if I can get SAS to actually do this little piece of arithmetic easily.
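One possible way to have SAS do the interval arithmetic is a short data step using the `tinv` quantile function. This is only a sketch: the data set name `ci` and the variable names are made up for illustration, and the sample size n = 16 is an assumption inferred from the output (Std Dev / Std Error = 24.16 / 6.04 = 4), not something stated in the handout.

```sas
/* Sketch only: n = 16 is inferred from 24.16/6.04 = 4, not given in the handout */
data ci;
  n = 16;
  mean = 209.75;               /* sample mean of COMPLY from the output   */
  stderr = 6.04;               /* standard error of the mean from output  */
  tcrit = tinv(0.975, n-1);    /* upper 2.5% point of t on n-1 df         */
  lower = mean - tcrit*stderr;
  upper = mean + tcrit*stderr;
  put lower= upper=;           /* writes the two limits to the SAS log    */
run;
```

Under these assumptions the interval runs from roughly 197 to 223. It contains 200, which agrees with the large P-value on the COMPLYD line.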

Two sample tests and confidence intervals

The data for the question about Michelson's measurements of the speed of light from Assignment 4 are in the file n:\stat\330\michlson.dat, and I use proc ttest to test for no change in mean.

```
options pagesize=60 linesize=80;
data michlson;
  infile 'n:\stat\330\michlson.dat';
  input set $ speed;       /* set is a character variable (First or Second) */
proc sort data=michlson;   /* the by statement below needs sorted data */
  by set;
proc ttest cochran;
  class set;
  var speed;
run;
proc univariate plot normal;
  by set;
run;
```
To get high definition graphs replace the proc univariate with
```
proc rank data=michlson normal=vw out=rmich;
  by set;
  var speed;
  ranks normscr;
proc gplot data=rmich;
  title3 'normal probability plot';
  by set;
  plot speed*normscr;
run;
```
The (low resolution) output is
```
                            The SAS System                                1
14:31 Monday, October 16, 1995

TTEST PROCEDURE

Variable: SPEED

SET          N         Mean      Std Dev    Std Error      Minimum      Maximum
-------------------------------------------------------------------------------
First       20  909.0000000  104.9260391  23.46217561  650.0000000  1070.000000
Second      20  831.5000000   54.2193401  12.12381302  740.0000000   950.000000

Variances        T    Method              DF    Prob>|T|
--------------------------------------------------------
Unequal     2.9346    Satterthwaite     28.5      0.0065
                      Cochran           19.0      0.0085
Equal       2.9346                      38.0      0.0056

For H0: Variances are equal, F' = 3.75    DF = (19,19)    Prob>F' = 0.0060
```
The line labelled "Equal" gives the usual two-sample t statistic for testing for equal means using a pooled estimate of the variance. It shows the degrees of freedom and the associated two-tailed P-value. Beneath that is a line beginning "For H0", which tests the hypothesis that the two variances are equal. The statistic F' is the larger sample variance over the smaller, and the P-value is two-tailed. Notice that the two means are clearly different and that the two variances are also clearly different. The "Unequal" line reports on tests which try to adjust for unequal variances; Satterthwaite is the technique mentioned in previous solution sets. You have to do your own arithmetic to get confidence intervals. The output of proc univariate is:
```
                            The SAS System                                1
10:11 Wednesday, October 25, 1995

----------------------------- SET=First -----------------------------------

Univariate Procedure

Variable=SPEED

Moments

N                20  Sum Wgts         20
Mean            909  Sum           18180
Std Dev     104.926  Variance   11009.47
Skewness   -0.96461  Kurtosis   0.573188
USS        16734800  CSS          209180
CV         11.54302  Std Mean   23.46218
T:Mean=0   38.74321  Pr>|T|       0.0001
Num ^= 0         20  Num > 0          20
M(Sign)          10  Pr>=|M|      0.0001
Sgn Rank        105  Pr>=|S|      0.0001
W:Normal   0.920264  Pr<W         0.1059

Quantiles(Def=5)

100% Max      1070       99%      1070
 75% Q3        980       95%      1035
 50% Med       940       90%      1000
 25% Q1        850       10%       750
  0% Min       650        5%       695
                          1%       650
Range          420
Q3-Q1          130
Mode           980

Extremes

Lowest    Obs     Highest    Obs
650(      14)      980(      12)
740(       2)     1000(      11)
760(      15)     1000(      17)
810(      16)     1000(      18)
850(       6)     1070(       4)

Stem Leaf                     #             Boxplot
10 7                        1                |
10 000                      3                |
9 566888                   6             +-----+
9 033                      3             *--+--*
8 558                      3             +-----+
8 1                        1                |
7 6                        1                |
7 4                        1                |
6 5                        1                0
----+----+----+----+
Multiply Stem.Leaf by 10**+2

The SAS System                                2
10:11 Wednesday, October 25, 1995

---------------------------------- SET=First -----------------------------------

Univariate Procedure

Variable=SPEED

Normal Probability Plot
1075+                                       +++++*
    |                                  *+*++*
    |                          ** *++*+
    |                      ** ++++
 875+                  **+*+++
    |               +*+++
    |          ++++*
    |      ++++ *
 675+ +++++*
    +----+----+----+----+----+----+----+----+----+----+
        -2        -1         0        +1        +2

The SAS System                                3
10:11 Wednesday, October 25, 1995

---------------------------------- SET=Second ----------------------------------

Univariate Procedure

Variable=SPEED

Moments

N                20  Sum Wgts         20
Mean          831.5  Sum           16630
Std Dev    54.21934  Variance   2939.737
Skewness   0.692545  Kurtosis   0.328607
USS        13883700  CSS           55855
CV         6.520666  Std Mean   12.12381
T:Mean=0   68.58403  Pr>|T|       0.0001
Num ^= 0         20  Num > 0          20
M(Sign)          10  Pr>=|M|      0.0001
Sgn Rank        105  Pr>=|S|      0.0001
W:Normal   0.934107  Pr<W         0.1953

Quantiles(Def=5)

100% Max       950       99%       950
 75% Q3        870       95%       945
 50% Med       810       90%       915
 25% Q1        805       10%       770
  0% Min       740        5%       750
                          1%       740
Range          210
Q3-Q1           65
Mode           810

Extremes

Lowest    Obs     Highest    Obs
740(      14)      870(      12)
760(       5)      870(      20)
780(       3)      890(       1)
790(       7)      940(      16)
800(      18)      950(      17)

Stem Leaf                     #             Boxplot
9 5                        1                |
9 4                        1                |
8 57779                    5             +-----+
8 011111124                9             *--+--*
7 689                      3                |
7 4                        1                |
----+----+----+----+
Multiply Stem.Leaf by 10**+2

The SAS System                                4
10:11 Wednesday, October 25, 1995

---------------------------------- SET=Second ----------------------------------

Univariate Procedure

Variable=SPEED

Normal Probability Plot
 975+                                            *  ++++
    |                                      +*+++++++
    |                             *+*++*+*+
    |                  **++*+*++**
    |          +*++*+*+++
 725+ +++++*+++
    +----+----+----+----+----+----+----+----+----+----+
        -2        -1         0        +1        +2

The SAS System                                5
10:11 Wednesday, October 25, 1995

Univariate Procedure
Schematic Plots

Variable=SPEED

     |
1100 +
     |
     |            |
     |            |
1050 +            |
     |            |
     |            |
     |            |
1000 +            |
     |            |
     |         +-----+
     |         |     |
 950 +         |     |        |
     |         *-----*        |
     |         |     |        |
     |         |  +  |        |
 900 +         |     |        |
     |         |     |        |
     |         |     |     +-----+
     |         |     |     |     |
 850 +         +-----+     |     |
     |            |        |  +  |
     |            |        |     |
     |            |        *-----*
 800 +            |        +-----+
     |            |           |
     |            |           |
     |            |           |
 750 +            |           |
     |            |           |
     |
     |
 700 +
     |
     |
     |
 650 +            0
      ------------+-----------+-----------
SET             First      Second
```

You will see that the normal probability plots are reasonably straight but basically horrible to look at; other packages produce better graphs easily.
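For the confidence interval that proc ttest leaves to hand arithmetic, a short data step is one option. The sketch below copies the summary values from the proc ttest output and uses the Satterthwaite degrees of freedom (28.5) from the "Unequal" line, since the variances look different. The data set name `ci2` and the variable names are hypothetical, chosen only for illustration.

```sas
/* Sketch only: summary values are copied from the proc ttest output */
data ci2;
  n1 = 20; mean1 = 909.0; sd1 = 104.9260391;
  n2 = 20; mean2 = 831.5; sd2 = 54.2193401;
  diff = mean1 - mean2;                   /* estimated difference in means */
  se = sqrt(sd1**2/n1 + sd2**2/n2);       /* unpooled standard error       */
  df = 28.5;                              /* Satterthwaite df from output  */
  tcrit = tinv(0.975, df);                /* upper 2.5% point of t         */
  lower = diff - tcrit*se;
  upper = diff + tcrit*se;
  put diff= se= lower= upper=;            /* writes results to the SAS log */
run;
```

Because the two sample sizes are equal, the pooled standard error is the same number, so an interval based on the "Equal" line differs only in using df = 38.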

Two sample paired comparisons

You do this with proc means:

```
options pagesize=60 linesize=80;
data michpair;
  infile 'n:\stat\330\michpair.dat';
  input speed1 speed2;
  diff=speed1-speed2;      /* paired differences */
proc means mean std stderr t prt maxdec=2;
proc univariate plot normal;
  var speed1 diff;
run;
```
The output is
```
                                 The SAS System                                2
14:31 Monday, October 16, 1995

Variable          Mean       Std Dev     Std Error             T  Prob>|T|
--------------------------------------------------------------------------
SPEED1          909.00        104.93         23.46         38.74    0.0001
SPEED2          831.50         54.22         12.12         68.58    0.0001
DIFF             77.50        109.78         24.55          3.16    0.0052
--------------------------------------------------------------------------
```

Only the third line actually matters.
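If your SAS release supports it, proc means may be able to print the confidence limits for the mean difference directly. The sketch below assumes the `michpair` data set created above and assumes that the `clm` statistic keyword and the `alpha=` option are available in the installed version; check the documentation for your release before relying on it.

```sas
/* Sketch only: assumes data set michpair exists and that this SAS
   release accepts the clm keyword and the alpha= option in proc means */
proc means data=michpair n mean stderr t prt clm alpha=0.05 maxdec=2;
  var diff;     /* restrict the output to the paired differences */
run;
```

Restricting the var statement to diff keeps the printout to the one line that matters for the paired comparison.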

1. The file n:\stat\330\glucose.dat contains blood glucose levels for 52 women after their first pregnancy and then after their second. The following SAS commands read the file and print out the data set.
```
options pagesize=60 linesize=80;
data glucose;
  infile 'n:\stat\330\glucose.dat';
  input frstpreg scndpreg;
proc print;
run;
```

   1. Get 95% confidence intervals for the first pregnancy mean, the second pregnancy mean, and the difference in means.
   2. Is there a difference in blood glucose levels between the two pregnancies?
   3. Does the population look reasonably normal?

2. For the body fat data in the introductory handout on SAS, do men and women have different average percent body fat? Do they have different population standard deviations? Are the normality assumptions adequate? (The data are in n:\stat\330\bodyfat.dat.)
3. The file n:\stat\330\iris.dat contains measurements of 4 dimensions on each of 50 flowers of 2 species of iris. Read them with `input species $ sepallen;` (the file has 3 other columns, which this command ignores). Do Versicolor and Virginica irises have different average sepal lengths?

DUE: Friday end of Week 9

Richard Lockhart
Fri Mar 6 15:06:27 PST 1998