**STAT 330**

Assignment 7: First SAS Assignment

In this handout I begin by showing you examples of one-sample and two-sample
procedures using SAS. I then describe several data sets which I expect
you to analyze. You must use SAS. I want you to hand in: copies of the
SAS commands which you submit, the SAS output you get *and* a
short (two or three sentences) summary of the practical conclusions.
Uninterpreted computer output cannot get more than 25% of
the possible marks. (At the same time, without the SAS input and
output you won't get anything.)

**One sample tests and confidence intervals**

The data for this example are taken from question 42 in chapter 7, which you
should see for an explanation of the setting.
I ran the following SAS code, which is in the file *n:\stat\330\asbestos.sas*.

    options pagesize=60 linesize=80;
    data asbestos;
      infile 'n:\stat\330\asbestos.dat';
      input comply;
      complyd=comply-200;
    proc means mean std stderr t prt maxdec=2;
    run;

The words mean, std, stderr, t, and prt after means in the proc means
statement request the computation of the sample mean, the sample
standard deviation, the standard error of the mean, the value of the
*t* statistic for testing the hypothesis of 0 mean, and the two-sided
*P*-value for a *t*-test of that null hypothesis. The expression
maxdec=2 limits the printout to 2 decimal places for means and such.

The output from proc means is

    The SAS System                                   9
                      12:47 Thursday, October 12, 1995

    Variable          Mean       Std Dev     Std Error             T  Prob>|T|
    ----------------------------------------------------------------------
    COMPLY          209.75         24.16          6.04         34.73    0.0001
    COMPLYD           9.75         24.16          6.04          1.61    0.1273
    ----------------------------------------------------------------------

Notice that the second line tests the hypothesis that the mean of COMPLY is actually 200. The two-sided *P*-value is 0.1273.
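As a check on how the columns of this output fit together, the COMPLYD row can be recomputed by hand. This is only a sketch: the sample size is not printed, so $n = 16$ below is inferred from the ratio Std Dev / Std Error, and the critical value $t_{0.025,15} \approx 2.131$ is taken from a standard *t* table rather than from the SAS output.

$$
\begin{aligned}
t &= \frac{\bar{y} - 200}{s/\sqrt{n}} = \frac{9.75}{6.04} \approx 1.61, \\
\sqrt{n} &\approx \frac{24.16}{6.04} = 4.00 \quad\Longrightarrow\quad n = 16, \\
\bar{y} \pm t_{0.025,15}\,\frac{s}{\sqrt{n}} &\approx 209.75 \pm 2.131 \times 6.04 \approx 209.75 \pm 12.87 = (196.88,\ 222.62).
\end{aligned}
$$

Under those assumptions the 95% interval for the mean of COMPLY contains 200, which agrees with the printed *P*-value of 0.1273 being larger than 0.05.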

**Two sample tests and confidence intervals**

The data for the question about Michelson's measurements of the speed
of light from Assignment 4 are in the file *n:\stat\330\michlson.dat* and I
use proc ttest to test for no change in mean.

    options pagesize=60 linesize=80;
    data michlson;
      infile 'n:\stat\330\michlson.dat';
      input set $ speed ;
    proc sort data=michlson;
      by set;
    proc ttest cochran;
      class set;
      var speed;
    run;
    proc univariate plot normal;
      by set;
    run;

To get high definition graphs replace the proc univariate with

    proc rank data=michlson normal=vw out=rmich;
      by set;
      var speed;
      ranks normscr;
    proc gplot data=rmich;
      title3 'normal probability plot';
      by set;
      plot speed*normscr;
    run;

The (low resolution) output is

    The SAS System                                   1
                        14:31 Monday, October 16, 1995

    TTEST PROCEDURE

    Variable: SPEED

    SET       N          Mean       Std Dev     Std Error       Minimum       Maximum
    -------------------------------------------------------------------------------
    First    20   909.0000000   104.9260391   23.46217561   650.0000000   1070.000000
    Second   20   831.5000000    54.2193401   12.12381302   740.0000000    950.000000

    Variances        T  Method             DF    Prob>|T|
    --------------------------------------------------------
    Unequal     2.9346  Satterthwaite    28.5      0.0065
                        Cochran          19.0      0.0085
    Equal       2.9346                   38.0      0.0056

    For H0: Variances are equal, F' = 3.75    DF = (19,19)    Prob>F' = 0.0060

The line labelled "Equal" gives the usual two-sample *t*-test, which assumes equal variances in the two groups.
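The "Equal" line and the variance test can be reproduced by hand from the summary rows of the TTEST output. The following is a sketch using the standard pooled-variance formulas, with $s_1^2$ and $s_2^2$ taken as the squares of the printed standard deviations and $n_1 = n_2 = 20$.

$$
\begin{aligned}
s_p^2 &= \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{19(11009.47) + 19(2939.74)}{38} \approx 6974.6, \qquad s_p \approx 83.51, \\
t &= \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{1/n_1 + 1/n_2}} = \frac{909.0 - 831.5}{83.51\sqrt{1/20 + 1/20}} \approx \frac{77.5}{26.41} \approx 2.93, \qquad \text{df} = 38, \\
F' &= \frac{s_1^2}{s_2^2} = \frac{11009.47}{2939.74} \approx 3.75 .
\end{aligned}
$$

Because the two sample sizes are equal, the unequal-variance standard error $\sqrt{s_1^2/n_1 + s_2^2/n_2}$ is also about 26.41, which is why the "Unequal" and "Equal" rows show the same *t* value and differ only in their degrees of freedom.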

    The SAS System                                   1
                     10:11 Wednesday, October 25, 1995

    ------------------------------ SET=First ------------------------------

    Univariate Procedure

    Variable=SPEED

                        Moments

        N                20  Sum Wgts         20
        Mean            909  Sum           18180
        Std Dev     104.926  Variance   11009.47
        Skewness   -0.96461  Kurtosis   0.573188
        USS        16734800  CSS          209180
        CV         11.54302  Std Mean   23.46218
        T:Mean=0   38.74321  Pr>|T|       0.0001
        Num ^= 0         20  Num > 0          20
        M(Sign)          10  Pr>=|M|      0.0001
        Sgn Rank        105  Pr>=|S|      0.0001
        W:Normal   0.920264  Pr<W         0.1059

                    Quantiles(Def=5)

        100% Max       1070       99%       1070
         75% Q3         980       95%       1035
         50% Med        940       90%       1000
         25% Q1         850       10%        750
          0% Min        650        5%        695
                                   1%        650
        Range           420
        Q3-Q1           130
        Mode            980

                        Extremes

        Lowest    Obs     Highest    Obs
           650(    14)        980(    12)
           740(     2)       1000(    11)
           760(    15)       1000(    17)
           810(    16)       1000(    18)
           850(     6)       1070(     4)

        Stem Leaf                     #             Boxplot
          10 7                        1                |
          10 000                      3                |
           9 566888                   6             +-----+
           9 033                      3             *--+--*
           8 558                      3             +-----+
           8 1                        1                |
           7 6                        1                |
           7 4                        1                |
           6 5                        1                0
             ----+----+----+----+
        Multiply Stem.Leaf by 10**+2

[Normal Probability Plot for SET=First: SPEED (675 to 1075) against normal quantiles (-2 to +2).]

    ------------------------------ SET=Second -----------------------------

    Univariate Procedure

    Variable=SPEED

                        Moments

        N                20  Sum Wgts         20
        Mean          831.5  Sum           16630
        Std Dev    54.21934  Variance   2939.737
        Skewness   0.692545  Kurtosis   0.328607
        USS        13883700  CSS           55855
        CV         6.520666  Std Mean   12.12381
        T:Mean=0   68.58403  Pr>|T|       0.0001
        Num ^= 0         20  Num > 0          20
        M(Sign)          10  Pr>=|M|      0.0001
        Sgn Rank        105  Pr>=|S|      0.0001
        W:Normal   0.934107  Pr<W         0.1953

                    Quantiles(Def=5)

        100% Max        950       99%        950
         75% Q3         870       95%        945
         50% Med        810       90%        915
         25% Q1         805       10%        770
          0% Min        740        5%        750
                                   1%        740
        Range           210
        Q3-Q1            65
        Mode            810

                        Extremes

        Lowest    Obs     Highest    Obs
           740(    14)        870(    12)
           760(     5)        870(    20)
           780(     3)        890(     1)
           790(     7)        940(    16)
           800(    18)        950(    17)

        Stem Leaf                     #             Boxplot
           9 5                        1                |
           9 4                        1                |
           8 57779                    5             +-----+
           8 011111124                9             *--+--*
           7 689                      3                |
           7 4                        1                |
             ----+----+----+----+
        Multiply Stem.Leaf by 10**+2

[Normal Probability Plot for SET=Second: SPEED (725 to 975) against normal quantiles (-2 to +2).]

[Schematic Plots: side-by-side boxplots of SPEED (650 to 1100) for SET=First and SET=Second.]

You will see that the normal probability plots are reasonably straight but basically horrible to look at; other packages produce better graphs easily.

**Two sample paired comparisons**

You do this with *proc means*:

    options pagesize=60 linesize=80;
    data michpair;
      infile 'n:\stat\330\michpair.dat';
      input speed1 speed2 ;
      diff=speed1-speed2;
    proc means mean std stderr t prt maxdec=2;
    proc univariate plot normal;
      var speed1 diff;
    run;

The output is

    The SAS System                                   2
                        14:31 Monday, October 16, 1995

    Variable          Mean       Std Dev     Std Error             T  Prob>|T|
    --------------------------------------------------------------------------
    SPEED1          909.00        104.93         23.46         38.74    0.0001
    SPEED2          831.50         54.22         12.12         68.58    0.0001
    DIFF             77.50        109.78         24.55          3.16    0.0052
    --------------------------------------------------------------------------

Only the third line actually matters.
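The DIFF line is the paired *t*-test: a one-sample *t*-test applied to the differences. As a sketch, its *t* value can be recovered from the printed mean and standard deviation of DIFF. The number of pairs is not printed; $n = 20$ is assumed here because it matches the ratio Std Dev / Std Error $= 109.78/24.55 \approx \sqrt{20}$.

$$
t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{77.50}{109.78/\sqrt{20}} \approx \frac{77.50}{24.55} \approx 3.16, \qquad \text{df} = n - 1 = 19 .
$$

The SPEED1 and SPEED2 lines only test whether each mean is 0, which is not a question of interest here.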

- The file *n:\stat\330\glucose.dat* contains blood glucose levels for 52 women after their first pregnancy and then their second. The following SAS commands read the file and print out the data set.

      options pagesize=60 linesize=80;
      data glucose;
        infile 'n:\stat\330\glucose.dat';
        input frstpreg scndpreg ;
      proc print;
      run;

  - Get 95% confidence intervals for first pregnancy mean, second pregnancy mean and difference in means.
  - Is there a difference in blood glucose levels between the two pregnancies?
  - Does the population look reasonably normal?

- For the body fat data in the introductory handout on SAS, do men and women have different average percent body fat? Do they have different population standard deviations? Are the normality assumptions adequate? (The data are in *n:\stat\330\bodyfat.dat*.)

- In the file *n:\stat\330\iris.dat* are the measurements of 4 dimensions on each of 50 flowers of 2 species of iris. Read them with `input species $ sepallen;`. The file has 3 other columns, which are ignored by this command. Do Versicolor and Virginica Irises have different average sepal lengths?

**DUE: Friday end of Week 9**

Fri Mar 6 15:06:27 PST 1998