Stat650/403/890

Factorial Treatment Structure

Reading: Read the section on two-way anova (Section 16.2, pages 179-185) in Julian Faraway's Practical Regression and ANOVA Using R. For supplemental reading see pages 1-35 Chapter 9 on two-factor experiments in Carl Schwarz' notes.

Some main points: If there are more than one treatment factors we may be interested not just in whether the treatment combinations have effects but in interpretation of the possible effects into main effects and interaction effects. We will consider the completely randomized design in which the units are assigned at random to the different treatment combination. In a full factorial design, each possible treatment combination is assigned to at least one unit.

Learning goals: Recognize factorial treatment structures when you see them. Know how to analyze a two-factor completely randomized experimental design. Become familiar with useful graphical and diagnostic methods for making sense of such experiments. Understand the assumptions in the model commonly assumed for the data in these experiments. Understand the overall ideas of multiple comparison methods for these experiments. Understand the motivation for transormations of the data, and know how to apply the log transformation.

Example: Poison and poison treatment

The data for the rat poison and treatment experiment can be found as the data set "poisson" in the R library "boot" or as "poisson.data" in the library "BHH2".   If you want to work on these data in parallel with Faraway, the following commands may be useful.

library(boot)
attach(poisons)

Look at the data with simple boxplots and then interaction plots:

boxplot(time~poison)
boxplot(time~treat)
interaction.plot(treat,poison,time)
interaction.plot(poison,treat,time)

Alternatively, make a plot of survival time by treatment with colours for the different poisons. "treat" is coded A, B, C, D, and as.numeric(treat) changes that to 1, 2, 3, 4.

plot(as.numeric(treat),time,bg=as.numeric(poison),pch=21,cex=2)

Fit a linear model with main and interaction effects, and name the resulting linear model output object as, say, "rats". Then look at the analysis of variance table.

rats <- lm(time ~ poison*treat)
anova(rats)

Examine the distribution of the residuals to see if it looks normally distributed, and then plot the residuals above the fitted values to see if the variance tends to be larger where the mean survival time is larger. The normal quantile-quantile plot (qqnorm) should give a straight line if the residuals are normal.

hist(rats$res)
qqnorm(rats$res)
plot(rats$fitted, rats$res, xlab="fitted",   ylab="residuals")

Since the variance indeed does not look constant, try a transformation, taking logarithms of the data.

rats <- lm(log(time) ~ poison*treat)
anova(rats)
hist(log(time))
qqnorm(rats$res)
plot(rats$fitted, rats$res, xlab="fitted", ylab="residuals",main="log response")

Here is one more transformation to try, using the reciprocal of survival time, which could be interpreted as death rate.

rats <- lm(1/time ~ poison*treat)
anova(rats)
summary(rats)
hist(1/time)
qqnorm(rats$res)
plot(rats$fitted, rats$res, xlab="fitted", ylab="residuals",main="1/response")

interaction.plot(treat,poison,1/time)
interaction.plot(poison,treat,1/time)

Here is the model assuming no interaction:

rats ~ lm(1/time ~ poison + treat)
anova(rats)
summary(rats)

Here are the commands again, this time with the output tables:

> rats <- lm(1/time ~ poison * treat)
> anova(rats)
Analysis of Variance Table

Response: 1/time
             Df Sum Sq Mean Sq F value    Pr(>F)
poison        2 34.877 17.439 72.6347 2.310e-13 ***
treat         3 20.414   6.805 28.3431 1.376e-09 ***
poison:treat 6 1.571   0.262 1.0904    0.3867
Residuals    36 8.643   0.240
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The interaction was not significant. The model fitted without interaction looks like this:

> rats <- lm(1/time ~ poison + treat)
> anova(rats)
Analysis of Variance Table

Response: 1/time
          Df Sum Sq Mean Sq F value    Pr(>F)
poison     2 34.877 17.439 71.708 2.865e-14 ***
treat      3 20.414   6.805 27.982 4.192e-10 ***
Residuals 42 10.214   0.243
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> summary(rats)

Call:
lm(formula = 1/time ~ poison + treat)

Residuals:
     Min       1Q   Median       3Q      Max
-0.82757 -0.37619 0.02116 0.27568 1.18153

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.6977     0.1744 15.473 < 2e-16 ***
poison2       0.4686     0.1744   2.688 0.01026 *
poison3       1.9964     0.1744 11.451 1.69e-14 ***
treatB       -1.6574     0.2013 -8.233 2.66e-10 ***
treatC       -0.5721     0.2013 -2.842 0.00689 **
treatD       -1.3583     0.2013 -6.747 3.35e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4931 on 42 degrees of freedom
Multiple R-Squared: 0.8441,     Adjusted R-squared: 0.8255
F-statistic: 45.47 on 5 and 42 DF, p-value: 6.974e-16

For confidence intervals on the difference between treatment effects, the standard error for the difference is given in the summary table as .201. For an interval with .95 confidence, the upper .025 point in the t distribution with 36 degrees of freedom (the df for residuals from the anova table) is obtained with either of

> qt(.975,42)
[1] 2.018082
> qt(.025,42,lower.tail=F)
[1] 2.018082

Thus the half width of the confidence interval for the difference between any two treatment main effects is

> 2.018082 * .2013
[1] 0.4062399

The confidence interval will be the difference between the two treatment effect coefficients plus or minus .406. Any pair of treatment effect coefficient having a difference greater than .406 will have a confidence interval not including zero, so that the null hypothesis of no difference between those two treatment effects would be rejected at the .05 level.

The conservative Bonferonni multiple comparison procedure for comparing all six pairs of treatment effects would divide the alpha value by the number of comparisons giving:

> .05/6
[1] 0.008333333

Putting half of that in each tail produces a t value of

> qt(0.008333333/2, 42, lower.tail=F)
[1] 2.769016

and an interval half width of

> 2.769016 * .2013
[1] 0.5574029

Thus, any difference greater than .557 in magnitude would be significant from the multiple comparison point of view. Notice this does not change the conclusion on which of the poison treatments are significantly different.

Example: Vitamin C dose and type

In this experiment 60 guinea pigs were randomly assigned to treatment combinations with different doses and delivery methods of Vitamin C. The response is the length of odontoblasts (teeth) in each of 10 guinea pigs at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice or ascorbic acid). Thus there are six different treatment combinations, and 10 guinea pigs are assigned to each.

Look at the data set:

?ToothGrowth
ToothGrowth

Fit a model with main effects and interaction:

toothmodel <- lm(len~supp*dose,data=ToothGrowth)

alternatively:
attach(ToothGrowth)
toothmodel <- lm(len~supp*dose)

anova(toothmodel)

interaction.plot(dose,supp,len)
interaction.plot(supp,dose,len)