Stat 403/650
Example: Two treatment levels, randomization test, t test, and confidence intervals associated with each
The plant growth data is a data set in R called
"PlantGrowth". We will use only the control and treatment
1, which we consider exposure to a potentially harmful substance.
The response is the dry weight of the plants. The treatment
assignment is assumed to be completely randomized, with ten units
getting the control and ten units getting the exposure treatment.
In R, view the data set with
PlantGrowth
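or look at just the first rows and the structure of the data frame with
head(PlantGrowth)
str(PlantGrowth)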
Use the first 20 units (control and treatment 1) for the response and
treatment variables:
y <- PlantGrowth$weight[1:20]
tr <- c(rep(1,10), rep(2,10))
Alternatively,
attach(PlantGrowth)
y <- weight[1:20]
tr <- group[1:20]
tr <- as.numeric(tr) # convert the factor to numeric treatment codes 1 and 2
Print them out next to each other:
cbind(y,tr)
Plot the data, and look at the difference but also overlap in the
response between the two treatments.
plot(tr,y)
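A boxplot of the same data shows the medians and spread of each group:
boxplot(y ~ tr)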
Now that the experiment has been completed, the data are arranged for
convenience with the control units first and the exposure units
second. This corresponds to arbitrarily using new unit labels 1
through 10 for the control units and 11 through 20 for the exposure
units. Presumably, these were not the original unit numbers used
to identify field plots when the treatments were randomly
assigned.
Think of the control units as a sample consisting of the first ten
units in the data set:
s <- 1:10
The response values for the control units are
y[s]
The responses for the other treatment can be listed with
y[-s]
where R reads the "minus" sign as selecting the components of y in the
complement of s.
The treatment mean for the control group is
mean(y[s])
and for the exposure group the treatment mean is
mean(y[-s])
The estimated treatment effect is the difference
mean(y[-s]) - mean(y[s])
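For these data the control mean is 5.032 and the exposure mean is
4.661 (see the t-test output at the end of this handout), so the
estimated effect is 4.661 - 5.032 = -0.371: the exposed plants averaged
about 0.37 less dry weight than the controls.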
Randomization test of no treatment effect
The difference between the sample means for the two groups serves as
our test statistic. We'll call our observed value of this
statistic "diffobs":
diffobs <- mean(y[-s]) - mean(y[s])
The randomization test uses many reassignments of the units to
treatment and control. Under the null hypothesis of no treatment
effect, the response of each unit would come out exactly the same no
matter which treatment is assigned to it. We look at the
distribution of our test statistic under the null hypothesis.
Select one new treatment assignment
s <- sample(1:20, 10)
s
These units are considered "control" units in this assignment, and the
other units are "exposure" units. With this reassignment,
the test statistic is
diff <- mean(y[-s]) - mean(y[s])
Now repeat this, say, ten times and look at the ten different values of
the test statistic with
diff <- numeric()
for (k in 1:10) {s <- sample(1:20,10); diff[k] <- mean(y[-s]) - mean(y[s])}
Now do this a serious number of times, say 4000:
for (k in 1:4000) {s <- sample(1:20,10); diff[k] <- mean(y[-s]) - mean(y[s])}
Look at the distribution of the test statistic diff:
hist(diff)
Add our observed value to the plot:
points(diffobs, 0)
For a two-sided test of the null hypothesis of no treatment
effect, the alternative hypothesis is that the treatment has some
effect, either positive or negative. We would reject the null
hypothesis if the observed value of the test statistic were either
much larger or much smaller than would be expected if the null
hypothesis were true. For a p-value we look at the
probability, under the null hypothesis, of a value as extreme as or
more extreme than the one we observed. For a two-sided test this
probability is doubled, to reflect the procedure that we would reject
at level alpha (.05, say) if we got a value of the test statistic in
either the lower alpha/2 (or .025 in the case alpha=.05) or upper
alpha/2 (.025) tail of the distribution under the null
hypothesis.
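To see where those tails fall for alpha = .05, the empirical .025 and
.975 quantiles of diff mark the two rejection regions:
quantile(diff, probs=c(.025,.975))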
In R the command "length(diff)" gives the length of the vector diff,
which is 4000. "diff[diff<=diffobs]" selects the components of
diff which are less than or equal to our observed difference. The
number of these is given by "length(diff[diff<=diffobs])".
A one-sided p-value is given by the proportion of reassignments that
have a value less than or equal to our observed value:
length(diff[diff<=diffobs])/length(diff)
For a two-sided p-value we double this value.
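In a single command (doubling the lower tail, since our observed
difference falls in the lower tail):
2*length(diff[diff<=diffobs])/length(diff)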
A small p-value is considered evidence against the null
hypothesis. Here, the p-value of about .25 is not small. It
says that, even if there is no effect of the treatments, with our
sample sizes about one out of four experiments would find an
observed difference as large or larger than ours.
Test a different hypothesis
# make a function to calculate a two-sided p-value: twice the smaller tail area
pvalue <- function()
  {2*min(length(diff[diff >= diffobs]), length(diff[diff <= diffobs]))/length(diff)}
# call that function
pvalue()
Instead of testing the null hypothesis that the treatment effect is
zero, consider testing that the exposure causes an average increase of
10 kilograms in the weight of the plants.
s <- 1:10
effect0 <- 10 # a new null hypothesis: the treatment effect is exactly 10 kg
y0 <- c(y[s], y[-s]-effect0)
This is the response each unit
would have if all units received the control treatment, under the null
hypothesis of treatment effect = 10. That is, units that actually
did get the exposure would have produced 10 kilograms less without it.
y0 # the value each unit would give if it gets the control treatment
y0 + effect0 # the response of each unit if it gets the exposure, under the new null hypothesis
# put this in a loop and compute the sampling distribution of the test
# statistic under the new null hypothesis that the treatment effect is 10:
for (k in 1:4000) {s <- sample(20,10); diff[k] <- mean(y0[-s]+effect0) - mean(y0[s])}
hist(diff)
points(diffobs, 0, cex=4, pch=21, bg="blue")
abline(v=diffobs, col="blue")
points(effect0, 0, cex=4, pch=24)
pvalue()
Confidence interval based on the randomization test
The function below puts the above commands together.
###################################################################
## randomization test and confidence interval
## effect0 is the hypothesized treatment effect; its default is 0.
## b is the number of simulation runs; its default is 4000
## xmin and xmax are the limits for the histogram horizontal axis
###################################################################
randtest <- function(tr, y, effect0 = 0, b = 4000, xmin = -2, xmax = 2, alpha = .05)
{
    N <- length(y)
    s1 <- (1:N)[tr == 1]               # indices of the control units
    n1 <- length(s1)
    diffobs <- mean(y[-s1]) - mean(y[s1])
    # response each unit would give under the control, assuming the
    # hypothesized effect:
    y0 <- y
    y0[s1] <- y[s1]
    y0[-s1] <- y[-s1] - effect0
    diff <- numeric(b)
    for (k in 1:b)
    {
        s1 <- sample(N, n1)
        diff[k] <- mean(y0[-s1] + effect0) - mean(y0[s1])
    }
    hist(diff, xlim = c(xmin, xmax), col = "grey")
    points(diffobs, 0, cex = 4, pch = 21, bg = "blue")
    abline(v = diffobs, lwd = 2, col = "blue")
    pvalueup <- length(diff[diff >= diffobs])/length(diff)
    pvaluedown <- length(diff[diff <= diffobs])/length(diff)
    pvalue <- min(pvalueup, pvaluedown)   # smaller tail area
    points(effect0, 0, cex = 4, pch = 24, bg = "white")
    if (pvalue <= alpha/2)                # reject at two-sided level alpha
        points(effect0, 0, cex = 4, pch = 24, bg = "red")
    print(c(effect0, pvalue))
}
##########################################################
Copy the above function, between the two lines of pound signs, into your
R session. Type
randtest
and you see the details of the function. Type
randtest(tr,y)
and the function runs with its default values, so the null hypothesis
of no treatment effect is tested.
The following commands call the function repeatedly, for a sequence of
hypothesized effects:
##########################################################
steps <- 201
effects <- seq(from=-1.5,to=.5,length.out=steps)
for (i in 1:steps) randtest(tr, y, effect0=effects[i], b=4000, xmin=-2, xmax=2, alpha=.10)
##########################################################
From the resulting table, collecting all the values of the treatment
effect for which the printed (smaller-tail) p-value is greater than .05
(that is, all the values we do not reject), we get a 90 percent
confidence interval of about
(-0.91, 0.18)
The lower confidence limit is somewhere between -.92 and -.91
kilograms, where the lower tail value goes from below .05 to greater
than .05. The upper limit is between .17 and .18, where the tail
value changes back from greater than .05 to less than .05. We
could get a more accurate limit by checking more increments between .17
and .18.
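For example, rerun the tests on a finer grid between .17 and .18:
effects <- seq(from=.17, to=.18, length.out=11)
for (i in 1:11) randtest(tr, y, effect0=effects[i], b=4000, xmin=-2, xmax=2, alpha=.10)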
The confidence level is .90 because we reject values that are in either
.05 tail. The 90 percent confidence was specified with "alpha =
.10" in calling the randtest function.
To run this with a different data set, choose the number of steps you
want, the "from" and "to" limits in the sequence of effects to test,
and the "xmin" and "xmax" limits for the plot.
The 90 percent confidence interval for the treatment effect
consists of all the hypothesized values of the effect that we don't
reject. For each test, we reject if the one-tailed p-value
in the more extreme direction is less than .05. Since we
would also reject for any observed value in the .05 region of the other
tail, we have probability .10 of rejecting any hypothesized effect
value if it were true. This gives the 1 - .10 = .90 confidence
level.
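Since randtest ends with print(c(effect0, pvalue)), it also returns
that pair, so the whole table can be collected and the limits read off
automatically. A minimal sketch using the randtest function above:
steps <- 201
effects <- seq(from=-1.5, to=.5, length.out=steps)
results <- matrix(0, steps, 2)
for (i in 1:steps) results[i,] <- randtest(tr, y, effect0=effects[i], b=4000, xmin=-2, xmax=2, alpha=.10)
kept <- results[results[,2] > .05, 1] # hypothesized effects we do not reject
range(kept) # approximate 90 percent confidence limits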
Here is an R function for the confidence interval based on the
randomization test:
randci <- function(tr, y, confidence = .95, b = 4000, xmin = -2, xmax = 2, steps = 201)
{
    effects <- seq(from = -1.5, to = .5, length.out = steps)
    for (i in 1:steps)
        randtest(tr, y, effect0 = effects[i], b, xmin, xmax, alpha = 1 - confidence)
}
Copy and paste this function into your R window. To run it
with the defaults type
randci(tr,y)
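To reproduce the 90 percent interval found above, pass the confidence
level explicitly:
randci(tr, y, confidence=.90)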
T-test and confidence interval
Compare the two-sample t-test and its associated confidence interval
for the plant growth experiment. Note that t.test reports mean in
group 1 minus mean in group 2, the negative of our diffobs, so its
confidence interval (-0.17, 0.91) is the reflection of the
randomization interval (-0.91, 0.18):
> t.test(y~tr, conf.level=.90)
Welch Two Sample t-test
data: y by tr
t = 1.1913, df = 16.524, p-value = 0.2504
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
-0.1716743 0.9136743
sample estimates:
mean in group 1 mean in group 2
5.032 4.661