Stat 403/650

Example:  Two treatment levels, randomization test, t test, and confidence intervals associated with each


The plant growth data is a data set in R called "PlantGrowth".  We will use only the control and treatment 1, which we consider exposure to a potentially harmful substance.  The response is the dry weight of the plants.  The treatment assignment is assumed to be completely randomized, with 10 units getting the control and 10 units getting the exposure treatment.

In R,  view the data set with

PlantGrowth

Use the first 20 units (control and treatment 1) for the response and treatment variables:

y <- PlantGrowth$weight[1:20]
tr <- c(rep(1,10), rep(2,10))

Alternatively,

attach(PlantGrowth)
y <- weight[1:20]
tr <- group[1:20]
tr <- as.numeric(tr)  # factor levels ctrl, trt1 become the numbers 1, 2

Print them out next to each other:

cbind(y,tr)

Plot the data, and look at the difference but also overlap in the response between the two treatments.

plot(tr,y)
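
A side-by-side boxplot gives another view of the same comparison; for example:

boxplot(y ~ tr, names=c("control","exposure"), ylab="dry weight")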

Now that the experiment has been completed, the data are arranged for convenience with the control units first and the exposure units second.  This corresponds to arbitrarily using new unit labels 1 through 10 for the control units and 11 through 20 for the exposure units.  Presumably, these were not the original unit numbers used to identify field plots when the treatments were randomly assigned.

Think of the control units as a sample consisting of the first ten units in the data set:

s <- 1:10

The response values for the control units are

y[s]

The responses for the other treatment can be listed with

y[-s]

where R reads the "minus" sign as selecting the components of y whose indices are in the complement of s.

The treatment mean for the control group is

mean(y[s])

and for the exposure group the treatment mean is

mean(y[-s])

The estimated treatment effect is the difference

mean(y[-s]) - mean(y[s])

which for these data comes out to about -0.371 (the group means are 5.032 and 4.661, as the t-test output at the end of this handout confirms).


Randomization test of no treatment effect

The difference between the sample means for the two groups serves as our test statistic.  We'll call our observed value of this statistic "diffobs":

diffobs <- mean(y[-s]) - mean(y[s])

The randomization test uses many reassignments of the units to treatment and control.  Under the null hypothesis of no treatment effect, the response of each unit should come out exactly the same no matter which treatment is assigned to it.  We look at the distribution of our test statistic under the null hypothesis.

Select one new treatment assignment:

s <- sample(1:20, 10)

s

These units are considered "control" units in this assignment, and the other units are "exposure" units.   With this reassignment, the test statistic is

diff <- mean(y[-s]) - mean(y[s])

Now repeat this, say, 10 times and look at the ten different values of the test statistic with

diff <- numeric()
for (k in 1:10) {s <- sample(1:20,10); diff[k] <- mean(y[-s]) - mean(y[s])}

Now do this a serious number of times, say 4000:

for (k in 1:4000) {s <- sample(1:20,10); diff[k] <- mean(y[-s]) - mean(y[s])}
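
Results vary slightly from run to run because the reassignments are random.  To be able to reproduce the same 4000 reassignments later, set the random number seed before the loop (any fixed integer will do):

set.seed(403)  # a fixed seed makes the subsequent sample() calls reproducible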

Look at the distribution of the test statistic diff:

hist(diff)

Add our observed value to the plot:

points(diffobs, 0)

For a two-sided test of the null hypothesis of no treatment effect, the alternative hypothesis is that the treatment has some effect, either positive or negative.  We would reject the null hypothesis if the observed value of the test statistic were either very much larger or very much smaller than would be expected if the null hypothesis were true.  For a p-value we look at the probability, under the null hypothesis, of a value as extreme as or more extreme than the one we observed.  For a two-sided test this probability is doubled, to reflect the procedure that we would reject at level alpha (.05, say) if we got a value of the test statistic in either the lower alpha/2 (or .025 in the case alpha = .05) or upper alpha/2 (.025) tail of the distribution under the null hypothesis.

In R the command "length(diff)" gives the length of the vector diff, which is 4000.  "diff[diff<=diffobs]" selects the components of diff which are less than or equal to our observed difference.  The number of these is given by "length(diff[diff<=diffobs])". 

A one-sided p-value is given by the proportion of reassignments that have a value less than or equal to our observed value:

length(diff[diff<=diffobs])/length(diff)

For a two-sided p-value we double this value. 
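
A one-line version doubles the smaller of the two tail proportions, so the same command works whichever tail the observed value falls in:

2*min(length(diff[diff<=diffobs]), length(diff[diff>=diffobs]))/length(diff)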

A small p-value is considered evidence against the null hypothesis.  Here, the p-value of about .25 is not small.  It says that, even if there is no effect of the treatments, with our sample sizes about one out of four experiments would find an observed difference as extreme as ours, in one direction or the other.

 

Test a different hypothesis


# make a function to calculate the two-sided p-value: twice the smaller of the
# two tail proportions, so it also works when the null distribution is not
# centered at zero (as for the shifted hypothesis tested below)
pvalue <- function(){2*min(length(diff[diff<=diffobs]),length(diff[diff>=diffobs]))/length(diff)}
# call that function
pvalue()

Instead of testing the null hypothesis that the treatment effect is zero, consider testing that the exposure causes an average increase of 10 kilograms in the weight of the plants.

s <- 1:10
effect0 <- 10 # try a new null hypothesis, that the treatment effect is exactly 10 kg
y0 <- c(y[s],y[-s]-effect0) 

This is the response each unit would have if all units received the control treatment, under the null hypothesis of treatment effect = 10.  That is, units that actually did get the exposure would have produced 10 kilograms less without it.


y0  # prints the value each unit would give if it gets the control treatment

y0 + effect0  # prints the response of each unit if it gets the exposure, under the new null hypothesis.

# put this in a loop and compute the sampling distribution of the test statistic
# under the new null hypothesis that the treatment effect is 10:
for(k in 1:4000){s<-sample(20,10);diff[k]<-mean(y0[-s]+effect0)-mean(y0[s])}
hist(diff)
points(diffobs, 0, cex=4, pch=21, bg="blue")
abline(v=diffobs, col="blue")
points(effect0, 0, cex=4, pch=24)

pvalue()


Confidence interval based on the randomization test


The function below puts the above commands together.

###################################################################
## randomization test and confidence interval
## effect0 is the hypothesized treatment effect; its default is 0.
## b is the number of simulation runs; its default is 4000.
## xmin and xmax are the limits for the histogram horizontal axis
###################################################################

randtest <- function(tr, y, effect0 = 0, b=4000, xmin=-2, xmax=2, alpha=.05)
  {
    N <- length(y)
    s1 <- (1:N)[tr == 1]
    n1 <- length(s1)
    diffobs <- mean(y[-s1])-mean(y[s1])
    y0 <- y
    y0[-s1] <- y[-s1] - effect0   # response each unit would give under the control
   
    diff <- numeric(b)
    for(k in 1:b)
      {
        s1 <- sample(N,n1)
        diff[k] <- mean(y0[-s1]+effect0)-mean(y0[s1])
      }

    hist(diff, xlim=c(xmin,xmax), col="grey")
    points(diffobs,0,cex=4,pch=21,bg="blue")
    abline(v=diffobs, lwd =2, col="blue")
    pvalueup <- length(diff[diff >= diffobs])/length(diff)
    pvaluedown <- length(diff[diff <= diffobs])/length(diff)
    pvalue <- min(pvalueup, pvaluedown)
    points(effect0,0,cex=4,pch=24,bg="white")
    if(pvalue <= alpha/2) points(effect0,0,cex=4,pch=24,bg="red")
    print(c(effect0,pvalue))
  }


##########################################################

Copy the above function, between the two lines of pound signs, into your R session.  Type "randtest" and you see the details of the function.  Type "randtest(tr,y)" and the function runs with its default values, so the null hypothesis of no treatment effect is tested.
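
For example, to redo the earlier test of a 10 kilogram effect with this function, widen the histogram axis so that both the observed difference and the hypothesized effect fall inside the plot:

randtest(tr, y, effect0=10, xmin=-2, xmax=12)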

The following commands call the function repeatedly, for a sequence of hypothesized effects:

##########################################################

steps <- 201
effects <- seq(from=-1.5,to=.5,length.out=steps)
for (i in 1:steps) randtest(tr,y,effect0=effects[i],b=4000, xmin=-2, xmax=2,alpha=.10)

##########################################################

From the resulting table, collecting all the values of treatment effect for which the p-value is greater than .05 (that is, all the values we do not reject at the .05 level), we get a 90 percent confidence interval of about

(-0.91, 0.18)

The lower confidence limit is somewhere between -.92 and -.91 kilograms, where the p-value crosses from below .05 to above .05.  The upper limit is between .17 and .18, where the p-value crosses back from above .05 to below .05.  We could get a more accurate limit by checking more increments between .17 and .18.
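
Rather than scanning the printed table by eye, you can capture the p-values and let R report the retained range.  This sketch relies on the fact that the last expression in randtest is print(c(effect0, pvalue)), so the function returns that vector:

pv <- numeric(steps)
for (i in 1:steps) pv[i] <- randtest(tr,y,effect0=effects[i],b=4000, xmin=-2, xmax=2, alpha=.10)[2]
range(effects[pv > .05])  # approximate 90 percent confidence limits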

The confidence level is .90 because we reject values that are in either .05 tail.  The 90 percent confidence was specified with "alpha = .10" in calling the randtest function. 

To run this with a different data set, choose the number of steps you want, the "from" and "to" limits in the sequence of effects to test, and the "xmin" and "xmax" limits for the plot.

 


The 90 percent confidence interval for the treatment effect consists of all the hypothesized values of effect that we do not reject.  For each test, we reject if the one-tailed p-value in the most extreme direction is less than .05.  Since we would also reject for any observed value in the .05 region of the other tail, we have .10 probability of rejecting any hypothesized effect value if it were true.  This gives the 1 - .10 = .90 confidence level.

Here is an R function for the confidence interval based on the randomization test:

randci <- function(tr, y, confidence=.95, b=4000, xmin=-2, xmax=2, steps=201, from=-1.5, to=.5)
  {
     effects <- seq(from=from, to=to, length.out=steps)
     for (i in 1:steps) randtest(tr,y,effect0=effects[i],b, xmin, xmax, alpha=1-confidence)
  }

Copy and paste this function into your R window.  The "from" and "to" arguments set the range of hypothesized effects to search.  To run it with the defaults type

randci(tr,y)
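
The default confidence level is .95.  To reproduce the 90 percent interval found above, type

randci(tr, y, confidence=.90)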




T-test and confidence interval

Compare the two-sample t-test and associated confidence interval for the plant growth experiment.  Note that with the formula y~tr, t.test reports the mean of group 1 minus the mean of group 2, the reverse of our effect estimate mean(y[-s]) - mean(y[s]):



> t.test(y~tr, conf.level=.90)

        Welch Two Sample t-test

data:  y by tr
t = 1.1913, df = 16.524, p-value = 0.2504
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
 -0.1716743  0.9136743
sample estimates:
mean in group 1 mean in group 2
          5.032           4.661
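
To put the t interval on the same scale as the randomization interval (exposure minus control), reverse the order of the groups:

t.test(y[tr==2], y[tr==1], conf.level=.90)

The interval endpoints simply change sign, about (-0.91, 0.17), in close agreement with the randomization interval above.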