Stat650/403


Completely Randomized Design

In this section we consider an experiment with two treatment levels.  The N units in the study are randomly assigned to the treatments so that n1 of the units are assigned treatment 1 and the other n2 units are assigned treatment 2. 

Some basic ideas:

  • The assignment for one of the groups, such as treatment 1, is a simple random sample of n1 units from the N study units.  The remaining n2 units are assigned treatment 2.  (A short R sketch of this assignment step follows this list.)
  • Thus, every possible collection of n1 units has equal probability of being the group assigned to treatment 1.
  • The randomization in the assignment overcomes biases and effects of unobserved variables. 
  • Replication of the same treatment to more than one unit results in increased precision for estimating treatment effects and enables the calculation of hypothesis tests and confidence intervals.
  • It is important that the experiment be set up so that the response of a unit to a treatment depends only on the treatment assigned to that unit, not on the combination of treatments assigned to other units.  This is an assumption that underlies almost every experiment.
  • The null hypothesis of no treatment effect determines a corresponding sampling distribution of a test statistic.
  • A hypothesis testing procedure should have the property that, if the null hypothesis is true, we should have a low probability of rejecting it.  This probability is called the significance level.
  • The probability under the null hypothesis of obtaining a value of the test statistic as high or higher than the one we observe is called the p-value.
  • A confidence interval for treatment effect can be constructed from a hypothesis test procedure by  testing every possible size of treatment effect.  Those values that are not rejected are put in the confidence interval, while those that are rejected are excluded. 
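
As a small illustration of the random assignment step (a sketch only; the vector names here are just for illustration, using the same group sizes as the tomato example below), the following R commands draw a simple random sample of n1 unit labels for treatment 1 and give treatment 2 to the rest:

N <- 8                        # total number of study units
n1 <- 4                       # number of units to receive treatment 1
s1 <- sample(N, n1)           # simple random sample of unit labels for treatment 1
treatment <- rep(2, N)        # start with every unit labelled treatment 2
treatment[s1] <- 1            # the sampled units get treatment 1
cbind(unit = 1:N, treatment)  # display the random assignment

Every subset of n1 = 4 units is equally likely to become the treatment 1 group, which is exactly the property described in the bullets above.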

Learning goals:

  1. Be able to assign treatments in a completely randomized design with two levels of one factor. 
  2. Understand the concept of the reference sampling distribution that would be true if the null hypothesis is true. 
  3. Be able to compute (with software, not by hand) the p-value for the randomization test of no treatment effect.
  4. Be able to compute (with software, not by hand) the p-value for the randomization test of any specified treatment effect.
  5. Understand how a statistical test can be used to construct a confidence interval with a specified level of confidence.

Randomization Test


There is a good description and discussion of randomization or permutation tests in Chapter 18 of Moore et al. (2003).  The permutation/randomization test material starts with Section 18.6, about 50 pages into the chapter.

Reference:

D. S. Moore, G. P. McCabe, W. M. Duckworth, and S. L. Sclove (2003).  The Practice of Business Statistics: Using Data for Decisions.  W. H. Freeman.

Example: Tomato experiment; testing different hypotheses about the treatment effect with the randomization test


tomato <- read.table(file=url("http://www.stat.sfu.ca/~thompson/stat403-650/data/tomato.txt"),header=T)
# the above is all one line; it reads the tomato data straight from the web site
# and puts it in a data frame called "tomato"
tomato # this prints out the data
tr <- tomato$tr # give a simpler name to the treatment variable within the data frame
y <- tomato$y # alternatively, one could simply type "attach(tomato)"
N <- 8 # the number of units in the experiment
n1 <- 4 # the number of units in the control group
s1 <- 1:n1 # the units in the control group, corresponding to row number in the data
diffobs <- mean(y[-s1])-mean(y[s1]) # the observed difference in the treatment sample means
diff <- numeric(0) # tell R that diff is a vector, or column of numbers
# now make a loop that does 4000 random assignments of the units to control and fertilizer
# and compute the difference in sample means for the two groups, as if treatment had no effect
for(k in 1:4000){s1 <- sample(N,n1);diff[k] <- mean(y[-s1])-mean(y[s1])}
hist(diff) # make a histogram of the sampling distribution of the test statistic "diff"
                # when there is no treatment effect
points(diffobs, 0, cex=4, pch=21, bg="blue")  # draw the observed value of the test statistic
abline(v=diffobs, col="blue")  # draw a line at that value
# make a function to calculate the p-value:
pvalue <- function(){length(diff[diff>=diffobs])/length(diff)} # upper-tail proportion, using the vectors diff and diffobs created above
# call that function
pvalue()

effect0 <- 10 # try a new null hypothesis, that the treatment effect is exactly 10 kg
s1 <- 1:n1 # reset s1 to the actual control-group rows (the loop above overwrote it)
y0 <- c(y[s1],y[-s1]-effect0)  # the response each unit would have if all units received the
                               # control treatment, under the null hypothesis of treatment effect = 10
y0 # prints the value each unit would give if it gets the control treatment
y0 + effect0 # prints the response of each unit if it gets the fertilizer, under H0.

# put this in a loop and compute the sampling distribution of the test statistic
# under the new null hypothesis that the treatment effect is 10:
for(k in 1:4000){s1 <- sample(N,n1);diff[k] <- mean(y0[-s1]+effect0)-mean(y0[s1])}
hist(diff)
points(diffobs, 0, cex=4, pch=21, bg="blue")
abline(v=diffobs, col="blue")
points(effect0, 0, cex=4, pch=24)

pvalue()

The p-value was about 0.045, and the histogram of the reference distribution shows its non-normal irregularities:

[Figure: histogram of the randomization distribution]

Confidence Interval from Randomization Test

Example: Tomato experiment; confidence intervals based on the randomization test

The function below puts the above commands together.

###################################################################
## randomization test and confidence interval
## effect0 is the hypothesized treatment effect; its default is 0.
## b is the number of simulation runs; its default is 4000
## xmin and xmax are the limits for the histogram horizontal axis
###################################################################

randtest <- function(tr, y, effect0 = 0, b=4000, xmin=-30, xmax=60)
  {
    N <- length(y)
    s1 <- (1:N)[tr == 1]
    n1 <- length(s1)
    diffobs <- mean(y[-s1])-mean(y[s1])
    y0 <- y
    y0[s1] <- y[s1]              # control units keep their observed responses
    y0[-s1] <- y[-s1] - effect0  # treated units' responses as if under control, assuming H0
   
    diff <- numeric(b)
    for(k in 1:b)
      {
        s1 <- sample(N,n1)                             # a fresh random assignment
        diff[k] <- mean(y0[-s1]+effect0)-mean(y0[s1])  # test statistic for this assignment, under H0
      }

    hist(diff, xlim=c(xmin,xmax), col="grey")
    points(diffobs,0,cex=4,pch=21,bg="blue")
    abline(v=diffobs, lwd =2, col="blue")
    pvalueup <- length(diff[diff >= diffobs])/length(diff)    # upper-tail p-value
    pvaluedown <- length(diff[diff <= diffobs])/length(diff)  # lower-tail p-value
    pvalue <- min(pvalueup, pvaluedown)                       # one-tailed p-value in the more extreme direction
    points(effect0,0,cex=4,pch=24,bg="white")
    if(pvalue <= .05) points(effect0,0,cex=4,pch=24,bg="red")
    print(c(effect0, pvalue))
  }

#########################################################

The following commands call the function repeatedly, for a sequence of hypothesized effects:

effects <- -10:40  # makes the variable "effects" equal to the sequence -10, -9, ..., 39, 40.
effects # prints them out
steps <- length(effects)  # there should be 51 numbers in the sequence
for (i in 1:steps) randtest(tr,y,effect0=effects[i]) # calls the function randtest for each effects value

# from the resulting printout we can see that the lower limit of the confidence interval
# is between 0 and 1
# we can refine the calculation by using a finer sequence between 0 and 1

effects <- seq(0,1,.05)
steps <- length(effects)
for (i in 1:steps) randtest(tr,y,effect0=effects[i])

# and for the upper limit, a finer sequence (in steps of .05) between 29 and 30

effects <- seq(29,30,.05)
steps <- length(effects)
for (i in 1:steps) randtest(tr,y,effect0=effects[i])

The 90 percent confidence interval for the treatment effect consists of all the hypothesized values of the effect that we do not reject.  For each test, we reject if the one-tailed p-value in the more extreme direction is .05 or less.  Since we would also reject for any observed value in the .05 region of the other tail, we have probability .10 of rejecting any hypothesized effect value if it were true.  This gives the 1 - .10 = .90 confidence level.
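
To automate the search for the interval, one could collect the p-values over a grid of hypothesized effects and keep the values that are not rejected.  The sketch below is only an illustration: the helper function randpval is not part of the course code, and it assumes the tomato variables tr and y from above are still in the workspace.  It recomputes the randomization p-value without plotting, then reports the range of effect values with p-value greater than .05.

randpval <- function(tr, y, effect0 = 0, b = 4000)
  {
    N <- length(y)
    s1 <- (1:N)[tr == 1]
    n1 <- length(s1)
    diffobs <- mean(y[-s1]) - mean(y[s1])
    y0 <- y
    y0[-s1] <- y[-s1] - effect0                    # responses each unit would give under control, assuming H0
    diff <- numeric(b)
    for(k in 1:b)
      {
        s1 <- sample(N, n1)
        diff[k] <- mean(y0[-s1] + effect0) - mean(y0[s1])
      }
    min(length(diff[diff >= diffobs]), length(diff[diff <= diffobs]))/b  # one-tailed p-value, more extreme direction
  }

effects <- seq(0, 30, 0.5)                                     # grid of hypothesized effects
pvals <- sapply(effects, function(e) randpval(tr, y, effect0 = e))
range(effects[pvals > .05])                                    # approximate 90 percent confidence limits

Because the reference distribution is simulated, the limits will wobble a little from run to run; increasing b makes them more stable.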


Randomization test and confidence interval with plant growth experiment data