#### Stat 430 Week 4 Code
### 1. Power calculations
### 2. Normality testing

### What we're used to doing is testing null hypotheses through p-values
### and alpha.
### P-value: the chance that a random sample would produce as much evidence
### against the null as the observed sample did, if the null were true.
### We compare this to alpha, the acceptable chance of falsely rejecting
### the null hypothesis for a test.

### By default, we assume the null and see how likely such a sample is
### under the null. We could also assume an alternative and see how likely
### it would be to reject the null under that alternative.

### If the alternative is close to the null, there would be a small chance
### of (correctly) rejecting the null. (Draw pic, two normal curves)

### If the alternative is far from the null, it's highly likely that the
### null would be rejected, assuming this alternative. (similar pic)

### This chance of correctly rejecting the null is called the POWER, and it
### is calculated as 1 - beta, where beta is the chance of a Type II error
### (failing to reject the null when the null is false).

### Finding or using power isn't as simple as finding the p-value. Even for
### something as simple as power.t.test(), we need four of the following five:
###   n         (the sample size in EACH group; must be equal, or use the smallest one)
###   delta     (the difference between the two means)
###   power
###   sd        (standard deviation)
###   sig.level (alpha, the chance of rejecting the null when the null is true)

### First, look at the help file
?power.t.test

### It gives n=NULL, delta=NULL, sd=1, sig.level=0.05, power=NULL.
### That means that if we don't put anything in for sd, it will
### default to 1. If we don't put anything in for sig.level, it will
### default to 0.05. This could be handy, and save us time.

## Next, let's use the examples:

## This uses as many defaults as possible:
## sd = 1, alpha = 0.05, two-sample, two-tailed
power.t.test(n = 20, delta = 1)
## power = 0.8690.
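### (Sanity check, an illustration rather than part of power.t.test itself:
### we can approximate the power above by simulation. Simulate many
### two-sample experiments with a true difference of 1 sd, and count how
### often t.test() rejects at alpha = 0.05. The seed and the 2000
### repetitions are arbitrary choices for this sketch.)
set.seed(430)
rejections = replicate(2000, {
  x = rnorm(20, mean = 0, sd = 1)
  y = rnorm(20, mean = 1, sd = 1)   # true difference of 1 sd
  t.test(x, y)$p.value < 0.05       # TRUE when the null is rejected
})
mean(rejections)   # rejection rate; should land close to the power above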
## So you would have an 86.9% chance of rejecting the null if the true
## difference was 1 sd and each group has 20 units.

## Manipulating this...
power.t.test(n = 2, delta = 1)
## We get almost no power (about 0.09) when the sample size is very small.

## Also...
power.t.test(n = 20, delta = 0.5)
power.t.test(n = 20, delta = 0.2)
power.t.test(n = 20, delta = 0.01)
power.t.test(n = 20, delta = 0)
## Power can never get smaller than alpha/2 for a two-tailed test
## (power.t.test only counts rejections in the tail matching the sign of delta).

## For one-tailed alternatives, power = alpha at delta = 0.
power.t.test(n = 20, delta = 0, alternative = "one.sided")

## The alternative is defined as a positive difference, so positive delta
## increases the power...
power.t.test(n = 20, delta = 0.5, alternative = "one.sided")
# ...and negative delta decreases it.
power.t.test(n = 20, delta = -0.5, alternative = "one.sided")

### We can also use it the other way. If we have a specific power level
### and significance level that we desire, and an estimate of the standard
### deviation and effect size we expect, we can find the sample size we
### would need to achieve this level of certainty.
power.t.test(power = 0.90, sig.level = 0.05, delta = 1, sd = 1)
### n = 22.02111. A little more than 22 units per group will give you
### exactly 0.90 power.

### !!! Determining sample size is a very large part of the design portion
### of design of experiments. Researchers, especially in biology, medicine,
### and engineering, need to know in advance what sample size they're going
### to need to be able to detect a certain effect.

### It takes fewer sample units to detect a larger effect:
power.t.test(power = 0.90, sig.level = 0.05, delta = 2, sd = 1)

### It also takes fewer to detect an effect when there is less noise.
### Note that n is exactly the same as in the above example:
power.t.test(power = 0.90, sig.level = 0.05, delta = 1, sd = 1/2)

### If the sig.level is decreased, that makes it harder for the null to be
### rejected in general, not just when it's false.
power.t.test(power = 0.90, sig.level = 0.05, delta = 1, sd = 1)
power.t.test(power = 0.90, sig.level = 0.01, delta = 1, sd = 1)

## And the same for when a higher power is needed:
power.t.test(power = 0.95, sig.level = 0.01, delta = 1, sd = 1)

### Using exercise 2.34 data (8th ed hardcover)
### This is paired data.
karl = c(1.186, 1.151, 1.322, 1.339, 1.200, 1.402, 1.365, 1.537, 1.559)
leh  = c(1.061, 0.992, 1.063, 1.062, 1.065, 1.178, 1.037, 1.086, 1.052)

## If the real difference in the means were the same as the difference
## between the means of these samples, what would be the power of the test
## at alpha = 0.05? (Two-sided test)

## Step 1: Get delta, n, and sd
D = mean(karl) - mean(leh)
n.paired = length(karl)
std.dev = sd(karl - leh)

## For unpaired data, use this to get the standard deviation.
## Since there's only one sd, it's assumed that the variance is pooled.
## For the pooled variance formula, see p. 39 under "The Two-Sample t-Test".
## For the generalization to many groups, see p. 72 under
## "Analysis of the Fixed Effects Model".
# pooled.var = ((n1 - 1)*s1^2 + (n2 - 1)*s2^2) / (n1 + n2 - 2)
# Sp = sqrt(pooled.var)

### Step 2: Get the power.
### In the question, this is paired data because it's two types of
### measurement on the same girder.
power.t.test(n = n.paired, delta = D, sd = std.dev, sig.level = 0.05,
             type = "paired")

## It also has options like type = c("two.sample", "one.sample", "paired").
## That means we can change the interpretation to a one-, two-, or
## paired-sample t-test.

###########################################
### Normality tests

### We're going to look at two normality tests: one graphical and one
### formal. Each one has a weakness.

### Formal tests only test whether a significant deviation from the normal
# distribution has been detected. They test the null hypothesis that the
# distribution from which a set of values came is normal.
# So a small dataset with moderate breaks from normality could show up OK,
# and a large dataset with small breaks from normality could show up as
# non-normal.

### Graphical methods of testing normality leave that decision to the
### viewer, which avoids that sample-size problem but is inherently
### subjective.

### Graphical method: qqnorm

### A Q-Q plot, or quantile-quantile plot, arranges the values of a dataset
### in increasing order. On the x-axis is the number of standard deviations
### above or below the mean that a value at this quantile would be if the
### distribution were normal.

## Breaks from a straight line mean breaks from normality,
## but small breaks at the ends are to be expected.
from.norm = rnorm(100, mean = 10, sd = 3)
par(mfrow = c(1, 2))
hist(from.norm)
qqnorm(from.norm)
qqline(from.norm)

## In this example, the 100 values are at the 0.5%, 1.5%, 2.5%, ..., 99.5%
## quantiles. So the 3rd value, at the 2.5% quantile, should be 1.96
## standard deviations below the mean, assuming normality.
## Its x-position is exactly that.
## The y-position is its actual value, which should be about 4
## (from a normal with mean 10: roughly 2 standard deviations of 3 below 10).

## If the distribution is positively skewed,
## the Q-Q plot bends upward away from the line.
from.exp = rexp(100)
hist(from.exp)
qqnorm(from.exp)
qqline(from.exp)

### If the distribution is negatively skewed,
## the Q-Q plot bends downward.
from.beta = rbeta(100, shape1 = 8, shape2 = 2)
hist(from.beta)
qqnorm(from.beta)
qqline(from.beta)

### If there is more or less kurtosis (peakedness) than the normal
### distribution, an S-shape appears.

### The Q-Q plot is ALWAYS increasing because it uses the sorted values.

### qqnorm is a normality check, not a symmetry check; it can find breaks
### from normality in symmetric but non-normal distributions.

###### Shapiro-Wilk test
## There's not much to this test.
## It only takes in one thing: a list of numbers.
shapiro.test(from.norm)

## And gives out two things:
# Shapiro and Wilk's W statistic: perfectly normal is 1, and the farther
# from normal something gets, the lower this is, down to a value of 0.
# The p-value of the test.

shapiro.test(from.exp)
shapiro.test(from.beta)

### It's easier to detect small differences in large sample sizes:
shapiro.test(rt(n = 10, df = 20))    # 10 values from a t-distribution with 20 df
shapiro.test(rt(n = 30, df = 20))    # 30 values
shapiro.test(rt(n = 100, df = 20))   # 100 values
shapiro.test(rt(n = 1000, df = 20))  # 1000 values
shapiro.test(rt(n = 5000, df = 20))  # 5000 values

### The W statistic remains quite high, showing that t with 20 df is quite
### close to a normal, but eventually the p-value gets very small anyway.
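
### (Addendum, an illustration of the Q-Q plot construction described
### earlier: for a sample of 100 values, the theoretical x-coordinates
### that qqnorm() uses are the qnorm() of the 0.5%, 1.5%, ..., 99.5%
### quantiles, which is exactly what ppoints(100) returns.)
qq = qqnorm(from.norm, plot.it = FALSE)   # x = theoretical quantiles, y = sample values
all.equal(sort(qq$x), qnorm(ppoints(100)))   # TRUE
qnorm(0.025)   # -1.96: the x-position of the 3rd point, at the 2.5% quantile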