Chapter 15. Statistical significance

1. You are doing some research and have collected some data from a sample of people who were randomly selected from a population in which you are interested. You have spent many nights pondering your data and, in a flash of insight, you have identified what seems to be an important pattern in the numbers.

What would your null hypothesis be? (You can't answer this question in any detail with reference to means, frequencies, or percentages, but you should be able to answer it in a way that incorporates what you know about the role played by null hypotheses.)

The only part of the null hypothesis (H₀) that you can state without more information is the part you wouldn't actually write. It is: "The apparent pattern you see in the data is nothing more than a product of sampling variability." Note that this is not the actual null hypothesis; it is the logic underlying the null hypothesis.

2. What would your alternate hypothesis be?

The only part of the alternate hypothesis (HALT) that you can state without more information is the part you wouldn't actually write. It is: "The pattern you see in the data is not due to sampling variability." Note that this is not the actual alternate hypothesis; it is the logic underlying the alternate hypothesis.

If the null hypothesis is false, the pattern you see in the data is there because your sample comes from a population in which there is a similar relationship between the same variables; the pattern is not a result of sampling variability.

3. Because your insight has given you a burst of energy, you decide to forge ahead and test your hypotheses. Which one do you test? Why that one and not the other one?

You always test the null hypothesis. The basic question you are asking here is whether the pattern you see in the sample is a reflection of a similar pattern in the population or simply a product of sampling variability. It is your job as researcher to determine whether the pattern you see could be due to sampling variability. If you conclude that sampling variability could not be the cause of the pattern you see, you reject H₀ and declare that the pattern is "statistically significant." If, on the other hand, you conclude that the pattern you see in your data could be due to sampling variability, you "fail to reject" the null hypothesis, and you declare that sampling variability is a reasonable cause of the apparent pattern.

4. The test will involve a critical value. What are critical values? What role do they play in the testing of hypotheses? What do you do with them? Where do you get them?

There are many ways to test a null hypothesis. In every case, you compare a measure of the difference in your data to a sampling distribution. The question you ask is always "what is the probability of getting a difference this large when the null hypothesis is true?" To answer this question, you use special tables that were constructed to describe sampling distributions in which the null hypothesis is true. For each of these tables, you can quickly determine how large a difference you would expect to see in only one percent of samples, in five percent of samples, in ten percent of samples, etc. This makes it easy to tell the likelihood of a difference as large as the one you have.
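The idea behind those tables can be illustrated by building a sampling distribution directly. The sketch below (with hypothetical numbers: a population mean of 100, standard deviation of 15, samples of size 30) simulates many samples from a population in which the null hypothesis is true, then asks how large a difference from the population mean turns up in only ten, five, and one percent of samples:

```python
import random
import statistics

random.seed(42)

# Hypothetical setup: a population in which H0 is true
# (mean = 100, sd = 15), sampled with samples of size 30.
POP_MEAN, POP_SD, N = 100, 15, 30

# Build an empirical sampling distribution of the sample mean.
sample_means = [
    statistics.mean(random.gauss(POP_MEAN, POP_SD) for _ in range(N))
    for _ in range(10_000)
]

# How big a difference from the population mean appears in only
# 10%, 5%, and 1% of samples?  (Absolute differences: two-sided.)
abs_diffs = sorted(abs(m - POP_MEAN) for m in sample_means)

def cutoff(pct):
    """Difference exceeded in only `pct` percent of samples."""
    return abs_diffs[int(len(abs_diffs) * (1 - pct / 100))]

print(f"10% cutoff: {cutoff(10):.2f}")
print(f" 5% cutoff: {cutoff(5):.2f}")
print(f" 1% cutoff: {cutoff(1):.2f}")
```

The printed cutoffs are exactly what a table gives you without the simulation: the rarer the difference, the larger the cutoff it must exceed.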

The critical value is like the bar in a high jump: you have to clear the bar in order to succeed. If the difference you see in your data is bigger than the critical value, you reject the null hypothesis; if the difference is smaller than the critical value, you don't reject the null hypothesis.

The critical value usually depends on three things: 1) how many variables you have; 2) what kind of variables you have -- their level of scaling and whether they are continuous or discrete; and 3) how certain you want to be that you don't make a Type I error. The first two factors will determine which kind of test you will do. It may be a chi-squared test, a z-test of a single mean, a z-test for the difference between means, a t-test of a single mean, a t-test for the difference between means, etc. Which test you do will determine which table you will use. It may be the table of areas under the normal curve (for z-tests), a table of critical values of chi squared, a table of critical values of t (for t-tests), a table of critical values of F (for F-tests), etc.

You will probably want to be 95% or 99% confident that you won't make a Type I error.

z-tests and t-tests come in two versions. A two-tailed test only asks whether the variables you are looking at are related to one another. A one-tailed test is also concerned with the direction of the difference in your data.

Chi-squared tests, t-tests, and F-tests also require you to calculate "degrees of freedom." These depend on the size of your samples or the number of values your variables take.
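The degrees-of-freedom formulas for a few of the tests mentioned above are standard results and easy to compute:

```python
# Standard degrees-of-freedom formulas for some common tests.

def df_chi_squared(rows, cols):
    """Chi-squared for a crosstabulation: (rows - 1) * (columns - 1)."""
    return (rows - 1) * (cols - 1)

def df_t_single_mean(n):
    """t-test of a single mean: n - 1."""
    return n - 1

def df_t_two_means(n1, n2):
    """t-test for the difference between two means: n1 + n2 - 2."""
    return n1 + n2 - 2

print(df_chi_squared(3, 4))     # a 3x4 table has 6 degrees of freedom
print(df_t_single_mean(25))     # a sample of 25 gives 24
print(df_t_two_means(20, 22))   # samples of 20 and 22 give 40
```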

Once you have decided which kind of test to do, your level of confidence, your degrees of freedom, and whether it is a one-tailed or a two-tailed test, you go to the table and determine the critical value.
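For a z-test, the table lookup amounts to indexing on confidence level and the number of tails. The miniature table below contains only the four most commonly used z critical values (real tables cover many more levels, and t, chi-squared, and F tables also index on degrees of freedom):

```python
# A miniature critical-value table for z-tests.
Z_CRITICAL = {
    # (confidence level, tails): critical value of z
    (0.95, "one-tailed"): 1.645,
    (0.95, "two-tailed"): 1.960,
    (0.99, "one-tailed"): 2.326,
    (0.99, "two-tailed"): 2.576,
}

def z_critical(confidence, tails="two-tailed"):
    """Look up the critical value, as you would in a printed table."""
    return Z_CRITICAL[(confidence, tails)]

print(z_critical(0.95))               # two-tailed test at 95% confidence
print(z_critical(0.99, "one-tailed"))
```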

Then you do some calculations with your data that allow you to compare it to the critical value.

If the difference in your data is smaller than the critical value, it means that the probability of getting a difference this large when H₀ is true is so large that sampling variability is a reasonable explanation for what you see, so you fail to reject the null hypothesis.

If the difference is larger than the critical value, it means that the probability of getting a difference this large when the null hypothesis is true is so small that sampling variability is not a reasonable explanation for what you see, so you reject the null hypothesis.

5. You have to do something with your data before the critical value comes into play, then you compare the result to the critical value. What do you look for when you do the comparison? How do you decide what the outcome of your test of the hypothesis is?

You have to convert what you see in your data to a number that tells how big the pattern or difference you see is. You do this by calculating a value of chi squared for a crosstabulation table, a z or a t for a mean or pair of means, or an F for analysis of variance.

Then you compare your chi squared, z, t, or F to the critical value in the table to see which is larger.
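Here is the whole procedure for a z-test of a single mean, using hypothetical numbers: H₀ says the population mean is 100 with a known standard deviation of 15, and a sample of 36 people has a mean of 106.

```python
import math

# Hypothetical example: H0 says mu = 100, known sigma = 15;
# our sample of 36 people has a mean of 106.
mu_0, sigma, n, sample_mean = 100, 15, 36, 106

# Convert the difference in the data into a z statistic.
standard_error = sigma / math.sqrt(n)          # 15 / 6 = 2.5
z = (sample_mean - mu_0) / standard_error      # 6 / 2.5 = 2.4

# Compare to the two-tailed critical value at 95% confidence.
critical_value = 1.96
if abs(z) > critical_value:
    decision = "reject H0: the difference is statistically significant"
else:
    decision = "fail to reject H0: sampling variability could explain it"

print(f"z = {z:.2f} -> {decision}")
```

Here z = 2.4 clears the bar at 1.96, so the null hypothesis is rejected; with a sample mean of, say, 103, z would be 1.2 and you would fail to reject it.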

If the value of chi squared, z, t, or F for your data is smaller than the critical value, it means that the probability of getting a difference this large when the null hypothesis is true is so large that sampling variability is a reasonable explanation for what you see, so you fail to reject the null hypothesis.

If the value of chi squared, z, t, or F for your data is larger than the critical value, it means that the probability of getting a difference this large when the null hypothesis is true is so small that sampling variability is not a reasonable explanation for what you see, so you reject the null hypothesis.

6. What is Type I error?

You commit a Type I error if you reject a true null hypothesis -- that is, if you conclude that the difference in your data indicates a real difference in the population, when in fact there is none (i.e., when the difference in your sample data was due to sampling variability). In simpler words, you make a Type I error when you incorrectly say "there is a difference." This is like convicting an innocent person.
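Your confidence level directly controls how often this happens. The simulation below (with hypothetical numbers) runs many z-tests on samples drawn from a population in which H₀ really is true; at 95% confidence, roughly five percent of the tests reject it anyway, and each of those rejections is a Type I error:

```python
import math
import random

random.seed(0)

# Simulate research where the null hypothesis is actually TRUE:
# samples of size 30 from a population with mean 50, sd 10.
MU, SIGMA, N, TRIALS = 50, 10, 30, 5000
CRITICAL = 1.96   # two-tailed z critical value at 95% confidence

type_i_errors = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    sample_mean = sum(sample) / N
    z = (sample_mean - MU) / (SIGMA / math.sqrt(N))
    if abs(z) > CRITICAL:          # rejecting a true H0 = Type I error
        type_i_errors += 1

# At 95% confidence, about 5% of tests reject a true H0.
print(f"Type I error rate: {type_i_errors / TRIALS:.3f}")
```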

7. What is Type II error?

You commit a Type II error if you fail to reject a false null hypothesis -- that is, if you conclude that the difference in your data is due to sampling variability when, in fact, the reason you saw the difference in your data is that your sample came from a population in which the same pattern was present. You make a Type II error when you incorrectly say "there is no difference." This is like acquitting a guilty person by mistakenly finding him not guilty.

8. Which error is less desirable - Type I or Type II? Why?

Because statistics (like the justice system) is conservative, you generally worry more about making a Type I error than about making a Type II error. It is better to mistakenly conclude that your data doesn't show anything when it actually does, than it is to conclude that it does show something when it actually doesn't.

Think about this -- if you are doing medical research and your data seems to indicate that a drug may be helpful in treating a serious disease, it is better to conclude mistakenly that the drug is not effective when it actually is effective than to conclude that the drug is an effective treatment for the disease when it actually isn't. If you conclude it is not effective, no one will buy it and take it for the disease. This means that a few people will have to use other medications, which is a problem. On the other hand, if you conclude it works when it doesn't, many people may waste money on a useless drug, they may get sicker because the medication they are taking doesn't work, and they may suffer unpleasant side effects of a drug that doesn't help improve their condition, which is a bigger problem.

9. Under what conditions do you fail to reject the null hypothesis?

If the difference in your data is smaller than the critical value, it means that the probability of getting a difference this large when H₀ is true is greater than your chosen significance level (five percent, for a 95% confidence level), so you fail to reject the null hypothesis.

10. Does a failure to reject the null hypothesis mean that you should give up your career as a researcher and go back to your part time job as a barista in a trendy espresso bar?

No. It means you should look for another hypothesis or another theory.

11. What are you left with when you do reject the null hypothesis?

You are left with the knowledge that what you saw in your data is probably not due to sampling variability, and that it probably reflects a real pattern that exists in the population your sample represents.

12. What does it mean when you say that the pattern you see in your data is "statistically significant"?

It means that what you saw in your data is probably not due to sampling variability, and that it probably reflects a real pattern that exists in the population your sample represents.

It means that the difference you saw in your data is so big that it probably isn't due to sampling variability.