Chapter 13. Strength of relationships: Continuous data

1. Compare Pearson's r and covariance in terms of how they are calculated and discuss the difference between the two.

Pearson's r involves a modification of covariance. Instead of using the sum of the cross-products of the deviation scores, you use the sum of the cross-products of the z-scores. While using deviation scores centers your data and removes the effect of the mean (by subtracting it from each value), converting the deviation scores to z-scores (by dividing each one by the variable's standard deviation) also removes the effects of the original units of measurement and the degree of dispersion of the variable.
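As a rough illustration of the calculation just described, the following Python sketch computes a covariance from deviation scores and a Pearson's r from z-scores. The data values and function names are made up for illustration, and the n - 1 denominator is an assumption (dividing by n throughout would give the same r).

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """Averaged cross-products of deviation scores (n - 1 denominator assumed)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def z_scores(values):
    """Deviation scores divided by the standard deviation."""
    m = mean(values)
    sd = sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return [(v - m) / sd for v in values]

def pearson_r(x, y):
    """Averaged cross-products of z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

# Hypothetical data; x_rescaled is the same variable in different units.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_rescaled = [v * 100 for v in x]

print("cov(x, y)          =", covariance(x, y))
print("cov(x_rescaled, y) =", covariance(x_rescaled, y))
print("r(x, y)            =", round(pearson_r(x, y), 4))
print("r(x_rescaled, y)   =", round(pearson_r(x_rescaled, y), 4))
```

Changing the units of x changes the covariance by the same factor, while r stays put, which is the point of moving from deviation scores to z-scores.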

2. What are the consequences of the difference between how Pearson's r and covariance are calculated?

Changing from deviation scores to z-scores produces an important result. The value you get now is the Pearson product-moment correlation coefficient, more commonly known as the "correlation." The correlation can have values between -1.0 and 1.0. If the correlation is 1.0, the two variables are perfectly correlated with one another. They are in effect interchangeable, for when you know the value of one, you also know the value of the other. If the correlation is -1.0, they are perfectly correlated, but the relationship is negative: when one is high, the other is low. If the correlation is 0.0, there is no linear relationship between the two variables; knowing the value of one does not help you predict the value of the other with a straight line.

3. What useful properties are characteristic of Pearson's r but not of covariance?

Correlation is better than covariance for these reasons:

1 -- Because correlation removes the effect of the variance of the variables, it provides a standardized, absolute measure of the strength of the relationship, bounded by -1.0 and 1.0. This is good because it makes it possible to compare any correlation to any other correlation and see which is stronger. You cannot do this with covariance.

2 -- The squared correlation (r²) is a measure of how much of the variance in one variable is explained by the other variable. This measure, the coefficient of determination, ranges from 0.0 to 1.0. You cannot do this with covariance.

4. Consider the following table, which shows the scores twelve students received on four exams:

     Exam 1   Exam 2   Exam 3   Exam 4
a     42.66    81.68    83.68    45.51
b     85.62    49.51    45.51    92.99
c     75.30    75.39    68.39    55.88
d     43.46    77.61    75.61    75.61
e     85.74    31.82    35.82    83.68
f     41.72    84.59    87.59    63.45
g     85.53    61.88    55.88    72.35
h     85.39    63.45    63.45    87.59
i     43.25    79.94    79.94    66.79
j     85.48    70.79    66.79    68.39
k     84.02    74.35    72.35    79.94
l     41.54    96.99    92.99    35.82

Make a scatterplot for the exam scores with the X-axis for Exam 1 and the Y-axis for Exam 2. Do another one for Exam 3 and Exam 4.

5. Calculate the variance and standard deviation for the scores of the first two exams.

             Exam 1    Exam 2
variance    461.447   296.738
std. dev.    21.481    17.226
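A minimal sketch of this calculation in Python, using the standard library's statistics module. It assumes the sample (n - 1) formulas, which appear to be what the values above are based on; the variable names are illustrative.

```python
import statistics

# Scores for students a through l, copied from the table in Question 4.
exam1 = [42.66, 85.62, 75.30, 43.46, 85.74, 41.72,
         85.53, 85.39, 43.25, 85.48, 84.02, 41.54]
exam2 = [81.68, 49.51, 75.39, 77.61, 31.82, 84.59,
         61.88, 63.45, 79.94, 70.79, 74.35, 96.99]

# statistics.variance and statistics.stdev divide by n - 1.
var1 = statistics.variance(exam1)
var2 = statistics.variance(exam2)
sd1 = statistics.stdev(exam1)
sd2 = statistics.stdev(exam2)

print(f"variance   {var1:10.3f} {var2:10.3f}")
print(f"std. dev.  {sd1:10.3f} {sd2:10.3f}")
```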

6. Calculate the covariance between the scores on the first two exams.

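cov = -248.9340 (sum of cross-products divided by n = 12; dividing by n - 1 = 11, as was done for the variances in Question 5, gives about -271.56)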

7. Calculate Pearson's r between the scores on the first two exams.

r = -0.7339
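The covariance and Pearson's r for the first two exams can be checked with a short Python sketch like the one below. The helper name is illustrative. With these data, dividing the sum of cross-products by n (rather than n - 1) appears to be what reproduces the covariance reported in Question 6, so both versions are computed; r does not depend on that choice.

```python
from math import sqrt

# Scores for students a through l, copied from the table in Question 4.
exam1 = [42.66, 85.62, 75.30, 43.46, 85.74, 41.72,
         85.53, 85.39, 43.25, 85.48, 84.02, 41.54]
exam2 = [81.68, 49.51, 75.39, 77.61, 31.82, 84.59,
         61.88, 63.45, 79.94, 70.79, 74.35, 96.99]

def sum_cross_products(x, y):
    """Sum of the cross-products of the deviation scores of x and y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

n = len(exam1)
sp = sum_cross_products(exam1, exam2)

cov_n = sp / n                  # population-style covariance
cov_n_minus_1 = sp / (n - 1)    # sample-style covariance

# The n or n - 1 terms cancel in r, so it can be computed from sums alone.
r = sp / sqrt(sum_cross_products(exam1, exam1) *
              sum_cross_products(exam2, exam2))

print(f"cov (divide by n)     = {cov_n:.4f}")
print(f"cov (divide by n - 1) = {cov_n_minus_1:.4f}")
print(f"r                     = {r:.4f}")
```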

8. Calculate Spearman's rho between the scores on the first two exams.

Exam 1   Rank 1   Exam 2   Rank 2     d    d×d
  42.7     3.00     81.7    10.00    -7     49
  85.6    11.00     49.5     2.00     9     81
  75.3     6.00     75.4     7.00    -1      1
  43.5     5.00     77.6     8.00    -3      9
  85.7    12.00     31.8     1.00    11    121
  41.7     2.00     84.6    11.00    -9     81
  85.5    10.00     61.9     3.00     7     49
  85.4     8.00     63.5     4.00     4     16
  43.3     4.00     79.9     9.00    -5     25
  85.5     9.00     70.8     5.00     4     16
  84.0     7.00     74.3     6.00     1      1
  41.5     1.00     97.0    12.00   -11    121
                                    Sum    570

rho = 1 - (6 × 570) / (12 × (12² - 1)) = 1 - 3420/1716 = -0.993

This is a VERY strong relationship!
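The ranking and the sum of squared rank differences can be reproduced with a small Python sketch. The `ranks` helper is illustrative and assumes there are no tied scores, which holds for these two exams.

```python
# Scores for students a through l, copied from the table in Question 4.
exam1 = [42.66, 85.62, 75.30, 43.46, 85.74, 41.72,
         85.53, 85.39, 43.25, 85.48, 84.02, 41.54]
exam2 = [81.68, 49.51, 75.39, 77.61, 31.82, 84.59,
         61.88, 63.45, 79.94, 70.79, 74.35, 96.99]

def ranks(values):
    """Rank each value from 1 (smallest) upward; assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

rank1 = ranks(exam1)
rank2 = ranks(exam2)

# d is the difference between a student's two ranks.
d = [a - b for a, b in zip(rank1, rank2)]
sum_d_squared = sum(diff ** 2 for diff in d)

n = len(exam1)
rho = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

print("sum of d×d =", sum_d_squared)
print(f"rho = {rho:.3f}")
```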

9. Discuss the difference between the covariance and the Pearson's r.

Pearson's r involves a modification of covariance. Instead of using the sum of the cross-products of the deviation scores, you use the sum of the cross-products of the z-scores. While using deviation scores centers your data and removes the effect of the mean (by subtracting it from each value), converting the deviation scores to z-scores (by dividing each one by the variable's standard deviation) also removes the effects of the original units of measurement and the degree of dispersion of the variable.

Changing from deviation scores to z-scores produces an important result. The value you get now is the Pearson product-moment correlation coefficient, more commonly known as the "correlation." The correlation can have values between -1.0 and 1.0. If the correlation is 1.0, the two variables are perfectly correlated with one another. They are in effect interchangeable, for when you know the value of one, you also know the value of the other. If the correlation is -1.0, they are perfectly correlated, but the relationship is negative: when one is high, the other is low. If the correlation is 0.0, there is no linear relationship between the two variables; knowing the value of one does not help you predict the value of the other with a straight line.

10. Discuss the difference between the Pearson's r and the Spearman's rho (from Ch. 12).

Pearson's r requires interval or ratio data while Spearman's rho only requires ordinal data. Spearman's rho is based only on the ranks of the values, rather than the values themselves. Spearman's rho is essentially the Pearson's r of the ranks of the values of the data. A limiting factor for rho is that it is affected by ties. Ties are especially likely when your variables have a small number of possible values and you have a large number of cases. (This is why Spearman's rho is better for situations in which your variables have a wide range of possible values.) When there are few ties, rho will be very close to the value you would get with Pearson's r. If there are many ties, the divergence between Spearman's rho and Pearson's r will be larger.
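The statement that rho is essentially Pearson's r of the ranks, and the sensitivity to ties, can be illustrated with a small Python sketch. The data sets and function names are hypothetical, and giving tied values the average of their rank positions is one common convention, assumed here. The sketch compares the d×d shortcut formula with Pearson's r computed directly on the ranks.

```python
from math import sqrt

def average_ranks(values):
    """Ranks from 1 upward; tied values share the mean of their positions."""
    order = sorted(values)
    result = []
    for v in values:
        first = order.index(v) + 1   # first position occupied by v
        count = order.count(v)       # number of values tied with v
        result.append(first + (count - 1) / 2)
    return result

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def rho_shortcut(x, y):
    """rho from the sum of squared rank differences."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    return 1 - 6 * sum((a - b) ** 2 for a, b in zip(rx, ry)) / (n * (n ** 2 - 1))

def rho_from_ranks(x, y):
    """rho as Pearson's r of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: the first pair has no ties, the second has one tie.
untied_x, untied_y = [10, 20, 30, 40], [1, 3, 2, 4]
tied_x, tied_y = [1, 2, 2, 3], [1, 2, 3, 4]

print(rho_shortcut(untied_x, untied_y), rho_from_ranks(untied_x, untied_y))
print(rho_shortcut(tied_x, tied_y), rho_from_ranks(tied_x, tied_y))
```

Without ties the two calculations agree; with even a single tie the shortcut formula drifts away from Pearson's r of the ranks.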

11. Why is one higher than the other? (the scatterplot may help with this)

The relationship between scores on the first two exams is curvilinear. Since Pearson's r is a linear correlation, it assumes the relationship would be a straight line on the scatterplot, and most of the data points are not very close to that line. Since Spearman's rho is based only on the ranks of the values, rather than the values themselves, the curvilinearity is largely canceled out for the relationship in this data. The reason this happens is that ranks are ordinal scaled and the only information they carry is that one value is larger or smaller than another; information about the size of the differences in values is lost when the ranks of the values are used instead of the values themselves.

12. What does this show about situations in which Spearman's rho may be more useful than Pearson's r?

The answer to question 11 indicates that Spearman's rho may be more useful than Pearson's r when the relationship is curvilinear, since the curvilinearity causes Pearson's r to underestimate the strength of the relationship.

13. Subtract 30.0 from the scores for Exam 2 and recalculate covariance, Pearson's r, and Spearman's rho. What effect did this transformation of the data have on the results? Explain why this happened.

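This transformation has no effect on any of the measures. Subtracting a constant from every score lowers the mean by the same constant, so the deviation scores, and therefore the variance of each variable and the cross-products, are unchanged. It also has no effect on the relative sizes of the differences between the values of each variable, so the z-scores and the ranks stay the same.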

14. Multiply the scores for both exams by 0.5 and recalculate covariance, Pearson's r, and Spearman's rho. What effect did this transformation of the data have on the results? Explain why this happened.

Only the covariance changed. Pearson's r is standardized by the use of z-scores, which removes the effect of the variances of the variables, and Spearman's rho depends only on the ranks, which do not change. Covariance isn't standardized in this fashion. Multiplying the scores by 0.5 multiplies the standard deviation of each variable by 0.5 (and its variance by 0.25), and this has a large effect on the covariance. Because covariance uses the cross-products of the deviation scores, multiplying one variable by a constant multiplies the deviation scores for that variable by the same constant (new deviation = old deviation × constant) and thus multiplies the covariance by that constant.

If you multiply both variables by the same constant, this multiplies both deviation scores by that constant. When you calculate the cross-products of the deviation scores, both are multiplied by the constant, which is the same as multiplying the cross-product of the original deviation scores by the square of the constant. This is the same as multiplying the original covariance by the square of the constant. If the constant is 0.5, the new covariance will be 0.5 x 0.5 times the old covariance, which is 0.25 times the old covariance.
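The answers to Questions 13 and 14 can be checked numerically. The Python sketch below applies both transformations to the Exam 1 and Exam 2 scores and recomputes the three measures. The helper names are illustrative, the covariance divides by n (an assumption; the conclusions are the same with n - 1), and the rank helper assumes no ties.

```python
from math import sqrt

exam1 = [42.66, 85.62, 75.30, 43.46, 85.74, 41.72,
         85.53, 85.39, 43.25, 85.48, 84.02, 41.54]
exam2 = [81.68, 49.51, 75.39, 77.61, 31.82, 84.59,
         61.88, 63.45, 79.94, 70.79, 74.35, 96.99]

def sum_cross_products(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

def cov(x, y):
    return sum_cross_products(x, y) / len(x)

def pearson_r(x, y):
    return sum_cross_products(x, y) / sqrt(
        sum_cross_products(x, x) * sum_cross_products(y, y))

def spearman_rho(x, y):
    """Shortcut formula; assumes no tied values."""
    rx = [sorted(x).index(v) + 1 for v in x]
    ry = [sorted(y).index(v) + 1 for v in y]
    n = len(x)
    return 1 - 6 * sum((a - b) ** 2 for a, b in zip(rx, ry)) / (n * (n ** 2 - 1))

def summary(x, y):
    return cov(x, y), pearson_r(x, y), spearman_rho(x, y)

original = summary(exam1, exam2)
# Question 13: subtract 30.0 from Exam 2 only.
shifted = summary(exam1, [v - 30.0 for v in exam2])
# Question 14: multiply both exams by 0.5.
halved = summary([v * 0.5 for v in exam1], [v * 0.5 for v in exam2])

for label, (c, r, rho) in [("original", original),
                           ("exam 2 - 30", shifted),
                           ("both × 0.5", halved)]:
    print(f"{label:12s} cov = {c:10.4f}  r = {r:.4f}  rho = {rho:.4f}")
```

The shifted data give the same three values as the original, while halving both variables leaves r and rho alone and shrinks the covariance to one quarter of its original size.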

15. What are the implications for the results you obtained in Questions 13 and 14? Are there particular situations in which you could take advantage of these results?

16. Calculate Pearson's r between the scores on the third and fourth exams.

r = -0.7491

17. Calculate Spearman's rho between the scores on third and fourth exams.

rho = -0.78322

18. Compare the results from questions 16 and 17 to those from questions 7 and 8. Explain why the Pearson's r is almost the same while the Spearman's rho is different. The scatterplots may help you.

Here are the two scatterplots with lines drawn on them to show the relationships between the variables:

                  Exams 1/2   Exams 3/4
Pearson's r        -0.7339     -0.7491
Spearman's rho     -0.993      -0.78322

First, you can see that the 1/2 relationship is curvilinear and the 3/4 one is linear. Second, the 1/2 relationship is very "clean" in the sense that all of the data points are very close to the line that summarizes the relationship, while in the 3/4 relationship few points are right on the line. (There is no way to draw a smooth line that goes through all of the data points in the 3/4 relationship. It would have to look like the roller coaster at the PNE!) The very strong Spearman's rho is a direct reflection of the very clean nature of the relationship on the scatterplot. Third, you will see that the straight red lines drawn on both scatterplots are almost equally good (or bad) at providing a summary of the relationship between the variables on the plots.

Together, these factors will lead to a higher Spearman's rho and a lower Pearson's r for the 1/2 relationship, while for the 3/4 relationship, Spearman's rho is close to Pearson's r. You can see the importance of examining a scatterplot of your data as well as doing the calculations. Without the scatterplots, you wouldn't have a clue why you got the numbers you saw for this data.
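For readers who want to confirm the four coefficients being compared, here is a minimal Python sketch that computes Pearson's r and Spearman's rho for both pairs of exams. The function names are illustrative, and the rank helper assumes no tied scores within an exam, which holds for this data.

```python
from math import sqrt

# Scores for students a through l, copied from the table in Question 4.
exam1 = [42.66, 85.62, 75.30, 43.46, 85.74, 41.72,
         85.53, 85.39, 43.25, 85.48, 84.02, 41.54]
exam2 = [81.68, 49.51, 75.39, 77.61, 31.82, 84.59,
         61.88, 63.45, 79.94, 70.79, 74.35, 96.99]
exam3 = [83.68, 45.51, 68.39, 75.61, 35.82, 87.59,
         55.88, 63.45, 79.94, 66.79, 72.35, 92.99]
exam4 = [45.51, 92.99, 55.88, 75.61, 83.68, 63.45,
         72.35, 87.59, 66.79, 68.39, 79.94, 35.82]

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rho(x, y):
    """Shortcut formula based on rank differences; assumes no ties."""
    rx = [sorted(x).index(v) + 1 for v in x]
    ry = [sorted(y).index(v) + 1 for v in y]
    n = len(x)
    return 1 - 6 * sum((a - b) ** 2 for a, b in zip(rx, ry)) / (n * (n ** 2 - 1))

r12, rho12 = pearson_r(exam1, exam2), spearman_rho(exam1, exam2)
r34, rho34 = pearson_r(exam3, exam4), spearman_rho(exam3, exam4)

print(f"Exams 1/2: r = {r12:.4f}, rho = {rho12:.4f}")
print(f"Exams 3/4: r = {r34:.4f}, rho = {rho34:.4f}")
```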