The CORR Procedure

Results


Missing Values
By default, PROC CORR uses pairwise deletion when observations contain missing values. PROC CORR includes all nonmissing pairs of values for each pair of variables in the statistical computations. Therefore, the correlation statistics for different pairs of variables may be based on different numbers of observations.

If you specify the NOMISS option, PROC CORR uses listwise deletion when a value of the BY, FREQ, VAR, WEIGHT, or WITH statement variable is missing. PROC CORR excludes all observations with missing values from the analysis. Therefore, the number of observations for each pair of variables is identical. The PARTIAL statement always excludes the observations with missing values by automatically invoking NOMISS. Listwise deletion is needed to correctly calculate Cronbach's coefficient alpha when data are missing. If the data set contains missing values and you specify the ALPHA option, also specify the NOMISS option.

There are two reasons to specify NOMISS and, thus, to avoid pairwise deletion. First, NOMISS is computationally more efficient, so you use fewer computer resources. Second, if you use the correlations as input to regression or other statistical procedures, a pairwise-missing correlation matrix leads to several statistical difficulties. Pairwise correlation matrices may not be nonnegative definite, and the pattern of missing values may bias the results.
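As a minimal sketch of the difference between the two deletion methods, the following statements build a small data set and run PROC CORR with and without NOMISS. The data set name Scores, the variables X, Y, and Z, and all data values are hypothetical and serve only as an illustration.

```sas
/* Hypothetical data: Y is missing in one observation, Z in another */
data scores;
   input x y z;
   datalines;
1 2 3
2 . 5
3 5 4
4 7 .
5 8 10
6 9 11
;
run;

/* Default: pairwise deletion; N can differ from one pair to the next */
proc corr data=scores;
   var x y z;
run;

/* NOMISS: listwise deletion; every pair uses the same observations */
proc corr data=scores nomiss;
   var x y z;
run;

/* ALPHA together with NOMISS, because the data contain missing values */
proc corr data=scores nomiss alpha;
   var x y z;
run;
```

With these values, the first step should base the X-Y and X-Z correlations on five observations each and the Y-Z correlation on four, while the NOMISS steps should use the same four complete observations for every pair.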


Procedure Output
By default, PROC CORR prints a report that includes descriptive statistics and correlation statistics for each variable. The descriptive statistics include the number of observations with nonmissing values, the mean, the standard deviation, the minimum, and the maximum. PROC CORR reports the following additional descriptive statistics when you request various correlation statistics:

sum
for Pearson correlation only

median
for nonparametric measures of association

partial variance
for Pearson partial correlation

partial standard deviation
for Pearson partial correlation.

If variable labels are available, PROC CORR labels the variables.

When you specify the CSSCP, SSCP, or COV option, the corresponding sums-of-squares and crossproducts matrix or covariance matrix appears at the top of the correlation report. If the data set contains missing values, PROC CORR prints additional statistics for each pair of variables. These statistics, calculated from the observations with nonmissing row and column variable values, may include

SSCP('W','V')
uncorrected sum-of-squares and crossproducts

USS('W')
uncorrected sum-of-squares for the row variable

USS('V')
uncorrected sum-of-squares for the column variable

CSSCP('W','V')
corrected sum-of-squares and crossproducts

CSS('W')
corrected sum-of-squares for the row variable

CSS('V')
corrected sum-of-squares for the column variable

COV('W','V')
covariance

VAR('W')
variance for the row variable

VAR('V')
variance for the column variable

DF('W','V')
divisor for calculating covariance and variances.

For each pair of variables, PROC CORR always prints the correlation coefficients, the number of observations used to calculate the coefficient, and the significance probability. When you specify the ALPHA option, PROC CORR prints Cronbach's coefficient alpha, the correlation between the variable and the total of the remaining variables, and Cronbach's coefficient alpha using the remaining variables for the raw variables and the standardized variables.
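The following statements are a sketch of how the options described above might be requested. The data set Measures, its variables, its labels, and its values are hypothetical. One Weight value is deliberately missing so that the report can show the additional pairwise statistics.

```sas
/* Hypothetical data with variable labels and one missing value */
data measures;
   input height weight age;
   label height = 'Height (cm)'
         weight = 'Weight (kg)'
         age    = 'Age (years)';
   datalines;
150 52 31
160 58 45
165 . 28
172 70 39
180 81 52
;
run;

/* COV, CSSCP, and SSCP request the matrices shown at the top of the report */
proc corr data=measures cov csscp sscp;
   var height weight age;
run;

/* SPEARMAN requests a nonparametric measure of association */
proc corr data=measures spearman;
   var height weight age;
run;
```

Because the variables carry labels, the report should display them next to the variable names.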


Output Data Sets
When you specify the OUTP=, OUTS=, OUTK=, or OUTH= option, PROC CORR creates an output data set containing statistics for Pearson correlation, Spearman correlation, Kendall correlation, or Hoeffding's D, respectively. By default, the output data set is a special data set type (TYPE=CORR) that many SAS/STAT procedures recognize, including PROC REG and PROC FACTOR. When you specify the NOCORR option and the COV, CSSCP, or SSCP option, use the TYPE= data set option to change the data set type to COV, CSSCP, or SSCP. For example, the following statement

   proc corr nocorr cov outp=b(type=cov);
specifies the output data set type as COV.

PROC CORR does not print the output data set. Use PROC PRINT, PROC REPORT, or another SAS reporting tool to print the output data set.
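As an illustration, the statements below save Pearson statistics in an output data set, print it, and then pass it to PROC REG. The data set names Fit and CorrOut, the variables Y, X1, and X2, and the data values are hypothetical. The sketch assumes that SAS/STAT software is available for the PROC REG step.

```sas
/* Hypothetical data with no missing values */
data fit;
   input y x1 x2;
   datalines;
10 1 5
12 2 3
15 3 6
19 4 2
20 5 7
26 6 4
27 7 8
33 8 1
;
run;

/* NOPRINT suppresses the report; OUTP= saves the Pearson statistics */
proc corr data=fit noprint outp=corrout;
   var y x1 x2;
run;

/* PROC CORR does not print the output data set, so print it explicitly */
proc print data=corrout;
run;

/* PROC REG reads the TYPE=CORR data set instead of the raw data */
proc reg data=corrout;
   model y = x1 x2;
run;
quit;
```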

The output data set includes the following variables:

BY variables
identify the BY group when you use a BY statement.

_TYPE_ variable
identifies the type of observation.

_NAME_ variable
identifies the variable that corresponds to a given row of the correlation matrix.

INTERCEP variable
identifies variable sums when you specify the SSCP option.

VAR variables
identify the variables listed in the VAR statement.

You can use a combination of the _TYPE_ and _NAME_ variables to identify the contents of an observation. The _NAME_ variable indicates which row of the correlation matrix the observation corresponds to. The values of the _TYPE_ variable are

SSCP
uncorrected sums of squares and crossproducts

CSSCP
corrected sums of squares and crossproducts

COV
covariances

MEAN
mean of each variable

STD
standard deviation of each variable

N
number of nonmissing observations for each variable

SUMWGT
sum of the weights for each variable when using a WEIGHT statement

CORR
correlation statistics for each variable.

When you specify the SSCP option, the OUTP= data set includes an additional observation that contains intercept values. When you specify the ALPHA option, the OUTP= data set also includes observations with the following _TYPE_ values:

RAWALPHA
Cronbach's coefficient alpha for raw variables

STDALPHA
Cronbach's coefficient alpha for standardized variables

RAWALDEL
Cronbach's coefficient alpha for raw variables after deleting one variable

STDALDEL
Cronbach's coefficient alpha for standardized variables after deleting one variable

RAWCTDEL
correlation between a raw variable and the total of the remaining raw variables

STDCTDEL
correlation between a standardized variable and the total of the remaining standardized variables.
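One way to examine these observations is to subset the output data set on _TYPE_. The sketch below assumes a hypothetical data set Items with four item variables Q1 through Q4; the names and data values are illustrative only.

```sas
/* Hypothetical item scores with no missing values */
data items;
   input q1-q4;
   datalines;
4 5 4 3
3 3 2 3
5 5 4 5
2 3 2 1
4 4 5 4
1 2 2 2
;
run;

/* ALPHA adds the Cronbach's alpha observations to the OUTP= data set */
proc corr data=items alpha nomiss noprint outp=alphaout;
   var q1-q4;
run;

/* Overall coefficient alpha for the raw and standardized variables */
proc print data=alphaout;
   where _type_ in ('RAWALPHA', 'STDALPHA');
run;

/* Rows of the correlation matrix, identified by _NAME_ */
proc print data=alphaout;
   where _type_ = 'CORR';
   var _name_ q1-q4;
run;
```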

When you use a PARTIAL statement, the previous statistics are calculated for the variables after partialling. If PROC CORR computes Pearson correlation statistics, MEAN equals zero and STD equals the partial standard deviation associated with the partial variance for the OUTP=, OUTK=, or OUTS= data set. Otherwise, PROC CORR assigns missing values to MEAN and STD.

The listing titled "OUTP= Data Set with Pearson Partial Correlations" shows the observations in an OUTP= data set when the COV option and the PARTIAL statement are used to compute Pearson partial correlations. The _TYPE_ variable identifies COV, MEAN, STD, N, and CORR as the statistical values for the variables Weight, Oxygen, and Runtime. MEAN always equals 0, while STD is a partial standard deviation.

OUTP= Data Set with Pearson Partial Correlations
   Pearson Correlation Statistics Using the PARTIAL Statement  1
                 Output Data Set from PROC CORR

     _TYPE_    _NAME_       Weight      Oxygen     Runtime

      COV      Weight      72.4374    -12.7511      2.0677
      COV      Oxygen     -12.7511     27.0165     -5.5937
      COV      Runtime      2.0677     -5.5937      1.9451
      MEAN                  0.0000      0.0000      0.0000
      STD                   8.5110      5.1977      1.3947
      N                    28.0000     28.0000     28.0000
      CORR     Weight       1.0000     -0.2882      0.1742
      CORR     Oxygen      -0.2882      1.0000     -0.7716
      CORR     Runtime      0.1742     -0.7716      1.0000
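A data set with this layout might be produced by statements such as the following. This is only a sketch: the source does not show the input data or name the variable in the PARTIAL statement, so the data set Fitness, the partialled variable Age, and all data values here are assumptions, and the results will not match the listing above.

```sas
/* Hypothetical data; values are invented for illustration */
data fitness;
   input age weight oxygen runtime;
   datalines;
44 89.5 44.6 11.4
40 75.1 45.3 10.1
44 85.8 54.3  8.7
42 68.2 59.6  8.2
38 89.0 49.9  9.2
47 77.5 44.8 11.6
51 69.6 40.8 10.9
57 73.4 39.4 12.6
;
run;

/* COV adds covariance observations; PARTIAL removes the effect of Age */
proc corr data=fitness cov nosimple outp=partout;
   var weight oxygen runtime;
   partial age;
run;

/* Print the output data set to inspect _TYPE_ and _NAME_ */
proc print data=partout;
run;
```

In the printed data set, the MEAN observation should contain zeros and the STD observation should contain the partial standard deviations, as described above.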


Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.