Canonical Correlation Analysis

Canonical correlation was developed by Hotelling (1935, 1936). Its application is discussed by Cooley and Lohnes (1971), Kshirsagar (1972), and Mardia, Kent, and Bibby (1979). It is a technique for analyzing the relationship between two sets of variables. Each set can contain several variables. Multiple and simple correlation are special cases of canonical correlation in which one or both sets contain a single variable, respectively.

Given two sets of variables, canonical correlation analysis finds a linear combination from each set, called a canonical variable, such that the correlation between the two canonical variables is maximized. This correlation between the two canonical variables is the first canonical correlation. The coefficients of the linear combinations are canonical coefficients or canonical weights. It is customary to normalize the canonical coefficients so that each canonical variable has a variance of 1.
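
In symbols, the definition above can be written as a constrained maximization. The notation here is an assumption introduced for illustration and does not come from the SAS/INSIGHT documentation: $x$ and $y$ are the two sets of variables, $S_{xx}$ and $S_{yy}$ are their sample covariance matrices, $S_{xy}$ is the cross-covariance matrix, and $a$ and $b$ are the canonical coefficient vectors.

$$
r_1 \;=\; \max_{a,\,b}\; a' S_{xy}\, b
\qquad \text{subject to} \qquad
a' S_{xx}\, a \;=\; b' S_{yy}\, b \;=\; 1
$$

Under these unit-variance constraints, the covariance $a' S_{xy} b$ equals the correlation between the canonical variables $a'x$ and $b'y$, so the maximum is the first canonical correlation.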

The first canonical correlation is at least as large as the multiple correlation between any variable and the opposite set of variables. It is possible for the first canonical correlation to be very large while all the multiple correlations for predicting one of the original variables from the opposite set of canonical variables are small.

Canonical correlation analysis continues by finding a second pair of canonical variables, uncorrelated with the first pair, that produces the second highest correlation coefficient. The process of constructing canonical variables continues until the number of pairs of canonical variables equals the number of variables in the smaller set.

Each canonical variable is uncorrelated with all the other canonical variables of either set except for the one corresponding canonical variable in the opposite set. The canonical coefficients are not generally orthogonal, however, so the canonical variables do not represent jointly perpendicular directions through the space of the original variables.
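
The construction described so far can be sketched in Python with NumPy. This is a minimal illustration, not the algorithm SAS/INSIGHT software uses: it assumes both covariance matrices are nonsingular, whitens each set with a Cholesky factor, and takes a singular value decomposition. The function name `canonical_correlations` and the simulated data are hypothetical.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Return (r, A, B): canonical correlations and raw canonical coefficients.

    X is n-by-p, Y is n-by-q. Assumes nonsingular covariance matrices.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    # Cholesky factors: Sxx = Lx Lx', Syy = Ly Ly'
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # K = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations
    K = np.linalg.solve(Lx, Sxy)
    K = np.linalg.solve(Ly, K.T).T
    U, r, Vt = np.linalg.svd(K, full_matrices=False)
    # Map back to the original variables; each canonical variable has variance 1
    A = np.linalg.solve(Lx.T, U)
    B = np.linalg.solve(Ly.T, Vt.T)
    return r, A, B

# Simulated data (hypothetical): three X variables and two Y variables
# that share one latent factor z.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=(n, 1))
X = z @ np.array([[1.0, 0.5, -0.3]]) + rng.normal(size=(n, 3))
Y = z @ np.array([[0.8, -0.6]]) + rng.normal(size=(n, 2))

r, A, B = canonical_correlations(X, Y)
W = (X - X.mean(axis=0)) @ A   # canonical variables of the X set
V = (Y - Y.mean(axis=0)) @ B   # canonical variables of the Y set

# Correlations among all canonical variables, ordered [W, V]
C = np.corrcoef(np.hstack([W, V]), rowvar=False)
print("canonical correlations:", r)
print(np.round(C, 3))
```

With three variables in one set and two in the other, the sketch yields two pairs of canonical variables. In the printed correlation matrix, the only nonzero off-diagonal entries should be the canonical correlations linking each canonical variable to its counterpart in the opposite set.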

The canonical correlation analysis includes tests of a series of hypotheses that each canonical correlation and all smaller canonical correlations are zero in the population. SAS/INSIGHT software uses an F approximation (Rao 1973; Kshirsagar 1972) that gives better small sample results than the usual $\chi^2$ approximation. At least one of the two sets of variables should have an approximately multivariate normal distribution in order for the probability levels to be valid.
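
The sequence of tests can be sketched as follows. The code uses the commonly cited form of Rao's F approximation to Wilks' lambda; that SAS/INSIGHT software computes exactly these quantities is an assumption, and the function name `rao_f_tests` and its arguments are hypothetical. Here `n` is the number of observations and `p` and `q` are the numbers of variables in the two sets. Probability levels are omitted because they require an F distribution function, which the Python standard library does not provide.

```python
import math

def rao_f_tests(canonical_corrs, n, p, q):
    """Test, for each k, that canonical correlations k, k+1, ... are all zero.

    canonical_corrs must be sorted in decreasing order.
    Returns a list of (wilks_lambda, f_statistic, num_df, den_df) tuples.
    """
    m = n - 1.5 - (p + q) / 2.0
    results = []
    for k in range(len(canonical_corrs)):
        # Wilks' lambda for the k-th and all smaller canonical correlations
        lam = math.prod(1.0 - r * r for r in canonical_corrs[k:])
        pk, qk = p - k, q - k          # dimensions left after removing k pairs
        denom = pk**2 + qk**2 - 5
        t = math.sqrt((pk**2 * qk**2 - 4) / denom) if denom > 0 else 1.0
        num_df = pk * qk
        den_df = m * t - num_df / 2.0 + 1.0
        lam_t = lam ** (1.0 / t)
        f_stat = (1.0 - lam_t) / lam_t * den_df / num_df
        results.append((lam, f_stat, num_df, den_df))
    return results

# Illustrative values only: two canonical correlations from a 3-by-2 analysis
for lam, f_stat, num_df, den_df in rao_f_tests([0.8, 0.3], n=50, p=3, q=2):
    print(f"lambda={lam:.4f}  F={f_stat:.3f}  df=({num_df}, {den_df:.1f})")
```

When each set contains a single variable, the statistic reduces to the familiar test of a simple correlation, $F = (n-2)\,r^2/(1-r^2)$ with 1 and $n-2$ degrees of freedom, which is a useful sanity check on the formula.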

Canonical redundancy analysis (Stewart and Love 1968; Cooley and Lohnes 1971; van den Wollenberg 1977) examines how well the original variables can be predicted from the canonical variables. The analysis includes the proportion and cumulative proportion of the variance of the set of Y and the set of X variables explained by their own canonical variables and explained by the opposite canonical variables. Either raw or standardized variance can be used in the analysis.
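
For the standardized-variance case, the proportions can be written out as follows. The notation is an assumption introduced for illustration: the Y set has $q$ variables $y_1, \dots, y_q$, $v_k$ is the $k$th canonical variable of the Y set, $w_k$ is the corresponding canonical variable of the X set, and $r_k$ is the $k$th canonical correlation.

$$
\text{explained by own canonical variable:}\quad
P_k \;=\; \frac{1}{q} \sum_{j=1}^{q} \operatorname{corr}(y_j, v_k)^2
$$

$$
\text{explained by opposite canonical variable:}\quad
\frac{1}{q} \sum_{j=1}^{q} \operatorname{corr}(y_j, w_k)^2 \;=\; r_k^2\, P_k
$$

The cumulative proportions are running sums of these quantities over $k$. The second line shows why a large canonical correlation does not by itself imply good prediction: if $P_k$ is small, the opposite canonical variable explains little of the original variables even when $r_k$ is close to 1. The corresponding quantities for the X set follow by exchanging the roles of the two sets.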
