Displayed Output
The STEPDISC procedure displays the following output:
- Class Level Information, including the values of the
classification variable, the Frequency of each value,
the Weight of each value, and the Proportion of each value in the
total sample
Optional output includes the following:
- Within-Class SSCP Matrices for each group
- Pooled Within-Class SSCP Matrix
- Between-Class SSCP Matrix
- Total-Sample SSCP Matrix
- Within-Class Covariance Matrices for each group
- Pooled Within-Class Covariance Matrix
- Between-Class Covariance Matrix,
equal to the between-class SSCP matrix divided by
n(c-1)/c, where n is the number of observations
and c is the number of classes
- Total-Sample Covariance Matrix
- Within-Class Correlation Coefficients and Pr > |r| to
test the hypothesis that the within-class population
correlation coefficients are zero
- Pooled Within-Class Correlation Coefficients and
Pr > |r| to test the hypothesis that the partial
population correlation coefficients are zero
- Between-Class Correlation Coefficients and Pr > |r| to
test the hypothesis that the between-class population
correlation coefficients are zero
- Total-Sample Correlation Coefficients and Pr > |r| to
test the hypothesis that the total population
correlation coefficients are zero
- descriptive Simple Statistics including N (the
number of observations), Sum, Mean, Variance, and Standard
Deviation for the total sample and within each class
- Total-Sample Standardized Class Means,
obtained by subtracting the grand mean from each class
mean and dividing by the total-sample standard deviation
- Pooled Within-Class Standardized Class Means,
obtained by subtracting the grand mean from each class mean
and dividing by the pooled within-class standard deviation
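The between-class covariance divisor and the two kinds of standardized class means listed above can be illustrated for a single variable. The following is a minimal sketch in Python, not PROC STEPDISC code. The data and all names are hypothetical. It assumes the usual definitions: the between-class sum of squares is the class-size-weighted sum of squared deviations of the class means from the grand mean, and the pooled within-class variance uses n - c degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data: one quantitative variable observed in c = 3 classes.
groups = {
    "a": [1.0, 2.0, 3.0],
    "b": [4.0, 5.0, 6.0],
    "c": [7.0, 8.0, 9.0],
}

values = [x for xs in groups.values() for x in xs]
n = len(values)            # number of observations
c = len(groups)            # number of classes
grand_mean = mean(values)
class_means = {g: mean(xs) for g, xs in groups.items()}

# Between-class sum of squares (the 1 x 1 case of the between-class SSCP matrix).
between_ss = sum(
    len(xs) * (class_means[g] - grand_mean) ** 2 for g, xs in groups.items()
)

# Between-class variance: the SSCP entry divided by n(c-1)/c.
between_var = between_ss / (n * (c - 1) / c)

# Pooled within-class standard deviation (assumed divisor: n - c).
within_ss = sum((x - class_means[g]) ** 2 for g, xs in groups.items() for x in xs)
pooled_sd = sqrt(within_ss / (n - c))

# Total-sample standard deviation (divisor n - 1).
total_sd = stdev(values)

# Standardized class means: (class mean - grand mean) / standard deviation.
total_std_means = {g: (m - grand_mean) / total_sd for g, m in class_means.items()}
pooled_std_means = {g: (m - grand_mean) / pooled_sd for g, m in class_means.items()}

print(between_var, total_std_means, pooled_std_means)
```

With equal class sizes, as in this example, dividing by n(c-1)/c makes the between-class variance equal to the ordinary sample variance of the class means.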
At each step, the following statistics are displayed:
- for each variable considered for entry or removal:
Partial R-Square, the squared (partial) correlation,
the F statistic, and Pr > F, the probability level,
from a one-way analysis of covariance
- the minimum Tolerance for entering each variable. A variable is
entered only if its tolerance and the tolerances for all variables
already in the model are greater than the value specified in the
SINGULAR= option. The tolerance for the entering variable is
1 - R² from regressing the entering variable on the other variables
already in the model. The tolerance for a variable already in the
model is 1 - R² from regressing that variable on the entering
variable and the other variables already in the model. With m
variables already in the model, for each entering variable, m + 1
multiple regressions are performed using the entering variable and
each of the m variables already in the model as a dependent variable.
These m + 1 tolerances are computed for each entering variable, and
the minimum tolerance is displayed for each.
The tolerance is computed using the
total-sample correlation matrix.
It is customary to compute tolerance using the pooled
within-class correlation matrix (Jennrich 1977),
but it is possible for a variable with excellent
discriminatory power to have a high total-sample
tolerance and a low pooled within-class tolerance.
For example, PROC STEPDISC enters a variable that
yields perfect discrimination (that is, produces
a canonical correlation of one), but a program
using pooled within-class tolerance does not.
- the variable Label, if any
- the name of the variable chosen
- the variables already selected or removed
- Wilks' Lambda and the associated F
approximation with degrees of freedom and
Pr < F, the associated probability level after
the selected variable has been entered or removed.
Wilks' lambda is the likelihood ratio statistic for
testing the hypothesis that the means of the classes
on the selected variables are equal in the population
(see the "Multivariate Tests" section in
Chapter 3, "Introduction to Regression Procedures").
Lambda is close to zero if any two groups are well separated.
- Pillai's Trace and the associated F
approximation with degrees of freedom and
Pr > F, the associated probability level after
the selected variable has been entered or removed.
Pillai's trace is a multivariate statistic for testing
the hypothesis that the means of the classes on
the selected variables are equal in the population
(see the "Multivariate Tests" section in Chapter 3).
- Average Squared Canonical Correlation (ASCC).
The ASCC is Pillai's trace divided
by the number of groups minus 1.
The ASCC is close to 1 if all groups are well separated
and if all or most directions in the discriminant
space show good separation for at least two groups.
- Summary to give statistics
associated with the variable chosen at each step.
The summary includes the following:
  - Step number
  - Variable Entered or Removed
  - Number In, the number of variables in the model
  - Partial R-Square
  - the F Value for entering or removing the variable
  - Pr > F, the probability level for the F statistic
  - Wilks' Lambda
  - Pr < Lambda based on the F approximation to Wilks' Lambda
  - Average Squared Canonical Correlation
  - Pr > ASCC based on the F approximation to Pillai's trace
  - the variable Label, if any
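The minimum tolerance described in the step statistics above can be sketched as follows. This is an illustrative Python example, not the procedure's implementation. It relies on a standard identity: when each variable in a set is regressed on all the others, 1 - R² for a variable equals the reciprocal of the corresponding diagonal element of the inverse correlation matrix for that set. The function names, the correlation matrix, and the threshold value are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity matrix.
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def min_tolerance(corr, in_model, candidate):
    """Smallest of the m + 1 tolerances for the candidate and the m model variables."""
    idx = list(in_model) + [candidate]
    sub = [[corr[i][j] for j in idx] for i in idx]
    inv = invert(sub)
    # Tolerance of variable k is 1 - R^2 = 1 / (k-th diagonal of the inverse).
    return min(1.0 / inv[k][k] for k in range(len(idx)))


# Hypothetical total-sample correlation matrix for three variables.
corr = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
threshold = 1e-8  # hypothetical value standing in for the SINGULAR= option

for candidate in (1, 2):
    tol = min_tolerance(corr, in_model=[0], candidate=candidate)
    print(candidate, tol, tol > threshold)
```

With variable 0 already in the model, candidate 1 (correlation 0.5 with variable 0) has a minimum tolerance of 0.75, and the uncorrelated candidate 2 has a minimum tolerance of 1.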
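The contrast between Wilks' lambda (close to zero if any two groups are well separated) and the ASCC (close to 1 only if all groups are well separated) can be seen from the squared canonical correlations. The sketch below is a Python illustration with made-up values. It assumes the standard identities, which are not stated in this section, that Wilks' lambda is the product of one minus each squared canonical correlation and that Pillai's trace is their sum. The function name is hypothetical.

```python
from math import prod


def multivariate_summary(squared_canonical_corrs, n_groups):
    """Return (Wilks' lambda, Pillai's trace, ASCC) from squared canonical correlations."""
    wilks = prod(1.0 - r2 for r2 in squared_canonical_corrs)
    pillai = sum(squared_canonical_corrs)
    ascc = pillai / (n_groups - 1)  # Pillai's trace divided by (number of groups - 1)
    return wilks, pillai, ascc


# Three groups, so at most two canonical correlations (values are made up).
one_direction = multivariate_summary([0.99, 0.02], n_groups=3)
all_directions = multivariate_summary([0.99, 0.98], n_groups=3)

print(one_direction)
print(all_directions)
```

In the first case a single strong direction drives lambda close to zero while the ASCC stays near 0.5. In the second case both directions separate the groups, so lambda is close to zero and the ASCC is close to 1.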
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.