The PRINQUAL Procedure

PROC PRINQUAL Statement

PROC PRINQUAL < options > ;
The PROC PRINQUAL statement starts the PRINQUAL procedure. Optionally, this statement identifies an input data set, creates an output data set, specifies the algorithm and other computational details, and controls displayed output.
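
As a minimal sketch of how these pieces fit together, the following statements name an input data set, request an output data set, and set a few computational details. The data set names (Survey, Results) and the variable names (x1-x5) are hypothetical. The TRANSFORM statement, which is documented separately, names the variables to analyze.

```sas
/* Hypothetical data set and variable names; adjust to your data. */
proc prinqual data=Survey      /* input SAS data set                 */
              out=Results      /* output data set                    */
              method=mtv       /* maximum total variance (default)   */
              n=2              /* number of principal components     */
              maxiter=30;      /* iteration limit (default)          */
   transform monotone(x1-x5);  /* monotone transformation family     */
run;
```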

The following table summarizes options available in the PROC PRINQUAL statement.

Task                                                          Option

Identify input data set
  specifies input SAS data set                                DATA=

Specify details for output data set
  outputs approximations to transformed variables             APPROXIMATIONS
  specifies prefix for approximation variables                APREFIX=
  outputs correlations and component structure matrix         CORRELATIONS
  specifies a multidimensional preference analysis            MDPREF
  specifies output data set                                   OUT=
  specifies prefix for principal component scores variables   PREFIX=
  replaces raw data with transformed data                     REPLACE
  outputs principal component scores                          SCORES
  standardizes principal component scores                     STANDARD
  specifies transformation standardization                    TSTANDARD=
  specifies prefix for transformed variables                  TPREFIX=

Control iterative algorithm
  analyzes covariances                                        COVARIANCE
  initializes using dummy variables                           DUMMY
  specifies iterative algorithm                               METHOD=
  specifies number of principal components                    N=
  suppresses numerical error checking                         NOCHECK
  specifies number of MGV models before refreshing            REFRESH=
  restarts iterations                                         REITERATE
  specifies singularity criterion                             SINGULAR=
  specifies input observation type                            TYPE=

Control the number of iterations
  specifies minimum criterion change                          CCONVERGE=
  specifies number of first iteration to be displayed         CHANGE=
  specifies minimum data change                               CONVERGE=
  specifies number of MAC initialization iterations           INITITER=
  specifies maximum number of iterations                      MAXITER=

Specify details for handling missing values
  includes monotone special missing values                    MONOTONE=
  excludes observations with missing values                   NOMISS
  unties special missing values                               UNTIE=

Suppress displayed output
  suppresses displayed output                                 NOPRINT


The following list describes these options in alphabetical order.

APREFIX=name
APR=name
specifies a prefix for naming the approximation variables. By default, APREFIX=A. Specifying the APREFIX= option also implies the APPROXIMATIONS option.

APPROXIMATIONS
APPROX
APP
includes principal component approximations to the transformed variables (Eckart and Young 1936) in the output data set. Variable names are constructed from the value of the APREFIX= option and the input variable names. If you specify the APREFIX= option, then approximations are automatically included. If you specify the APPROXIMATIONS option and not the APREFIX= option, then the APPROXIMATIONS option uses the default, APREFIX=A, to construct the variable names.
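
The following statements sketch the two ways of requesting approximations described above. The data set and variable names are hypothetical, and the variable names mentioned in the comments are what the naming rule suggests, not verified output.

```sas
/* APPROXIMATIONS alone: the default prefix APREFIX=A is used, so     */
/* the approximations would be expected under names such as Ax1-Ax3.  */
proc prinqual data=Survey out=Approx1 approximations;
   transform monotone(x1-x3);
run;

/* APREFIX= alone: approximations are implied and would be expected   */
/* under names such as Fitx1-Fitx3.                                   */
proc prinqual data=Survey out=Approx2 aprefix=Fit;
   transform monotone(x1-x3);
run;
```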

CCONVERGE=n
CCO=n
specifies the minimum change in the criterion being optimized that is required to continue iterating. By default, CCONVERGE=0.0. The CCONVERGE= option is ignored for METHOD=MAC. For the MGV method, specify CCONVERGE=-2 to ensure data convergence.

CHANGE=n
CHA=n
specifies the number of the first iteration to be displayed in the iteration history table. The default is CHANGE=1. When you specify a larger value for n, the first n-1 iterations are not displayed, thus speeding up the analysis. The CHANGE= option is most useful with the MGV method, which is much slower than the other methods.

CONVERGE=n
CON=n
specifies the minimum average absolute change in standardized variable scores that is required to continue iterating. By default, CONVERGE=0.00001. Average change is computed over only those variables that can be transformed by the iterations, that is, all LINEAR, OPSCORE, MONOTONE, UNTIE, SPLINE, MSPLINE, and SSPLINE variables and nonoptimal transformation variables with missing values. For more information, see the section "Optimal Transformations".

COVARIANCE
COV
computes the principal components from the covariance matrix. The variables are always centered to mean zero. If you do not specify the COVARIANCE option, the variables are also standardized to variance one, which means the analysis is based on the correlation matrix.
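
For illustration, the following statements request an analysis of the covariance matrix. The data set and variable names are hypothetical. Omitting the COVARIANCE option from the same call would give the default correlation-based analysis.

```sas
/* Hypothetical names. Variables are centered but not rescaled to     */
/* variance one, so the components come from the covariance matrix.   */
proc prinqual data=Survey out=CovResults covariance n=2;
   transform monotone(x1-x5);
run;
```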

CORRELATIONS
COR
includes correlations and the component structure matrix in the output data set. By default, this information is not included.

DATA=SAS-data-set
specifies the SAS data set to be analyzed. The data set must be an ordinary SAS data set; it cannot be a TYPE=CORR or TYPE=COV data set. If you omit the DATA= option, the PRINQUAL procedure uses the most recently created SAS data set.

DUMMY
DUM
expands variables specified for OPSCORE optimal transformations to dummy variables for the initialization (Tenenhaus and Vachette 1977). By default, the initial values of OPSCORE variables are the actual data values. The dummy variable nominal initialization requires considerable time and memory, so it might not be possible to use the DUMMY option with large data sets. No separate report of the initialization is produced. Initialization results are incorporated into the first iteration displayed in the iteration history table. For details, see the section "Optimal Transformations".

INITITER=n
INI=n
specifies the number of MAC iterations required to initialize the data before starting MTV or MGV iterations. By default, INITITER=0. The INITITER= option is ignored if METHOD=MAC.

MAXITER=n
MAX=n
specifies the maximum number of iterations. By default, MAXITER=30.

MDPREF
MDP
specifies a multidimensional preference analysis by implying the STANDARD, SCORES, and CORRELATIONS options. This option also suppresses warnings when there are more variables than observations.
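
A multidimensional preference analysis might be requested as in the following sketch. It assumes a hypothetical data set Prefs with one row per product, an identifying variable Product, and one column of ratings per judge (Judge1-Judge20).

```sas
/* Hypothetical names. MDPREF implies STANDARD, SCORES, and           */
/* CORRELATIONS, so these options are not listed separately.          */
proc prinqual data=Prefs out=PrefResults n=2 replace mdpref;
   id Product;
   transform identity(Judge1-Judge20);
run;
```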

METHOD=MAC | MGV | MTV
MET=MAC | MGV | MTV
specifies the optimization method. By default, METHOD=MTV. Values of the METHOD= option are MTV for maximum total variance, MGV for minimum generalized variance, or MAC for maximum average correlation. You can use the MAC method when all variables are positively correlated or when no monotonicity constraints are placed on any transformations. See the section "The Three Methods of Variable Transformation".
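
The following sketch combines the METHOD= option with several of the iteration controls described in this list. The data set and variable names are hypothetical, and the particular values are illustrative rather than recommended settings. SPLINE transformations are used because they carry no monotonicity constraints, which is one of the conditions under which the MAC method can be used.

```sas
/* Hypothetical names and illustrative values.                        */
proc prinqual data=Survey out=MgvResults
              method=mgv       /* minimum generalized variance        */
              inititer=5       /* 5 MAC iterations before MGV starts  */
              cconverge=-2     /* rely on data convergence for MGV    */
              maxiter=50       /* allow more than the default 30      */
              change=10;       /* display history from iteration 10   */
   transform spline(x1-x5);
run;
```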

MONOTONE=two-letters
MON=two-letters
specifies the first and last special missing value in the list of those special missing values to be estimated using within-variable order and category constraints. By default, there are no order constraints on missing value estimates. The two-letters value must consist of two letters in alphabetical order. For example, MONOTONE=DF means that the estimate of .D must be less than or equal to the estimate of .E, which must be less than or equal to the estimate of .F; no order constraints are placed on estimates of ._, .A through .C, and .G through .Z. For details, see the "Missing Values" section, and "Optimal Scaling" in Chapter 65, "The TRANSREG Procedure."
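
As a sketch, the MONOTONE=DF example above could be written as follows. It assumes a hypothetical data set Ratings whose variables x1-x3 already contain the special missing values .D, .E, and .F.

```sas
/* Hypothetical names. Estimates are constrained so that              */
/* .D <= .E <= .F within each variable; other special missing         */
/* values carry no order constraints.                                 */
proc prinqual data=Ratings out=RatingsOut monotone=DF;
   transform monotone(x1-x3);
run;
```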

N=n
specifies the number of principal components to be computed. By default, N=2.

NOCHECK
NOC
turns off computationally intensive numerical error checking for the MGV method. If you do not specify the NOCHECK option, the procedure computes R² from the squared length of the predicted values vector and compares this value to the R² computed from the error sum of squares that is a by-product of the sweep algorithm (Goodnight 1978). If the two values of R² differ by more than the square root of the value of the SINGULAR= option, a warning is displayed, the value of the REFRESH= option is halved, and the model is refit after refreshing. Specifying the NOCHECK option slightly speeds up the algorithm. Note that other less computationally intensive error checking is always performed.

NOMISS
NOM
excludes all observations with missing values from the analysis, but does not exclude them from the OUT= data set. If you omit the NOMISS option, PROC PRINQUAL simultaneously computes the optimal transformations of the nonmissing values and estimates the missing values that minimize squared error.

Casewise deletion of observations with missing values occurs when you specify the NOMISS option, when there are missing values in IDENTITY variables, when there are weights less than or equal to 0, or when there are frequencies less than 1. Excluded observations are output with a blank value for the _TYPE_ variable, and they have a weight of 0. They do not contribute to the analysis but are scored and transformed as supplementary or passive observations. See the "Passive Observations" section and the "Missing Values" section for more information on excluded observations and missing data.
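
The following sketch shows casewise deletion with the NOMISS option and one way of listing the excluded observations afterward. The data set and variable names are hypothetical. The WHERE expression relies on the blank _TYPE_ value described above.

```sas
/* Hypothetical names. Observations with missing values are excluded  */
/* from the analysis but are still written to the OUT= data set.      */
proc prinqual data=Survey out=Complete nomiss;
   transform monotone(x1-x5);
run;

/* Excluded observations have a blank _TYPE_ value.                   */
proc print data=Complete;
   where _TYPE_ = ' ';
run;
```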

NOPRINT
NOP
suppresses the display of all output. Note that this option temporarily disables the Output Delivery System (ODS). For more information, see Chapter 15, "Using the Output Delivery System."

OUT=SAS-data-set
specifies an output SAS data set that contains results of the analysis. If you omit the OUT= option, PROC PRINQUAL still creates an output data set and names it using the DATAn convention. If you want to create a permanent SAS data set, you must specify a two-level name. (Refer to the discussion in SAS Language Reference: Concepts.) You can use the REPLACE, APPROXIMATIONS, SCORES, and CORRELATIONS options to control what information is included in the output data set. For details, see the "Output Data Set" section.

PREFIX=name
PRE=name
specifies a prefix for naming the principal components. By default, PREFIX=Prin. As a result, the default principal component names are Prin1, Prin2, and so on, up to the number of components computed.

REFRESH=n
REF=n
specifies the number of variables to scale in the MGV method before computing a new inverse. By default, REFRESH=5. PROC PRINQUAL uses the REFRESH= option in the sweep algorithm of the MGV method. Large values for the REFRESH= option make the method run faster but with increased error. Small values make the method run more slowly and with more numerical accuracy.

REITERATE
REI
enables the PRINQUAL procedure to use previous transformations as starting points. The REITERATE option affects only variables that are iteratively transformed (specified as LINEAR, SPLINE, MSPLINE, SSPLINE, UNTIE, OPSCORE, and MONOTONE). For iterative transformations, the REITERATE option requests a search in the input data set for a variable that consists of the value of the TPREFIX= option followed by the original variable name. If such a variable is found, it is used to provide the initial values for the first iteration. The final transformation is a member of the transformation family defined by the original variable, not the transformation family defined by the initialization variable. See the "REITERATE Option Usage" section.
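
A two-step use of the REITERATE option might look like the following sketch. The data set and variable names are hypothetical. The first step writes transformed variables with the default TPREFIX=T, and the second step searches its input for those variables to use as starting values.

```sas
/* Step 1: a short run. The OUT= data set keeps x1-x5 and, with the   */
/* default TPREFIX=T, the transformed variables.                      */
proc prinqual data=Survey out=Pass1 maxiter=10;
   transform monotone(x1-x5);
run;

/* Step 2: continue from the previous transformations.                */
proc prinqual data=Pass1 out=Pass2 reiterate maxiter=50;
   transform monotone(x1-x5);
run;
```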

REPLACE
REP
replaces the original data with the transformed data in the output data set. The names of the transformed variables in the output data set correspond to the names of the original variables in the input data set. If you do not specify the REPLACE option, both original variables and transformed variables (with names constructed from the TPREFIX= option and the original variable names) are included in the output data set.

SCORES
SCO
includes principal component scores in the output data set. By default, scores are not included.

SINGULAR=n
SIN=n
specifies the largest value within rounding error of zero. By default, SINGULAR=1E-8. The PRINQUAL procedure uses the value of the SINGULAR= option for checking (1 - R²) when constructing full rank matrices of predictor variables, checking denominators before dividing, and so on.

STANDARD
STD
standardizes the principal component scores in the output data set to mean zero and variance one instead of the default mean zero and variance equal to the corresponding eigenvalue. See the SCORES option.
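
The output-related options can be combined, as in the following sketch. The data set and variable names are hypothetical, and the score variable names in the comment follow from the PREFIX= naming rule.

```sas
/* Hypothetical names. REPLACE stores the transformed data under the  */
/* original variable names; SCORES adds component scores, expected    */
/* as PC1-PC3; STANDARD rescales the scores to variance one.          */
proc prinqual data=Survey out=Scored n=3 replace scores standard prefix=PC;
   transform monotone(x1-x5);
run;
```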

TPREFIX=name
TPR=name
specifies a prefix for naming the transformed variables. By default, TPREFIX=T. The TPREFIX= option is ignored if you specify the REPLACE option.

TSTANDARD=CENTER | NOMISS | ORIGINAL | Z
TST=CEN | NOM | ORI | Z
specifies the standardization of the transformed variables in the OUT= data set. By default, TSTANDARD=ORIGINAL. When the TSTANDARD= option is specified in the PROC statement, it specifies the default standardization for all variables. When you specify TSTANDARD= as a t-option, it overrides the default standardization just for selected variables.

CENTER
centers the output variables to mean zero, but the variances are the same as the variances of the input variables.

NOMISS
sets the means and variances of the transformed variables in the OUT= data set, computed over all output values that correspond to nonmissing values in the input data set, to the means and variances computed from the nonmissing observations of the original variables. The TSTANDARD=NOMISS specification is useful with missing data. When a variable is linearly transformed, the final variable contains the original nonmissing values and the missing value estimates. In other words, the nonmissing values are unchanged. If your data have no missing values, TSTANDARD=NOMISS and TSTANDARD=ORIGINAL produce the same results.

ORIGINAL
sets the means and variances of the transformed variables to the means and variances of the original variables. This is the default.

Z
standardizes the variables to mean zero, variance one.

For nonoptimal variable transformations, the means and variances of the original variables are actually the means and variances of the nonlinearly transformed variables, unless you specify the ORIGINAL nonoptimal t-option in the TRANSFORM statement. For example, if a variable X with no missing values is specified as LOG, then, by default, the final transformation of X is simply LOG(X), not LOG(X) standardized to the mean of X and variance of X.
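
As a sketch of the two levels at which the TSTANDARD= option can be specified, the following statements set a default in the PROC statement and override it for selected variables with a t-option. The data set and variable names are hypothetical.

```sas
/* Hypothetical names. The PROC-level TSTANDARD=Z is the default for  */
/* all variables; the t-option restores TSTANDARD=ORIGINAL for x4     */
/* and x5 only.                                                       */
proc prinqual data=Survey out=Std tstandard=z;
   transform monotone(x1-x3)
             linear(x4 x5 / tstandard=original);
run;
```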

TYPE='text'|name
TYP='text'|name
specifies the valid value for the _TYPE_ variable in the input data set. If PROC PRINQUAL finds an input _TYPE_ variable, it uses only observations with a _TYPE_ value that matches the TYPE= value. This enables a PROC PRINQUAL OUT= data set containing correlations to be used as input to PROC PRINQUAL without requiring a WHERE statement to exclude the correlations. If a _TYPE_ variable is not in the data set, all observations are used. The default is TYPE='SCORE', so if you do not specify the TYPE= option, only observations with _TYPE_ = 'SCORE' are used.

PROC PRINQUAL displays a note when it reads observations with blank values of _TYPE_, but it does not automatically exclude those observations. Data sets created by the TRANSREG and PRINQUAL procedures have blank _TYPE_ values for those observations that were excluded from the analysis due to nonpositive weights, nonpositive frequencies, or missing data. When these observations are read again, they are excluded for the same reason that they were excluded from their original analysis, not because their _TYPE_ value is blank.
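
The following sketch illustrates reusing an OUT= data set that contains correlations as input to a second analysis. The data set and variable names are hypothetical. The TYPE= option is written out in the second step only for emphasis, because 'SCORE' is already the default.

```sas
/* Step 1: the OUT= data set holds score observations plus            */
/* correlation observations, distinguished by _TYPE_.                 */
proc prinqual data=Survey out=WithCorr replace correlations;
   transform monotone(x1-x5);
run;

/* Step 2: only observations with _TYPE_='SCORE' are read, so no      */
/* WHERE statement is needed to drop the correlations.                */
proc prinqual data=WithCorr out=Second type='SCORE';
   transform monotone(x1-x5);
run;
```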

UNTIE=two-letters
UNT=two-letters
specifies the first and last special missing value in the list of those special missing values that are to be estimated with within-variable order constraints but no category constraints. The two-letters value must consist of two letters in alphabetical order. By default, there are category constraints but no order constraints on special missing value estimates. For details, see the "Missing Values" section. Also, see "Optimal Scaling" in Chapter 65, "The TRANSREG Procedure."


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.