The STDIZE Procedure

PROC STDIZE Statement

PROC STDIZE < options > ;

The PROC STDIZE statement invokes the procedure. You can specify the following options in the PROC STDIZE statement.

Table 59.1: Summary of PROC STDIZE Statement Options

| Task | Options | Description |
|---|---|---|
| Specify standardization methods | METHOD= | specifies the name of the standardization method |
| | INITIAL= | specifies the method for computing initial estimates for the A estimates |
| Unstandardize variables | UNSTD | unstandardizes variables when you also specify the METHOD=IN option |
| Process missing values | NOMISS | omits observations with any missing values from computation |
| | MISSING= | specifies the method or a numeric value for replacing missing values |
| | REPLACE | replaces missing data with zero in the standardized data |
| | REPONLY | replaces missing data with the location measure (does not standardize the data) |
| Specify data set details | DATA= | specifies the input data set |
| | OUT= | specifies the output data set |
| | OUTSTAT= | specifies the output statistic data set |
| Specify computational settings | VARDEF= | specifies the variance divisor |
| | NMARKERS= | specifies the number of markers when you also specify PCTLMTD=ONEPASS |
| | MULT= | specifies the constant to multiply each value by after standardizing |
| | ADD= | specifies the constant to add to each value after standardizing and multiplying by the value specified in the MULT= option |
| | FUZZ= | specifies the relative fuzz factor for writing the output |
| Specify percentiles | PCTLDEF= | specifies the definition of percentiles when you also specify the PCTLMTD=ORD_STAT option |
| | PCTLMTD= | specifies the method used to estimate percentiles |
| | PCTLPTS= | writes observations containing percentiles to the data set specified in the OUTSTAT= option |
| Normalize scale estimators | NORM | normalizes the scale estimator to be consistent for the standard deviation of a normal distribution |
| | SNORM | normalizes the scale estimator to have an expectation of approximately 1 for a standard normal distribution |
| Specify output | PSTAT | displays the location and scale measures |


These options and their abbreviations are described, in alphabetical order, in the remainder of this section.
ADD= c
specifies a constant, c, to add to each value after standardizing and multiplying by the value you specify in the MULT= option. The default value is 0.

DATA=SAS-data-set
specifies the input data set to be standardized. If you omit the DATA= option, the most recently created data set is used.

FUZZ=c
specifies the relative fuzz factor. The default value is 1E-14. For the OUT= data set, the score is computed as follows:
$$\text{if } |\text{Result}| < \text{Scale} \times \text{Fuzz}, \text{ then } \text{Result} = 0$$

For the OUTSTAT= data set and the Location and Scale table, the scale and location values are computed as follows:

$$\text{if } \text{Scale} < |\text{Location}| \times \text{Fuzz}, \text{ then } \text{Scale} = 0$$

Otherwise,

$$\text{if } |\text{Location}| < \text{Scale} \times \text{Fuzz}, \text{ then } \text{Location} = 0$$
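The fuzz comparisons can be sketched in Python as follows. This is a minimal illustration of the rules stated for FUZZ=, not the SAS implementation; the function names `fuzz_score` and `fuzz_location_scale` are hypothetical.

```python
def fuzz_score(result: float, scale: float, fuzz: float = 1e-14) -> float:
    """OUT= rule: a score smaller in magnitude than scale * fuzz is set to 0."""
    if abs(result) < scale * fuzz:
        return 0.0
    return result


def fuzz_location_scale(location: float, scale: float, fuzz: float = 1e-14):
    """OUTSTAT= rule: zero a negligible scale first; otherwise zero a negligible location."""
    if scale < abs(location) * fuzz:
        return location, 0.0
    if abs(location) < scale * fuzz:
        return 0.0, scale
    return location, scale


print(fuzz_score(3e-16, scale=2.0))        # 3e-16 < 2e-14, so 0.0
print(fuzz_location_scale(1e-20, 5.0))     # location zeroed: (0.0, 5.0)
print(fuzz_location_scale(1e6, 1e-10))     # scale zeroed: (1000000.0, 0.0)
```

Note that the scale test is applied first, so a location is only zeroed when the scale itself is not negligible.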

INITIAL=method
specifies the method for computing initial estimates for the A estimates (ABW, AWAVE, and AHUBER). The following methods are not allowed: INITIAL=ABW, INITIAL=AHUBER, INITIAL=AWAVE, and INITIAL=IN. The default is INITIAL=MAD.

METHOD=name
specifies the name of the method for computing location and scale measures. Valid values for name are as follows: MEAN, MEDIAN, SUM, EUCLEN, USTD, STD, RANGE, MIDRANGE, MAXABS, IQR, MAD, ABW, AHUBER, AWAVE, AGK, SPACING, L, and IN.

For details on these methods, see the descriptions in the "Standardization Methods" section. The default is METHOD=STD.
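As a rough illustration, the sketch below computes location and scale for a few of the simpler methods and then standardizes with them. The location/scale pairs are assumptions based on the usual definitions (MEAN: mean and 1; STD: mean and standard deviation; RANGE: minimum and range; MIDRANGE: midrange and half the range; MAXABS: 0 and maximum absolute value); the "Standardization Methods" section is the authoritative source. The helpers `location_scale` and `standardize` are hypothetical.

```python
import statistics


def location_scale(values, method="STD"):
    """Assumed (location, scale) pairs for a few METHOD= values."""
    lo, hi = min(values), max(values)
    if method == "MEAN":
        return statistics.mean(values), 1.0
    if method == "STD":
        # Sample standard deviation, i.e. the VARDEF=DF divisor n - 1
        return statistics.mean(values), statistics.stdev(values)
    if method == "RANGE":
        return lo, hi - lo
    if method == "MIDRANGE":
        return (lo + hi) / 2, (hi - lo) / 2
    if method == "MAXABS":
        return 0.0, max(abs(v) for v in values)
    raise ValueError(f"method {method!r} is not covered by this sketch")


def standardize(values, method="STD"):
    """Subtract the location and divide by the scale."""
    location, scale = location_scale(values, method)
    return [(v - location) / scale for v in values]


print(standardize([2.0, 4.0, 6.0], "RANGE"))   # [0.0, 0.5, 1.0]
print(standardize([2.0, 4.0, 6.0], "STD"))     # [-1.0, 0.0, 1.0]
```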

MISSING= method
MISSING= value
specifies the method (or a numeric value) for replacing missing values. If you omit the MISSING= option, the REPLACE option replaces missing values with the location measure given by the METHOD= option. Specify the MISSING= option when you want to replace missing values with a different value. You can specify any name that is valid in the METHOD= option except the name IN. The corresponding location measure is used to replace missing values.

If a numeric value is given, that value replaces missing values after the data are standardized. To replace missing values without standardizing the data, specify the REPONLY option together with the MISSING= option.

MULT= c
specifies a constant, c, by which to multiply each value after standardizing. The default value is 1.

NMARKERS= n
specifies the number of markers used when you specify the one-pass algorithm (PCTLMTD=ONEPASS). The value n must be greater than or equal to 5. The default value is 105.

NOMISS
omits observations with missing values for any of the analyzed variables from calculation of the location and scale measures. If you omit the NOMISS option, all nonmissing values are used.
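The difference between the default behavior and NOMISS is per-variable deletion versus listwise deletion when the location and scale measures are computed. The Python sketch below shows this with the mean as an example location measure and None standing in for a SAS missing value; the function name `column_means` is hypothetical.

```python
import statistics


def column_means(rows, nomiss=False):
    """Per-variable means of equal-length tuples; None marks a missing value."""
    ncols = len(rows[0])
    if nomiss:
        # NOMISS: drop every observation with a missing value in any analyzed variable
        rows = [r for r in rows if None not in r]
    # Each variable then uses all of its own remaining nonmissing values
    return [statistics.mean([r[j] for r in rows if r[j] is not None]) for j in range(ncols)]


data = [(1.0, 10.0), (2.0, None), (6.0, 30.0)]
print(column_means(data))               # [3.0, 20.0]
print(column_means(data, nomiss=True))  # [3.5, 20.0]
```

Without NOMISS, the first variable keeps the value 2.0 from the second observation; with NOMISS, that observation is excluded for every variable.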

NORM
normalizes the scale estimator to be consistent for the standard deviation of a normal distribution when you specify the option METHOD=AGK, METHOD=IQR, METHOD=MAD, or METHOD=SPACING.
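Consistency for the normal standard deviation means dividing the raw estimator by the value it takes for a standard normal population. The sketch below derives those constants for METHOD=IQR and METHOD=MAD from the normal quantile function in Python's standard library. It assumes NORM applies exactly this division; the constants for METHOD=AGK and METHOD=SPACING are not given in this section and are not modeled. The name `normalized_scale` is hypothetical.

```python
from statistics import NormalDist

z75 = NormalDist().inv_cdf(0.75)   # 75th percentile of N(0, 1), about 0.6745

# For N(0, sigma^2): the IQR is 2 * z75 * sigma and the MAD is z75 * sigma
IQR_CONSTANT = 2 * z75             # about 1.349
MAD_CONSTANT = z75                 # about 0.6745


def normalized_scale(raw_scale, method):
    """Divide a raw IQR or MAD estimate by its standard-normal value (assumed NORM behavior)."""
    constants = {"IQR": IQR_CONSTANT, "MAD": MAD_CONSTANT}
    return raw_scale / constants[method]


print(round(IQR_CONSTANT, 4), round(MAD_CONSTANT, 4))  # 1.349 0.6745
```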

OUT=SAS-data-set
specifies the name of the SAS data set created by PROC STDIZE. The output data set is a copy of the DATA= data set except that the analyzed variables have been standardized. Note that analyzed variables are those specified in the VAR statement or, if there is no VAR statement, all numeric variables not listed in any other statement. See the section "Output Data Sets" for more information.

If you want to create a permanent SAS data set, you must specify a two-level name. (Refer to "SAS Files" in SAS Language Reference: Concepts for more information on permanent SAS data sets.)

If you omit the OUT= option, PROC STDIZE creates an output data set named according to the DATAn convention.

OUTSTAT=SAS-data-set
specifies the name of the SAS data set containing the location and scale measures and other computed statistics. See the section "Output Data Sets" for more information.

PCTLDEF= percentiles
specifies which of five definitions is used to calculate percentiles when you specify the option PCTLMTD=ORD_STAT. By default, PCTLDEF=5.

Note that the option PCTLMTD=ONEPASS implies a specification of PCTLDEF=5. See the section "Computational Methods for the PCTLDEF= Option" for details on the PCTLDEF= option.
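For orientation, definition 5 is described elsewhere in SAS documentation (for example, for PROC UNIVARIATE) as the empirical distribution function with averaging: write np = j + g, where j is the integer part and g the fractional part; the percentile is the average of the jth and (j+1)th order statistics when g = 0, and the (j+1)th order statistic otherwise. The Python sketch below assumes that definition; the function name `pctldef5` and the clamping at the 0th and 100th percentiles are illustrative choices, not taken from this section.

```python
import math


def pctldef5(values, pct):
    """Percentile by the empirical distribution function with averaging (assumed PCTLDEF=5)."""
    x = sorted(values)
    n = len(x)
    np_ = n * pct / 100
    j = math.floor(np_)
    g = np_ - j

    def order_stat(k):
        # 1-based order statistic, clamped to the sample (covers pct = 0 and pct = 100)
        return x[min(max(k, 1), n) - 1]

    if g == 0:
        return (order_stat(j) + order_stat(j + 1)) / 2
    return order_stat(j + 1)


data = [7, 1, 5, 3, 9, 11]
print(pctldef5(data, 50))   # np = 3 exactly, so (5 + 7) / 2 = 6.0
print(pctldef5(data, 10))   # np = 0.6, so the 1st order statistic, 1
```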

PCTLMTD=ORD_STAT
PCTLMTD=ONEPASS  |  P2
specifies the method used to estimate percentiles. Specify the PCTLMTD=ORD_STAT option to compute the percentiles by the order statistics method.

The PCTLMTD=ONEPASS option (or its alias, P2) uses a modified version of an algorithm invented by Jain and Chlamtac (1985). See the "Computing Quantiles" section for more details on this algorithm.
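To show the idea behind a one-pass estimator, here is a sketch of the original five-marker P² algorithm of Jain and Chlamtac for a single percentile. It is not the modified algorithm that PROC STDIZE uses (which takes its marker count from NMARKERS=); the class name `P2Quantile` is hypothetical. Only five marker heights and positions are stored, no matter how many observations are read.

```python
import random


class P2Quantile:
    """Original P^2 estimator (Jain and Chlamtac, 1985) for one quantile p in (0, 1)."""

    def __init__(self, p):
        self.q = []                                    # marker heights
        self.n = [0, 1, 2, 3, 4]                       # actual marker positions (0-based)
        self.want = [0, 2 * p, 4 * p, 2 + 2 * p, 4]    # desired marker positions
        self.dwant = [0, p / 2, p, (1 + p) / 2, 1]     # desired-position increments

    def add(self, x):
        if len(self.q) < 5:                            # first five observations seed the markers
            self.q.append(x)
            self.q.sort()
            return
        q, n = self.q, self.n
        # Find the cell containing x, extending the extreme markers if needed
        if x < q[0]:
            q[0], k = x, 0
        elif x >= q[4]:
            q[4], k = x, 3
        else:
            k = next(i for i in range(4) if q[i] <= x < q[i + 1])
        for i in range(k + 1, 5):
            n[i] += 1
        for i in range(5):
            self.want[i] += self.dwant[i]
        # Move the three middle markers toward their desired positions
        for i in (1, 2, 3):
            d = self.want[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                s = 1 if d > 0 else -1
                qp = q[i] + s / (n[i + 1] - n[i - 1]) * (
                    (n[i] - n[i - 1] + s) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
                    + (n[i + 1] - n[i] - s) * (q[i] - q[i - 1]) / (n[i] - n[i - 1])
                )
                if not q[i - 1] < qp < q[i + 1]:
                    # Parabolic step left the bracket: fall back to linear interpolation
                    qp = q[i] + s * (q[i + s] - q[i]) / (n[i + s] - n[i])
                q[i] = qp
                n[i] += s

    def value(self):
        if len(self.q) < 5:
            raise ValueError("need at least five observations")
        return self.q[2]


rng = random.Random(0)
stream = list(range(1, 1002))
rng.shuffle(stream)
median = P2Quantile(0.5)
for obs in stream:
    median.add(obs)
print(median.value())   # an estimate near the exact median, 501
```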

PCTLPTS= n
writes percentiles to the OUTSTAT= data set. Values of n can be any decimal number between 0 and 100, inclusive.

A requested percentile is identified by the _TYPE_ variable in the OUTSTAT= data set with a value of Pn. For example, suppose you specify the option PCTLPTS=10, 30. The corresponding observations in the OUTSTAT= data set that contain the 10th and the 30th percentiles would then have values _TYPE_=P10 and _TYPE_=P30, respectively.

PSTAT
displays the location and scale measures.

REPLACE
replaces missing data with the value 0 in the standardized data (this value corresponds to the location measure before standardizing). To replace missing data by other values, see the preceding description of the MISSING= option. You cannot specify both the REPLACE and REPONLY options.

REPONLY
replaces missing data only; PROC STDIZE does not standardize the data. Missing values are replaced with the location measure unless you also specify the MISSING= value option, in which case missing values are replaced with value. You cannot specify both the REPLACE and REPONLY options.
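The interaction of REPLACE, REPONLY, and a numeric MISSING= value can be sketched for a single variable as follows, using None for a missing value and ignoring the MULT= and ADD= constants. The function `handle_missing` and its keyword names are hypothetical; only the behavior described for these options is modeled, and the MISSING=method form is not covered.

```python
def handle_missing(values, location, scale, replace=False, reponly=False, missing_value=None):
    """Model REPLACE, REPONLY, and numeric MISSING= for one variable; None marks missing."""
    if replace and reponly:
        raise ValueError("REPLACE and REPONLY cannot both be specified")
    if reponly:
        # No standardization; fill with the MISSING= value if given, else the location measure
        fill = location if missing_value is None else missing_value
        return [fill if v is None else v for v in values]
    out = [None if v is None else (v - location) / scale for v in values]
    if missing_value is not None:
        fill = missing_value        # numeric MISSING= value, applied after standardizing
    elif replace:
        fill = 0.0                  # 0 corresponds to the location measure
    else:
        return out                  # missing values stay missing
    return [fill if v is None else v for v in out]


x = [2.0, None, 6.0]
print(handle_missing(x, location=4.0, scale=2.0))                # [-1.0, None, 1.0]
print(handle_missing(x, location=4.0, scale=2.0, replace=True))  # [-1.0, 0.0, 1.0]
print(handle_missing(x, location=4.0, scale=2.0, reponly=True))  # [2.0, 4.0, 6.0]
```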

SNORM
normalizes the scale estimator to have an expectation of approximately 1 for a standard normal distribution when you specify the METHOD=SPACING option.

UNSTD
UNSTDIZE
unstandardizes variables when you specify the METHOD=IN(ds) option. The location and scale measures, along with constants for addition and multiplication that the unstandardization is based upon, are identified by the _TYPE_ variable in the ds data set.

The ds data set must have a _TYPE_ variable and contain the following two observations: a _TYPE_='LOCATION' observation and a _TYPE_='SCALE' observation. It can also contain the optional observations _TYPE_='ADD' and _TYPE_='MULT'; if these observations are not found in the ds data set, the constants specified in the ADD= and MULT= options (or their default values) are used for unstandardization.

See the "OUTSTAT= Data Set" section for details on the statistics that each value of _TYPE_ represents. The formula used for unstandardization is as follows: If the final output value from the previous standardization is calculated as
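$$\text{result} = \text{add} + \text{multiply} \times \frac{\text{original} - \text{location}}{\text{scale}}$$

then the original value is reconstructed as

$$\text{original} = \text{scale} \times \frac{\text{result} - \text{add}}{\text{multiply}} + \text{location}$$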
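A round trip through the two formulas can be checked in Python. The function names are hypothetical; `add` and `mult` default to 0 and 1, matching the documented defaults of the ADD= and MULT= options.

```python
def standardize(original, location, scale, add=0.0, mult=1.0):
    """Final output value: add + mult * (original - location) / scale."""
    return add + mult * (original - location) / scale


def unstandardize(result, location, scale, add=0.0, mult=1.0):
    """Inverse used for UNSTD: scale * (result - add) / mult + location."""
    return scale * (result - add) / mult + location


r = standardize(13.0, location=10.0, scale=2.0, add=50.0, mult=10.0)
print(r)                                                               # 65.0
print(unstandardize(r, location=10.0, scale=2.0, add=50.0, mult=10.0))  # 13.0
```

The inverse only recovers the original value when the same location, scale, ADD, and MULT values are used in both directions, which is why the ds data set must carry them.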

VARDEF= DF
VARDEF= N
VARDEF= WDF
VARDEF= WEIGHT | WGT
specifies the divisor to be used in the calculation of variances. By default, VARDEF=DF. The values and associated divisors are as follows.

| Value | Divisor | Formula |
|---|---|---|
| DF | degrees of freedom | $n - 1$ |
| N | number of observations | $n$ |
| WDF | sum of weights minus 1 | $(\sum_i w_i) - 1$ |
| WEIGHT or WGT | sum of weights | $\sum_i w_i$ |
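The effect of each divisor can be sketched in Python. This assumes the numerator is the weighted corrected sum of squares about the weighted mean, with all weights equal to 1 when no weights are supplied; the function name `variance` is hypothetical.

```python
def variance(values, weights=None, vardef="DF"):
    """Weighted variance using the VARDEF= divisor; weights default to 1."""
    if weights is None:
        weights = [1.0] * len(values)
    sw = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / sw
    css = sum(w * (x - mean) ** 2 for w, x in zip(weights, values))
    divisors = {
        "DF": len(values) - 1,   # degrees of freedom
        "N": len(values),        # number of observations
        "WDF": sw - 1,           # sum of weights minus 1
        "WEIGHT": sw,            # sum of weights
        "WGT": sw,
    }
    return css / divisors[vardef]


print(variance([1.0, 2.0, 3.0, 4.0], vardef="N"))      # 5.0 / 4 = 1.25
print(variance([1.0, 3.0], [1.0, 3.0], vardef="WDF"))  # 3.0 / 3 = 1.0
```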

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.