The MEANS Procedure

PROC MEANS Statement

PROC MEANS <option(s)> <statistic-keyword(s)>;

To do this	Use this option
Specify the input data set	DATA=
Disable floating point exception recovery	NOTRAP
Specify the amount of memory to use for data summarization with class variables	SUMSIZE=
Control the classification levels
	Specify a secondary data set that contains the combinations of class variables to analyze	CLASSDATA=
	Create all possible combinations of class variable values	COMPLETETYPES
	Exclude from the analysis all combinations of class variable values that are not in the CLASSDATA= data set	EXCLUSIVE
	Use missing values as valid values to create combinations of class variables	MISSING
Control the statistical analysis
	Specify the confidence level for the confidence limits	ALPHA=
	Exclude observations with nonpositive weights from the analysis	EXCLNPWGTS
	Specify the sample size to use for the P2 quantile estimation method	QMARKERS=
	Specify the quantile estimation method	QMETHOD=
	Specify the mathematical definition used to compute quantiles	QNTLDEF=
	Select the statistics	statistic-keyword
	Specify the variance divisor	VARDEF=
Control the output
	Specify the field width for the statistics	FW=
	Specify the number of decimal places for the statistics	MAXDEC=
	Suppress reporting the total number of observations for each unique combination of the class variables	NONOBS
	Suppress all displayed output	NOPRINT
	Order the values of the class variables according to the specified order	ORDER=
	Display the output	PRINT
	Display the analysis for all requested combinations of class variables	PRINTALLTYPES
	Display the values of the ID variables	PRINTIDVARS
Control the output data set
	Specify that the _TYPE_ variable contain character values	CHARTYPE
	Order the output data set by descending _TYPE_ value	DESCENDTYPES
	Select ID variables based on minimum values	IDMIN
	Limit the output statistics to the observations with the highest _TYPE_ value	NWAY

Options

ALPHA=value

specifies the confidence level to compute the confidence limits for the mean. The percentage for the confidence limits is (1-value)×100. For example, ALPHA=.05 results in a 95% confidence limit.

Default:	.05
Range:	between 0 and 1
Interaction:	To compute confidence limits specify the statistic-keyword CLM, LCLM, or UCLM.
See also:	Confidence Limits
Featured in:	Computing a Confidence Limit for the Mean
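
As a minimal sketch of the ALPHA= option, the following step requests 90% two-sided confidence limits for the mean, since (1-0.10)×100 = 90. The input data set SASHELP.CLASS and the variables HEIGHT and WEIGHT are assumptions chosen only for illustration.

```sas
/* Illustrative only: SASHELP.CLASS, HEIGHT, and WEIGHT are assumed inputs. */
/* ALPHA=0.10 gives (1 - 0.10) x 100 = 90% confidence limits.               */
/* CLM requests the two-sided limits for the mean.                          */
proc means data=sashelp.class alpha=0.10 n mean clm;
   var height weight;
run;
```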

CHARTYPE

specifies that the _TYPE_ variable in the output data set is a character representation of the binary value of _TYPE_. The length of the variable equals the number of class variables.

Main discussion:	Output Data Set
Interaction:	When you specify more than 32 class variables, _TYPE_ automatically becomes a character variable.
Featured in:	Computing Output Statistics with Missing Class Variable Values
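
The following sketch shows one way CHARTYPE might be used. With two class variables, _TYPE_ in the output data set should be a two-character string of binary digits. The data set names WORK.TYPES and SASHELP.CLASS and the variable names are assumptions for illustration.

```sas
/* Illustrative only: data set and variable names are assumptions. */
/* NOPRINT suppresses the displayed report; only WORK.TYPES is     */
/* created. CHARTYPE stores _TYPE_ as a character string.          */
proc means data=sashelp.class noprint chartype;
   class sex age;
   var height;
   output out=work.types mean=mean_height;
run;

/* Inspect the character _TYPE_ values in the output data set. */
proc print data=work.types;
run;
```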

CLASSDATA=SAS-data-set

specifies a data set that contains the combinations of values of the class variables that must be present in the output. Any combinations of values of the class variables that occur in the CLASSDATA= data set but not in the input data set appear in the output and have a frequency of zero.

Restriction:	The CLASSDATA= data set must contain all class variables. Their data type and format must match the corresponding class variables in the input data set.
Interaction:	If you use the EXCLUSIVE option, PROC MEANS excludes any observation in the input data set whose combination of class variables is not in the CLASSDATA= data set.
Tip:	Use the CLASSDATA= data set to filter or to supplement the input data set.
Featured in:	Using a CLASSDATA= Data Set with Class Variables
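
A sketch of using CLASSDATA= together with EXCLUSIVE follows. The data set WORK.LEVELS is a hypothetical secondary data set built for this example; it borrows the attributes of SEX and AGE from the assumed input data set SASHELP.CLASS so that type and format match, and it includes one combination (F, 20) that does not occur in the input.

```sas
/* Illustrative only: WORK.LEVELS and SASHELP.CLASS are assumptions. */
data work.levels;
   /* Copy variable attributes at compile time without reading rows. */
   if 0 then set sashelp.class(keep=sex age);
   sex='F'; age=12; output;
   sex='M'; age=12; output;
   sex='F'; age=20; output;   /* combination absent from the input   */
   stop;
run;

/* EXCLUSIVE restricts the analysis to the combinations listed in    */
/* WORK.LEVELS; the absent combination is reported with frequency 0. */
proc means data=sashelp.class classdata=work.levels exclusive n mean;
   class sex age;
   var height;
run;
```

Omitting EXCLUSIVE in this sketch would instead supplement the input, so that all combinations found in either data set appear.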

COMPLETETYPES

creates all possible combinations of class variables even if the combination does not occur in the input data set.

Interaction:	The PRELOADFMT option in the CLASS statement ensures that PROC MEANS outputs all user-defined format ranges or values for the combinations of class variables, even when a frequency is zero.
Tip:	Using COMPLETETYPES does not increase the memory requirements.
Featured in:	Using Preloaded Formats with Class Variables
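
As a small sketch, COMPLETETYPES can be added to a step with two class variables so that combinations that never occur together in the input still appear in the report. SASHELP.CLASS and the variable names are assumptions for illustration.

```sas
/* Illustrative only: data set and variable names are assumptions.  */
/* Every SEX x AGE combination is reported, including combinations  */
/* that have no observations in the input data set.                 */
proc means data=sashelp.class completetypes n mean;
   class sex age;
   var height;
run;
```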

DATA=SAS-data-set

identifies the input SAS data set.

Main discussion:	Input Data Sets

DESCENDTYPES

orders observations in the output data set by descending _TYPE_ value.

Alias:	DESCENDING | DESCEND
Interaction:	DESCENDTYPES has no effect if you specify NWAY.
Tip:	Use DESCENDTYPES to make the overall total (_TYPE_=0) the last observation in each BY group.
See also:	Output Data Set
Featured in:	Computing Different Output Statistics for Several Variables

EXCLNPWGTS

excludes observations with nonpositive weight values (zero or negative) from the analysis. By default, PROC MEANS treats observations with negative weights like those with zero weights and counts them in the total number of observations.

Alias:	EXCLNPWGT
See also:	WEIGHT= and WEIGHT Statement

EXCLUSIVE

excludes from the analysis all combinations of the class variables that are not found in the CLASSDATA= data set.

Requirement:	If a CLASSDATA= data set is not specified, this option is ignored.
Featured in:	Using a CLASSDATA= Data Set with Class Variables

FW=field-width

specifies the field width to display the statistics in the output.

Default:	12
Tip:	If PROC MEANS truncates column labels in the output, increase the field width.
Featured in:	Computing Specific Descriptive Statistics, Using a CLASSDATA= Data Set with Class Variables, and Using Multi-label Value Formats with Class Variables

IDMIN

specifies that the output data set contain the minimum value of the ID variables.

Interaction:	Specify PRINTIDVARS to display the value of the ID variables in the output.
See:	ID Statement

MAXDEC=number

specifies the maximum number of decimal places to display the statistics in the output.

Default:	BEST. width for columnar format, typically about 7. (This does not apply to the PROBT statistic. The SAS system option PROBSIG= determines its format. See SAS system options in SAS Language Reference: Concepts for details.)
Range:	0-8
Featured in:	Computing Descriptive Statistics with Class Variables and Using a CLASSDATA= Data Set with Class Variables
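
The display options FW= and MAXDEC= are often used together. The sketch below narrows the statistic columns to 8 characters and limits the display to 2 decimal places; the data set and variable names are assumptions for illustration.

```sas
/* Illustrative only: SASHELP.CLASS, HEIGHT, and WEIGHT are assumed. */
/* FW=8 sets the field width; MAXDEC=2 caps the decimal places.      */
proc means data=sashelp.class fw=8 maxdec=2 mean std min max;
   var height weight;
run;
```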

MISSING

considers missing values as valid values to create the combinations of class variables. Special missing values that represent numeric values (the letters A through Z and the underscore (_) character) are each considered as a separate value.

Default:	If you omit MISSING, PROC MEANS excludes the observations with a missing class variable value from the analysis.
See also:	SAS Language Reference: Concepts for a discussion of missing values that have special meaning.
Featured in:	Using Preloaded Formats with Class Variables
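
To illustrate the effect of MISSING, the sketch below builds a small hypothetical data set, WORK.DEMO, in which one observation has a missing value for the class variable GROUP. With MISSING, that observation forms its own class level instead of being excluded.

```sas
/* Illustrative only: WORK.DEMO, GROUP, and SCORE are hypothetical. */
data work.demo;
   input group $ score;
   datalines;
A 10
A 12
. 7
B 9
;
run;

/* The observation with a missing GROUP value is kept as a level. */
proc means data=work.demo missing n mean;
   class group;
   var score;
run;
```

Running the same step without MISSING should drop the third observation from the analysis.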

NONOBS

suppresses the column that displays the total number of observations for each unique combination of the values of the class variables. This column corresponds to the _FREQ_ variable in the output data set.

See also:	The N Obs Statistic
Featured in:	Using Multi-label Value Formats with Class Variables and Using Preloaded Formats with Class Variables

NOPRINT

See PRINT | NOPRINT.

NOTRAP

disables floating point exception (FPE) recovery during data processing. By default, PROC MEANS traps these errors and sets the statistic to missing.

In operating environments where the overhead of FPE recovery is significant, NOTRAP can improve performance. Note that normal SAS System FPE handling is still in effect so that PROC MEANS terminates in the case of math exceptions.

NWAY

specifies that the output data set contain only statistics for the observations with the highest _TYPE_ and _WAY_ values. When you specify class variables, this corresponds to the combination of all class variables.

Interaction:	If you specify a TYPES statement or a WAYS statement, PROC MEANS ignores this option.
See also:	Output Data Set
Featured in:	Computing Output Statistics with Missing Class Variable Values
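
A common pattern is to combine NWAY with NOPRINT and an OUTPUT statement so that the output data set holds only the rows for the full combination of class variables. The following is a sketch; the data set names, variable names, and output variable names are assumptions for illustration.

```sas
/* Illustrative only: all data set and variable names are assumptions. */
/* NWAY keeps only the rows for the SEX x AGE combination;             */
/* the overall and one-way summaries are not written.                  */
proc means data=sashelp.class noprint nway;
   class sex age;
   var height;
   output out=work.height_stats mean=mean_height n=n_height;
run;
```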

ORDER=DATA | FORMATTED | FREQ | UNFORMATTED

specifies the sort order to create the unique combinations for the values of the class variables in the output, where

DATA

orders values according to their order in the input data set.

Interaction:	If you use PRELOADFMT in the CLASS statement, the order for the values of each class variable matches the order that PROC FORMAT uses to store the values of the associated user-defined format. If you use the CLASSDATA= option, PROC MEANS uses the order of the unique values of each class variable in the CLASSDATA= data set to order the output levels. If you use both options, PROC MEANS first uses the user-defined formats to order the output. If you omit EXCLUSIVE, PROC MEANS appends after the user-defined format and the CLASSDATA= values the unique values of the class variables in the input data set based on the order that they are encountered.
Tip:	By default, PROC FORMAT stores a format definition in sorted order. Use the NOTSORTED option to store the values or ranges of a user-defined format in the order that you define them.

FORMATTED

orders values by their ascending formatted values. This order depends on your operating environment.

Alias:	FMT | EXTERNAL

FREQ

orders values by descending frequency count so that levels with the most observations are listed first.

Interaction:	For multiway combinations of the class variables, PROC MEANS determines the order of a class variable combination from the individual class variable frequencies.
Interaction:	Use the ASCENDING option in the CLASS statement to order values by ascending frequency count.

UNFORMATTED

orders values by their unformatted values, which yields the same order as PROC SORT. This order depends on your operating environment.

Alias:	UNFMT | INTERNAL

Default:	UNFORMATTED
See also:	Ordering the Class Values
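
As a sketch of the ORDER= option, the step below lists the levels of a class variable by descending frequency count. SASHELP.CLASS and the variable names are assumptions for illustration.

```sas
/* Illustrative only: data set and variable names are assumptions. */
/* ORDER=FREQ lists the AGE levels with the most observations      */
/* first. Adding ASCENDING to the CLASS statement would reverse    */
/* the order.                                                      */
proc means data=sashelp.class order=freq n mean;
   class age;
   var height;
run;
```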

PRINT | NOPRINT

specifies whether PROC MEANS displays the statistical analysis. NOPRINT suppresses all the output.

Default:	PRINT
Tip:	Use NOPRINT when you want to create only an OUT= output data set.
Featured in:	For an example of NOPRINT, see Computing Output Statistics and Identifying the Top Three Extreme Values with the Output Statistics

PRINTALLTYPES

displays all requested combinations of class variables (all _TYPE_ values) in the output. Normally, PROC MEANS shows only the NWAY type.

Alias:	PRINTALL
Interaction:	If you use the NWAY option, the TYPES statement, or the WAYS statement, PROC MEANS ignores this option.
Featured in:	Using a CLASSDATA= Data Set with Class Variables
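
The sketch below uses PRINTALLTYPES with two class variables, so the displayed output should include the overall summary, each one-way summary, and the two-way summary. The data set and variable names are assumptions for illustration.

```sas
/* Illustrative only: data set and variable names are assumptions. */
/* Without PRINTALLTYPES, only the SEX x AGE (NWAY) table is       */
/* displayed.                                                      */
proc means data=sashelp.class printalltypes n mean;
   class sex age;
   var height;
run;
```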

PRINTIDVARS

displays the values of the ID variables in the output.

Alias:	PRINTIDS
Interaction:	Specify IDMIN to display the minimum value of the ID variables.
See:	ID Statement
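
A sketch that combines PRINTIDVARS with IDMIN follows. It assumes an ID statement that names the variable NAME in SASHELP.CLASS; both are chosen only for illustration.

```sas
/* Illustrative only: data set and variable names are assumptions. */
/* IDMIN selects the minimum value of the ID variable for each     */
/* class level, and PRINTIDVARS displays it in the report.         */
proc means data=sashelp.class idmin printidvars n mean;
   class sex;
   id name;
   var height;
run;
```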

QMARKERS=number

specifies the default number of markers to use for the P² quantile estimation method. The number of markers controls the size of fixed memory space.

Default:	The default value depends on which quantiles you request. For the median (P50), number is 7. For the quartiles (P25 and P50), number is 25. For the quantiles P1, P5, P10, P90, P95, or P99, number is 105. If you request several quantiles, PROC MEANS uses the largest value of number.
Range:	an odd integer greater than 3
Tip:	Increase the number of markers above the default settings to improve the accuracy of the estimate; reduce the number of markers to conserve memory and computing time.
Main discussion:	Quantiles

QMETHOD=OS|P2

specifies the method PROC MEANS uses to process the input data when it computes quantiles. If the number of observations is less than or equal to the QMARKERS= value and QNTLDEF=5, both methods produce the same results.

OS

uses order statistics. This is the same method that PROC UNIVARIATE uses.

Note: This technique can be very memory-intensive.

P2

uses the P² method to approximate the quantile.

Default:	OS
Restriction:	When QMETHOD=P2, PROC MEANS will not compute weighted quantiles.
Tip:	When QMETHOD=P2, reliable estimations of some quantiles (P1, P5, P95, P99) may not be possible for some data sets.
Main Discussion:	Quantiles
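
As a sketch of the two quantile options working together, the step below selects the P² method and raises the number of markers above the defaults. The value 211 is an arbitrary odd integer chosen for illustration, as are the data set and variable names.

```sas
/* Illustrative only: SASHELP.CLASS, HEIGHT, and QMARKERS=211 are   */
/* assumptions. QMETHOD=P2 approximates the quantiles in a fixed    */
/* amount of memory; more markers generally trade memory and time   */
/* for accuracy. No WEIGHT statement is used, because P2 does not   */
/* compute weighted quantiles.                                      */
proc means data=sashelp.class qmethod=p2 qmarkers=211 median p90;
   var height;
run;
```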

QNTLDEF=1|2|3|4|5

specifies the mathematical definition that PROC MEANS uses to calculate quantiles when QMETHOD=OS. To use QMETHOD=P2, you must use QNTLDEF=5.

Default:	5
Alias:	PCTLDEF=
Main discussion:	Calculating Percentiles
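
A sketch of selecting a non-default quantile definition follows. It relies on the default QMETHOD=OS, because QMETHOD=P2 requires QNTLDEF=5. The data set and variable names are assumptions for illustration.

```sas
/* Illustrative only: data set and variable names are assumptions. */
/* QNTLDEF=4 selects definition 4 for the requested quartiles.     */
proc means data=sashelp.class qntldef=4 q1 median q3;
   var height weight;
run;
```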

statistic-keyword(s)

specifies which statistics to compute and the order to display them in the output. The available keywords in the PROC statement are

Descriptive statistic keywords
CLM	RANGE
CSS	SKEWNESS|SKEW
CV	STDDEV|STD
KURTOSIS|KURT	STDERR
LCLM	SUM
MAX	SUMWGT
MEAN	UCLM
MIN	USS
N	VAR
NMISS
Quantile statistic keywords
MEDIAN|P50	Q3|P75
P1	P90
P5	P95
P10	P99
Q1|P25	QRANGE
Hypothesis testing keyword
PROBT	T

Default:	N, MEAN, STD, MIN, and MAX
Requirement:	To compute standard error, confidence limits for the mean, and the Student's t test you must use the default value of VARDEF=, which is DF. To compute skewness or kurtosis you must use VARDEF=N or VARDEF=DF.
Tip:	Use CLM or both LCLM and UCLM to compute a two-sided confidence limit for the mean. Use only LCLM or UCLM to compute a one-sided confidence limit.
Main discussion:	The definitions of the keywords and the formulas for the associated statistics are listed in Keywords and Formulas.
Featured in:	Computing Specific Descriptive Statistics and Using the BY Statement with Class Variables
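
The sketch below lists several statistic keywords in the PROC MEANS statement; the statistics are displayed in the order in which the keywords appear. VARDEF= is left at its default (DF), which the standard error and the Student's t test require. The data set and variable names are assumptions for illustration.

```sas
/* Illustrative only: SASHELP.CLASS, HEIGHT, and WEIGHT are assumed. */
/* Listing any keyword replaces the default set                      */
/* (N, MEAN, STD, MIN, MAX) with the keywords given.                 */
proc means data=sashelp.class n nmiss mean median stderr t probt;
   var height weight;
run;
```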

SUMSIZE=value

specifies the amount of memory that is available for data summarization when you use class variables. value may be one of the following:

n | nK | nM | nG: specifies the amount of memory available in bytes, kilobytes, megabytes, or gigabytes, respectively. If n is 0, PROC MEANS uses the value of the SAS system option SUMSIZE=.
MAXIMUM|MAX: specifies the maximum amount of memory that is available.

Default:	The value of the SUMSIZE= system option.
Tip:	For best results, do not make SUMSIZE= larger than the amount of physical memory that is available for the PROC step. If additional space is needed, PROC MEANS uses utility files.
See also:	The SAS system option SUMSIZE= in SAS Language Reference: Dictionary.
Main discussion:	Computational Resources

VARDEF=divisor

specifies the divisor to use in the calculation of the variance and standard deviation. Possible Values for VARDEF= shows the possible values for divisor and associated divisors.

Possible Values for VARDEF=
Value	Divisor	Formula for Divisor
DF	degrees of freedom	$n - 1$
N	number of observations	$n$
WDF	sum of weights minus one	$(\sum_i w_i) - 1$
WEIGHT|WGT	sum of weights	$\sum_i w_i$

The procedure computes the variance as $CSS/divisor$, where $CSS$ is the corrected sums of squares and equals $\sum_i (x_i - \bar{x})^2$. When you weight the analysis variables, $CSS$ equals $\sum_i w_i (x_i - \bar{x}_w)^2$, where $\bar{x}_w$ is the weighted mean.

Default:	DF
Requirement:	To compute the standard error of the mean, confidence limits for the mean, or the Student's t-test, use the default value of VARDEF=.
Tip:	When you use the WEIGHT statement and VARDEF=DF, the variance is an estimate of $\sigma^2$, where the variance of the ith observation is $var(x_i) = \sigma^2 / w_i$ and $w_i$ is the weight for the ith observation. This yields an estimate of the variance of an observation with unit weight.
Tip:	When you use the WEIGHT statement and VARDEF=WGT, the computed variance is asymptotically (for large n) an estimate of $\sigma^2 / \bar{w}$, where $\bar{w}$ is the average weight. This yields an asymptotic estimate of the variance of an observation with average weight.
See also:	the example of weighted statistics
Main discussion:	Keywords and Formulas
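
As a sketch of a weighted analysis with a non-default divisor, the step below uses VARDEF=WDF, so the variance divisor is the sum of the weights minus one. The data set WORK.WEIGHTED and its variables X and W are hypothetical and exist only for this example. Only MEAN, VAR, and STD are requested, because the standard error, confidence limits, and t test require the default VARDEF=DF.

```sas
/* Illustrative only: WORK.WEIGHTED, X, and W are hypothetical. */
data work.weighted;
   input x w;
   datalines;
10 1
12 2
15 1
11 3
;
run;

/* The WEIGHT statement names the weight variable; VARDEF=WDF   */
/* divides the weighted corrected sum of squares by (sum of w)-1. */
proc means data=work.weighted vardef=wdf mean var std;
   var x;
   weight w;
run;
```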
