The MEANS Procedure

Missing Values

- If a class variable has a missing value for an
observation, PROC MEANS excludes that observation from the analysis unless
you use the MISSING option in the PROC statement or the CLASS statement.
- If a BY or an ID variable value is missing, PROC
MEANS treats it like any other BY or ID variable value. The missing values
form a separate BY group.
- If a FREQ variable value is missing or nonpositive,
PROC MEANS excludes the observation from the analysis.
- If a WEIGHT variable value is missing, PROC MEANS
excludes the observation from the analysis.

PROC MEANS tabulates the number of missing values. Before it tabulates them, PROC MEANS excludes observations with nonpositive frequencies when you use the FREQ statement. When you use the WEIGHT statement, it also excludes observations whose weights are missing or, if you specify the EXCLNPWGT option, nonpositive. To report the number of missing values in the procedure output, use the NMISS statistic keyword in the PROC statement.
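The rules above can be sketched with a small example. The data set `scores` and its variables `group` and `score` are hypothetical names invented for illustration; one observation has a missing class value and one has a missing analysis value.

```sas
/* Hypothetical data: a period is read as a missing value by list input. */
data scores;
   input group $ score;
   datalines;
A 10
A .
B 20
. 30
;
run;

/* MISSING keeps the observation with the missing class value as its own level. */
/* NMISS reports how many analysis values are missing within each level.        */
proc means data=scores missing n nmiss mean;
   class group;
   var score;
run;
```

Removing the MISSING option from this sketch should drop the fourth observation from the analysis, because its class value is missing.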

Column Width for the Output

The N Obs Statistic

In the output data set, the value of N Obs is stored in the _FREQ_ variable. Use the NONOBS option in the PROC statement to suppress this information in the displayed output.
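A minimal sketch of both behaviors, assuming the sample data set `sashelp.class` (with variables `sex` and `height`) is available in your installation; the output names `height_stats` and `avg_height` are illustrative only.

```sas
/* NONOBS suppresses the N Obs column in the displayed output. */
proc means data=sashelp.class nonobs n mean;
   class sex;
   var height;
   /* The output data set still carries the count in _FREQ_. */
   output out=height_stats mean=avg_height;
run;

/* Inspect _FREQ_ alongside the requested statistic. */
proc print data=height_stats;
run;
```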

Output Data Set

**Note:** By default the statistics in the output data set automatically
inherit the analysis variable's format and label. However, statistics computed
for N, NMISS, SUMWGT, USS, CSS, VAR, CV, T, PROBT, SKEWNESS, and KURTOSIS
do not inherit the analysis variable's format because this format may be invalid
for these statistics. Use the NOINHERIT option in the OUTPUT statement to
prevent the other statistics from inheriting the format and label attributes.
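One way to see the effect of NOINHERIT is to write the same statistic to two output data sets and compare their attributes. This is only a sketch: it assumes the sample data set `sashelp.class` is available, and the data set names `heights`, `stats_default`, and `stats_plain`, the format, and the label are made up for the example.

```sas
/* Give the analysis variable a format and a label to inherit. */
data heights;
   set sashelp.class;
   format height 5.1;
   label height='Height in inches';
run;

proc means data=heights noprint;
   var height;
   /* Default: the mean inherits the format and label of height. */
   output out=stats_default mean=avg_height;
   /* NOINHERIT: the mean is written without those attributes.   */
   output out=stats_plain mean=avg_height / noinherit;
run;

/* Compare the attributes of avg_height in the two data sets. */
proc contents data=stats_default;
run;
proc contents data=stats_plain;
run;
```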

The output data set can contain these variables:

- the variables specified in the BY statement.
- the variables specified in the ID
statement.
- the variables specified in the CLASS statement.
- the variable _TYPE_ that contains information
about the class variables. By default _TYPE_ is a numeric variable. If you
specify CHARTYPE in the PROC statement, _TYPE_ is a character variable. When
you use more than 32 class variables, _TYPE_ is automatically a character
variable.
- the variable _FREQ_ that contains the number of
observations that a given output level represents.
- the variables requested in the OUTPUT statement
that contain the output statistics and extreme values.
- the variable _STAT_ that contains the names of
the default statistics if you omit statistic keywords.
- the variable _LEVEL_ if you specify the LEVELS
option.
- the variable _WAY_ if you specify the WAYS option.

The value of _TYPE_ indicates which combination of the class variables PROC MEANS uses to compute the statistics. The character value of _TYPE_ is a series of zeros and ones, where each value of one indicates an active class variable in the type. For example, with three class variables, PROC MEANS represents type 1 as 001, type 5 as 101, and so on.
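The following sketch shows how the character form of _TYPE_ might be inspected, together with the _WAY_ and _LEVEL_ variables. It assumes the sample data set `sashelp.class` is available and uses two class variables rather than three to keep the output short; `by_type` and `avg_height` are illustrative names.

```sas
/* CHARTYPE makes _TYPE_ a character string of zeros and ones. */
proc means data=sashelp.class noprint chartype;
   class sex age;
   var height;
   /* LEVELS adds _LEVEL_ and WAYS adds _WAY_ to the output data set. */
   output out=by_type mean=avg_height / levels ways;
run;

proc print data=by_type;
   var _type_ _way_ _level_ sex age _freq_ avg_height;
run;
```

With two class variables, _TYPE_ should take the values 00, 01, 10, and 11. The rightmost position corresponds to the last variable in the CLASS statement, so 01 marks levels in which only `age` is active and 10 marks levels in which only `sex` is active.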

Usually, the output data set contains one observation per level per type. However, if you omit statistical keywords in the OUTPUT statement, the output data set contains five observations per level (six if you specify a WEIGHT variable). Therefore, the total number of observations in the output data set is equal to the sum of the levels for all the types you request multiplied by 1, 5, or 6, whichever is applicable.
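To see the default statistics, an OUTPUT statement can be written without any statistic keywords. The sketch below assumes the sample data set `sashelp.class` is available; `defaults` is an illustrative name.

```sas
/* No statistic keywords: each level yields one observation per default */
/* statistic, and _STAT_ names the statistic held in each observation.  */
proc means data=sashelp.class noprint;
   class sex;
   var height;
   output out=defaults;
run;

proc print data=defaults;
run;
```

Here `sex` has two values, so there are three levels (one for type 0 and two for type 1) and the output data set should contain 3 × 5 = 15 observations.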

If you omit the CLASS statement (_TYPE_ = 0), there is always exactly one level of output per BY group. If you use a CLASS statement, then the number of levels for each type you request has an upper bound equal to the number of observations in the input data set. By default, PROC MEANS generates all possible types. In this case, the total number of levels for each BY group has an upper bound equal to

$$(2^k - 1) \cdot n + 1$$

where $k$ is the number of class variables and $n$ is the number of observations for the given BY group in the input data set. The number of output observations for the BY group is then at most $m$ times this bound, where $m$ is 1, 5, or 6.
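As a worked illustration of this bound (the numbers are chosen only for the example): with three class variables there are $2^3 = 8$ types. Type 0 contributes exactly one level, and each of the other seven types contributes at most as many levels as there are observations in the BY group, say 10. If statistic keywords are omitted and no WEIGHT variable is used, each level produces five observations.

```latex
\[
\underbrace{(2^{3} - 1)}_{\text{types other than 0}} \times 10 + 1 = 71
\ \text{levels at most},
\qquad
5 \times 71 = 355 \ \text{observations at most}
\]
```

In practice the counts are usually far smaller, because a type has only as many levels as there are distinct combinations of class values in the data.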

PROC MEANS determines the actual number of levels for a given type from the number of unique combinations of values of the active class variables. A single level is composed of all input observations whose formatted class values match.

The Effect of Class Variables on the OUTPUT Data Set shows the values of _TYPE_ and the number of observations in the data set when you specify one, two, and three class variables.

*The Effect of Class Variables on the OUTPUT Data Set*


Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.