The MEANS Procedure

Concepts


Using Class Variables
The TYPES statement controls which of the available class variables PROC MEANS uses to subgroup the data. The data subgroups are determined by the unique combinations of these active class variable values that occur together in any single observation of the input data set. Each subgroup that PROC MEANS generates for a given type is called a level of that type. Note that, for all types, the inactive class variables can still affect the total observation count, because observations with missing values are rejected.
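The sketch below shows how levels and missing-value rejection interact. It is a small Python model of the grouping rule, not SAS code or PROC MEANS internals. The data set and the function name `levels` are hypothetical. Following the note above, the sketch assumes that an observation with a missing value for any class variable is dropped before any type is built. As a result, the missing value of the inactive variable B still lowers the count for type A.

```python
from collections import Counter

# Hypothetical input: each observation has class variables A and B; None marks a missing value.
CLASS_VARS = ("A", "B")
observations = [
    {"A": "x", "B": "p"},
    {"A": "x", "B": "q"},
    {"A": "y", "B": "p"},
    {"A": "y", "B": None},  # missing B: rejected for every type, even for type A alone
]

def levels(obs, active):
    """Count the levels of one type: the distinct combinations of the active class variables.

    Sketch assumption: an observation missing ANY class variable is dropped first,
    whether or not that variable is active in this type.
    """
    kept = [o for o in obs if all(o[v] is not None for v in CLASS_VARS)]
    return Counter(tuple(o[v] for v in active) for o in kept)

type_a = levels(observations, ("A",))
print(type_a)  # Counter({('x',): 2, ('y',): 1})
```

Without the rejection rule, level `y` of type A would have a count of 2. The inactive variable B is therefore what reduces it to 1.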

When you use a WAYS statement, PROC MEANS generates types that correspond to every possible unique combination of n class variables chosen from the complete set of class variables, for each value of n that you list. For example, these statements

proc means;
 class a b c d e;
 ways 2 3;
 run;
is equivalent to
proc means;
 class a b c d e;
 types a*b a*c a*d a*e b*c b*d b*e c*d c*e d*e
       a*b*c a*b*d a*b*e a*c*d a*c*e a*d*e
       b*c*d b*c*e b*d*e c*d*e;
 run;
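The WAYS expansion is the standard "n choose k" enumeration. The Python sketch below models it with `itertools.combinations` and emits the combinations in CLASS-statement order. The function name `ways_to_types` is hypothetical, and the sketch models only the bookkeeping, not how SAS implements WAYS. A hand-written TYPES list can be checked against its output. For five class variables and WAYS 2 3, it yields 10 two-way and 10 three-way types.

```python
from itertools import combinations

def ways_to_types(class_vars, ways):
    """Expand a WAYS request into the equivalent list of TYPES terms (illustrative model)."""
    types = []
    for n in ways:
        # every unique combination of n class variables, in CLASS-statement order
        for combo in combinations(class_vars, n):
            types.append("*".join(combo))
    return types

types = ways_to_types(["a", "b", "c", "d", "e"], [2, 3])
print(len(types))       # 20: C(5,2) = 10 two-way types plus C(5,3) = 10 three-way types
print(" ".join(types))
```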
If you omit both the TYPES statement and the WAYS statement, PROC MEANS uses all class variables to subgroup the data (the NWAY type) for displayed output, and computes all types ($2^k$ types, where $k$ is the number of class variables) for the output data set.
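When both statements are omitted, the output data set receives every type. These range from the overall (zero-way) type up to the NWAY type, which gives $2^k$ types for $k$ class variables. The Python sketch below counts them. It is an illustration only. The label `"(overall)"` for the zero-way type is the sketch's own invention, not SAS output.

```python
from itertools import combinations

def all_types(class_vars):
    """List every type, from the zero-way (overall) type through the NWAY type."""
    k = len(class_vars)
    return ["*".join(c) if c else "(overall)"
            for n in range(k + 1)
            for c in combinations(class_vars, n)]

types = all_types(["a", "b", "c", "d", "e"])
print(len(types), 2 ** 5)   # 32 32
print(types[0], types[-1])  # (overall) a*b*c*d*e
```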

Ordering the Class Values

PROC MEANS determines the order of each class variable in any type by examining the order of that class variable in the corresponding one-way type. The effect of this behavior is visible when you use the ORDER=DATA or ORDER=FREQ option. When PROC MEANS subdivides the input data set into subsets, the classification process does not apply ORDER=DATA or ORDER=FREQ independently within each subgroup. Instead, a single frequency order and data order is established for all output, based on a nonsubdivided view of the entire data set. For example, consider the following statements:
data pets;
 input Pet $ Gender $;
 datalines;
dog  m
dog  f
dog  f
dog  f
cat  m
cat  m
cat  f
;

proc means data=pets order=freq;
   class pet gender;
run;
In the output that these statements produce, PROC MEANS does not list male cats before female cats, even though male cats outnumber female cats (m=2, f=1). Instead, it determines the order of Gender for all types over the entire data set, where it found more observations for female pets (f=4, m=3).
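The Python sketch below reproduces the ordering logic with the PETS data. It contrasts the single global frequency order with the per-subgroup order that PROC MEANS does not use. It is a model, not SAS code. The helper `freq_order` is hypothetical. Its tie-breaking by first appearance comes from Python's `Counter.most_common` and is not SAS's rule for ties, but this example has no ties.

```python
from collections import Counter

pets = [("dog", "m"), ("dog", "f"), ("dog", "f"), ("dog", "f"),
        ("cat", "m"), ("cat", "m"), ("cat", "f")]

def freq_order(values):
    """Order distinct values by descending frequency (ties keep first appearance)."""
    return [v for v, _ in Counter(values).most_common()]

gender_order = freq_order(g for _, g in pets)            # ['f', 'm']: f=4, m=3 overall
cat_only = freq_order(g for p, g in pets if p == "cat")  # ['m', 'f']: the order PROC MEANS does NOT use
pet_order = freq_order(p for p, _ in pets)               # ['dog', 'cat']: dog=4, cat=3

# Every type reuses the one global order for each class variable,
# so within cats, f is still listed before m.
rank = {g: i for i, g in enumerate(gender_order)}
rows = sorted(set(pets), key=lambda r: (pet_order.index(r[0]), rank[r[1]]))
print(rows)  # [('dog', 'f'), ('dog', 'm'), ('cat', 'f'), ('cat', 'm')]
```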


Computational Resources
PROC MEANS employs the same memory allocation scheme across all host environments. When class variables are involved, PROC MEANS must keep a copy of each unique value of each class variable in memory. You can estimate the memory required to group the class variables by calculating

$$N_{c_1}(L_{c_1} + K) + N_{c_2}(L_{c_2} + K) + \cdots + N_{c_n}(L_{c_n} + K)$$

where
$N_{c_i}$ is the number of unique values for the class variable $c_i$
$L_{c_i}$ is the combined unformatted and formatted length of $c_i$
$K$ is some constant on the order of 32 bytes (64 for 64-bit architectures).
When you use the GROUPINTERNAL option in the CLASS statement, $L_{c_i}$ is simply the unformatted length of $c_i$.
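The class-variable estimate is a simple sum, so it is easy to evaluate for a planned job. The Python sketch below assumes that you already know each variable's number of unique values and its length. The function name and the two example variables are hypothetical. The default `k=32` follows the "on the order of 32 bytes" figure above, which is approximate.

```python
def class_var_memory(class_vars, k=32):
    """Rough estimate in bytes: sum over class variables of N * (L + K).

    class_vars: list of (n_unique, length) pairs, where length is the combined
    unformatted + formatted length (only the unformatted length under GROUPINTERNAL).
    k: per-value constant, roughly 32 bytes (64 on 64-bit architectures).
    """
    return sum(n * (length + k) for n, length in class_vars)

# Hypothetical class variables: 50 values of length 16 and 1000 values of length 20.
print(class_var_memory([(50, 16), (1000, 20)]))        # 50*48 + 1000*52 = 54400
print(class_var_memory([(50, 16), (1000, 20)], k=64))  # 50*80 + 1000*84 = 88000
```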

Each unique combination of class variable values, $c_{1i} c_{2j} \ldots$, for a given type forms a level in that type (see the TYPES statement). When all combinations actually exist in the data (a complete type), you can estimate the maximum potential space requirements for all levels of a given type by calculating

$$W \times N_{c_1} \times N_{c_2} \times \cdots \times N_{c_n}$$

where
$W$ is a constant based on the number of variables analyzed and the number of statistics calculated (unless you request QMETHOD=OS to compute the quantiles).
$N_{c_1}, N_{c_2}, \ldots, N_{c_n}$ are the numbers of unique levels for the active class variables of the given type.
Clearly, the memory requirements of the levels overwhelm those of the class variables. For this reason, PROC MEANS may open one or more utility files and write the levels of one or more types to disk. These types are either the primary types that PROC MEANS built during the input data scan or the derived types.
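A worked comparison makes the point concrete: the level estimate is a product, while the class-variable estimate is a sum. The Python sketch below uses a hypothetical value of $W = 200$ bytes, because the real $W$ depends on the variables analyzed and the statistics requested. It reuses the illustrative class-variable figures of 50 and 1000 unique values with $K = 32$. Only the relative sizes are meaningful.

```python
from math import prod

def complete_type_space(n_levels, w):
    """Upper bound in bytes for one complete type: W * N1 * N2 * ... * Nn."""
    return w * prod(n_levels)

W = 200  # hypothetical per-level constant; the real W depends on variables and statistics
levels_ab = complete_type_space([50, 1000], W)   # 200 * 50 * 1000 = 10,000,000 bytes
class_vars = 50 * (16 + 32) + 1000 * (20 + 32)   # class-variable estimate with K = 32: 54,400 bytes
print(levels_ab, class_vars, levels_ab // class_vars)  # 10000000 54400 183
```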

If PROC MEANS must write partially complete primary types to disk while it processes input data, then one or more merge passes may be required to combine type levels in memory with those on disk. In addition, if you use an order other than DATA for any class variable, PROC MEANS groups the completed type on disk. For this reason, the peak disk space requirements can be more than twice the memory requirements for a given type.

When PROC MEANS uses a temporary work file, you receive the following note in the SAS log:

Processing on disk occurred during summarization.
Peak disk usage was approximately nnn Mbytes.
Adjusting SUMSIZE may improve performance.

In most cases, processing ends normally.

When you specify class variables in a CLASS statement, the amount of data-dependent memory that PROC MEANS uses before it writes to a utility file is controlled by SUMSIZE=, which is available both as a SAS system option and as a PROC MEANS option. Like the system option SORTSIZE=, SUMSIZE= sets the memory threshold where disk-based operations begin. For best results, set SUMSIZE= to less than the amount of real memory that is likely to be available for the task. For efficiency reasons, PROC MEANS may internally round up the value of SUMSIZE=. SUMSIZE= has no effect unless you specify class variables.

If PROC MEANS reports that there is insufficient memory, increase SUMSIZE=. A SUMSIZE= value greater than MEMSIZE= has no effect, so you may also need to increase MEMSIZE=. If PROC MEANS reports insufficient disk space, increase the WORK space allocation. See the SAS documentation for your operating environment for more information on how to adjust your computational resource parameters.
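The two tuning rules above can be combined into a rough pre-flight check. The Python sketch below compares a memory estimate against the usable threshold. It assumes that this threshold is the smaller of SUMSIZE= and MEMSIZE=, because a SUMSIZE= value above MEMSIZE= has no effect. It also ignores the internal rounding of SUMSIZE= that PROC MEANS may apply. The function name and sizes are hypothetical, and the check is a heuristic, not a prediction of what PROC MEANS will actually do.

```python
MB = 1024 ** 2

def may_spill_to_disk(estimated_bytes, sumsize, memsize):
    """Rough heuristic: does the data-dependent memory estimate exceed the usable threshold?

    Assumes the usable threshold is min(SUMSIZE=, MEMSIZE=); ignores any internal
    rounding of SUMSIZE= by PROC MEANS.
    """
    threshold = min(sumsize, memsize)
    return estimated_bytes > threshold

need = 10_000_000  # hypothetical estimate for one complete type, in bytes
print(may_spill_to_disk(need, sumsize=8 * MB, memsize=64 * MB))    # True: 8 MB is below the estimate
print(may_spill_to_disk(need, sumsize=16 * MB, memsize=64 * MB))   # False: 16 MB covers it
print(may_spill_to_disk(need, sumsize=128 * MB, memsize=8 * MB))   # True: capped by MEMSIZE=
```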



Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.