COMPRESS=




Compresses observations in an output SAS data set

Valid in: DATA step and PROC steps
Category: Data Set Control
Restriction: Use with output data sets only.



Syntax

COMPRESS= YES | NO | CHAR | BINARY

Syntax Description

YES | CHAR
specifies that the observations in a newly created SAS output data set are compressed (variable-length records).
Tip: SAS uses run-length encoding (RLE) to compress observations. This algorithm works best for character data.

NO
specifies that the observations in a newly created SAS data set are uncompressed (fixed-length records).

BINARY
specifies that observations in a newly created SAS output data set are compressed.
Tip: SAS uses Ross Data Compression (RDC) for this setting. This method is highly effective for compressing medium to large blocks of binary data (that is, numeric variables), meaning blocks of several hundred bytes or larger. Because the compression function operates on one record at a time, the record length needs to be several hundred bytes or larger for compression to be effective.
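As an illustration (a sketch, not part of the original documentation), the following steps create one character-heavy and one numeric-heavy data set in the WORK library, each with the setting that the tips above favor. The data set names TEXT_DEMO and NUM_DEMO and the variables COMMENT, ID, and X1-X100 are hypothetical.

```sas
/* Character-heavy data: each 200-byte COMMENT value is mostly
   trailing blanks, which run-length encoding (CHAR) shrinks well */
data work.text_demo(compress=char);
   length comment $200;
   do id = 1 to 1000;
      comment = 'No comment';
      output;
   end;
run;

/* Numeric-heavy data with a long record: 101 numeric variables
   of 8 bytes each give a record of about 800 bytes, long enough
   for Ross Data Compression (BINARY) to be effective */
data work.num_demo(compress=binary);
   array x{100} x1-x100;
   do id = 1 to 1000;
      do j = 1 to 100;
         x{j} = mod(id, 10);
      end;
      output;
   end;
   drop j;
run;
```

SAS typically writes a note to the log that reports how much compression changed the size of each data set, and PROC CONTENTS output for either data set should show its compression setting.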


Details

Specify COMPRESS= only for output data sets, that is, data sets named in the DATA statement of a DATA step or in the OUT= option of a SAS procedure. The record type becomes a permanent attribute of the data set. To uncompress observations, use a DATA step to copy the data set and use COMPRESS=NO for the new data set.
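A minimal sketch of that copy step, assuming an existing compressed data set named WORK.SALES_C (a hypothetical name):

```sas
/* Copy a compressed data set into a new, uncompressed one.
   WORK.SALES_C is a hypothetical compressed input data set. */
data work.sales_u(compress=no);
   set work.sales_c;
run;
```

COMPRESS=NO applies only to the new data set WORK.SALES_U; WORK.SALES_C keeps its compressed record type.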

When COMPRESS=YES or COMPRESS=CHAR, SAS reduces the size of the data set with run-length encoding, which replaces runs of repeated consecutive characters with two- or three-byte representations. When COMPRESS=BINARY, SAS combines run-length encoding with sliding-window compression to compress the data set.
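Because the better method depends on the data, one practical approach (an assumption, not a documented procedure) is to write the same data with each setting and compare the results. WORK.SOURCE is a hypothetical input data set.

```sas
/* Write the same hypothetical input data set with each method */
data work.try_char(compress=char);
   set work.source;
run;

data work.try_binary(compress=binary);
   set work.source;
run;

/* Compare the attributes of the two copies, such as
   the number of data set pages each one occupies */
proc contents data=work.try_char;
run;

proc contents data=work.try_binary;
run;
```

The size notes that SAS writes to the log after each DATA step offer a second point of comparison.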

Use SAS/Toolkit to specify your own compression method.

Note:   Compression of observations is not supported by all engines.

In Version 8, data sets created with engines that were available in earlier versions of SAS, such as the TAPE and XPORT engines, are still accessed by those engines. Therefore, if compression was unavailable for those engines, it is also not available when you access those data sets in Version 8.

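The advantages gained by using the COMPRESS= data set option include:

reduced storage requirements for the data set
fewer input/output operations necessary to read from or write to the data during processing.

The disadvantages of using the COMPRESS= data set option include:

more CPU resources are required to read a compressed file because of the overhead of uncompressing each observation
in some situations, the resulting file size can increase rather than decrease.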

By default, new observations are appended to existing compressed data sets. If you want to track and reuse free space, use the REUSE= data set option when you create a compressed SAS data set. REUSE=YES tells SAS to write new observations to the space that is freed when you delete other observations.
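A sketch of that workflow, assuming a hypothetical input data set WORK.SOURCE with a character variable STATUS, and a hypothetical data set WORK.NEW_ORDERS that holds observations to add later:

```sas
/* Create a compressed data set that tracks and reuses free space */
data work.orders(compress=yes reuse=yes);
   set work.source;                     /* hypothetical input */
run;

/* Delete observations in place; the freed space is tracked */
data work.orders;
   modify work.orders;
   if status = 'CLOSED' then remove;    /* STATUS is hypothetical */
run;

/* Add new observations; with REUSE=YES, SAS can place them
   in the space freed by the deleted observations */
proc append base=work.orders data=work.new_orders;
run;
```

Without REUSE=YES, the appended observations would go at the end of the data set and the freed space would not be reused.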


Comparisons

The COMPRESS= data set option overrides the COMPRESS= system option.
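A minimal sketch of that precedence, assuming a hypothetical input data set WORK.SOURCE:

```sas
/* Turn on compression for all new data sets in this session */
options compress=yes;

/* Compressed: follows the system option */
data work.a;
   set work.source;     /* hypothetical input data set */
run;

/* Not compressed: the data set option overrides the system option */
data work.b(compress=no);
   set work.source;
run;

/* Restore the default setting of the system option */
options compress=no;
```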

PERFORMANCE NOTE: Using this option increases the CPU time for reading a data set because of the overhead of uncompressing the record. In addition, some engines do not support compression of observations. When using COMPRESS=YES and REUSE=YES option settings, observations cannot be addressed by observation number.

Note that REUSE=YES takes precedence over POINTOBS=YES. For example:

data test(compress=yes pointobs=yes reuse=yes);
   x = 1;
run;
results in a data set that has POINTOBS=NO. Because POINTOBS=YES is the default when you use compression, REUSE=YES causes POINTOBS= to change to NO.
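To observe the CPU overhead described in the performance note, one possible check (a sketch, not a documented procedure) is to read a compressed and an uncompressed copy of the same data with the FULLSTIMER system option turned on, then compare the CPU times in the log. WORK.SALES_C and WORK.SALES_U are hypothetical data sets that hold the same observations, one compressed and one not.

```sas
/* Report detailed resource usage, including CPU time, in the log */
options fullstimer;

/* Read every observation without writing any output */
data _null_;
   set work.sales_c;    /* hypothetical compressed copy */
run;

data _null_;
   set work.sales_u;    /* hypothetical uncompressed copy */
run;
```

Timings vary from run to run, so repeating each step gives a more reliable comparison.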

See Also

Data Set Options:

REUSE=

System Options:

COMPRESS=
REUSE=



Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.