SORT

Sorts observations in a SAS data set by one or more variables, then either stores the sorted observations in a new SAS data set or replaces the original data set.

Windows specifics: Sort utilities available; SORTSIZE= and TAGSORT statement options


Syntax
Details
SORTSIZE= Option
TAGSORT Option
Creating Your Own Collating Sequences
See Also

Syntax

PROC SORT <option(s)> <collating-sequence-option>;

Note:   This is a simplified version of the SORT procedure syntax. For the complete syntax and its explanation, see the SORT procedure in SAS Procedures Guide.

SORTSIZE=memory-specification
specifies the maximum amount of memory available to the SORT procedure. For further explanation of the SORTSIZE= option, see the following Details section.

TAGSORT
stores only the BY variables and the observation number in temporary files. For further explanation of the TAGSORT option, see the following Details section.


Details

The SORT procedure sorts observations in a SAS data set by one or more character or numeric variables, either replacing the original data set or creating a new, sorted data set. By default under Windows, the SORT procedure uses the ASCII collating sequence.

The SORT procedure uses the sort utility specified by the SORTPGM system option. Under Windows, although all three SORTPGM keywords (HOST, BEST, and SAS) are accepted for compatibility, the SAS sort is always used. You can use all the options available to the SAS sort utility, such as the SORTSEQ and NODUPKEY options. For a complete list of all options available, see the SORT procedure in SAS Procedures Guide.
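The following is a minimal sketch of a sort that writes to a new data set and uses the NODUPKEY option mentioned above. The data set names (WORK.PEOPLE, WORK.PEOPLE_SORTED) and the sample values are hypothetical and chosen only for illustration.

```sas
/* Hypothetical sample data: a name and an age per observation */
data work.people;
   input name $ age;
   datalines;
Carol 41
alice 29
Bob 35
Bob 35
;
run;

/* OUT= writes the sorted observations to a new data set, so   */
/* WORK.PEOPLE is left unchanged. NODUPKEY keeps only the      */
/* first observation for each distinct value of the BY variable. */
proc sort data=work.people out=work.people_sorted nodupkey;
   by name;
run;

proc print data=work.people_sorted;
run;
```

Because the default collating sequence under Windows is ASCII, uppercase letters sort before lowercase letters, so "alice" is placed after "Bob" and "Carol" in this sketch.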

SORTSIZE= Option

Under Windows, you can use the SORTSIZE= option in the PROC SORT statement to limit the amount of memory available to the SORT procedure. If PROC SORT needs more memory than you specify, it creates a temporary utility file in your SASWORK directory to hold the data. Because the SORT procedure's algorithm can swap data more efficiently than Windows can, limiting memory in this way may reduce the amount of swapping needed to sort the data set.

The syntax of the SORTSIZE= option is as follows:


Syntax

SORTSIZE=memory-specification

where memory-specification can be one of the following:

n
specifies the amount of memory in bytes.

nK
specifies the amount of memory in 1-kilobyte multiples.

nM
specifies the amount of memory in 1-megabyte multiples.

The default SAS configuration file sets this option to 2M (2 megabytes) using the SORTSIZE= system option. A value of 2M works well for most memory configurations. If your machine has more than 12 MB of physical memory and you are sorting large data sets, setting the SORTSIZE= option to a value greater than 2M might improve performance.

You can override the default value of the SORTSIZE= system option by specifying a different SORTSIZE= value in the PROC SORT statement, or by submitting an OPTIONS statement that sets the SORTSIZE= system option to a new value.
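Both ways of overriding the default can be sketched as follows. The data set WORK.BIG, the variable ID, and the values 4M and 8M are assumptions made for illustration, not recommended settings.

```sas
/* Hypothetical data set with observations in descending order */
data work.big;
   do id = 1000 to 1 by -1;
      output;
   end;
run;

/* Change the SORTSIZE= system option with an OPTIONS statement. */
/* The new value applies to later sorts until it is changed again. */
options sortsize=4M;

/* Override the system option for one sort only by specifying */
/* SORTSIZE= in the PROC SORT statement.                      */
proc sort data=work.big sortsize=8M;
   by id;
run;
```

The value in the PROC SORT statement affects only that step; the OPTIONS statement value remains in effect for other sorts.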

The SORTSIZE= option is also discussed in Improving Performance of the SORT Procedure.

TAGSORT Option

The TAGSORT option in the PROC SORT statement is useful when there may not be enough disk space to sort a large SAS data set. When you specify the TAGSORT option, only the sort keys (that is, the variables specified in the BY statement) and the observation number for each observation are stored in the temporary files. The sort keys, together with the observation number, are referred to as tags. At the completion of the sorting process, the tags are used to retrieve the records from the input data set in sorted order.

Thus, in cases where the total number of bytes of the sort keys is small compared with the length of the record, temporary disk use is reduced considerably. You should have enough disk space to hold another copy of the data (the output data set) or two copies of the tags, whichever is greater.

Note that although the TAGSORT option may reduce temporary disk use, the processing time may be much higher. However, on PCs with limited available disk space, the TAGSORT option may allow sorts to be performed in situations where they would otherwise not be possible.
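As an illustration of the case described above, the following sketch sorts a data set whose single numeric sort key (8 bytes) is small compared with the rest of each record. The data set names, the variable names, and the 200-byte COMMENT variable are hypothetical.

```sas
/* Hypothetical data set: a short numeric key and a wide */
/* character variable that makes each record long        */
data work.wide;
   length comment $ 200;
   do id = 500 to 1 by -1;
      comment = 'long record payload';
      output;
   end;
run;

/* With TAGSORT, only the BY variable (ID) and the observation */
/* number are written to the temporary files; the full records */
/* are read from the input data set after the tags are sorted. */
proc sort data=work.wide out=work.wide_sorted tagsort;
   by id;
run;
```

On a data set this small the option makes no practical difference; the sketch only shows where the option is specified.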

Creating Your Own Collating Sequences

If you want to provide your own collating sequences or change a collating sequence that has been provided for you, use the TRANTAB procedure to create or modify translate tables. For complete details on the TRANTAB procedure, see SAS Procedures Guide. When you create your own translate tables, they are stored in your PROFILE catalog and they override any translate tables by the same name that are stored in the HOST catalog.

Note:   System managers can modify the HOST catalog by copying newly created tables from the SASUSER.PROFILE catalog to the HOST catalog. Then all users can access the new or modified translate table.

If you want to use the SAS Explorer to see the names of the collating sequences stored in the HOST catalog, submit the following statement:

dm 'catalog sashelp.host' catalog;

Alternatively, you can select the View pull-down menu, select the Libraries item, double-click the SASHELP library, and then double-click the HOST catalog.

In batch mode, you can use the following statements to generate a list of the contents of the HOST catalog:

proc catalog catalog=sashelp.host;
   contents;
run;

Entries of type TRANTAB are the collating sequences.

If you want to see the contents of a particular translate table, use the following statements:

proc trantab table=table-name;
   list;
run;

The contents of the collating sequence are displayed in the SAS log.

See Also



Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.