The SORT Procedure

PROC SORT Statement


PROC SORT <option(s)> <collating-sequence-option>;

To do this                                          Use this option

Specify the input data set                          DATA=
Create an output data set                           OUT=
Specify the collating sequence
    Specify ASCII                                   ASCII
    Specify EBCDIC                                  EBCDIC
    Specify Danish                                  DANISH
    Specify Finnish                                 FINNISH
    Specify Norwegian                               NORWEGIAN
    Specify Swedish                                 SWEDISH
    Specify a customized sequence                   NATIONAL
    Specify any of these collating sequences:       SORTSEQ=
      ASCII, EBCDIC, DANISH, FINNISH, ITALIAN,
      NORWEGIAN, SPANISH, SWEDISH
Specify the output order
    Reverse the order for character variables       REVERSE
    Maintain the order within BY groups             EQUALS
    Allow for variation within BY groups            NOEQUALS
Eliminate duplicate observations
    Delete observations with common BY values       NODUPKEY
    Delete observations that have duplicate values  NODUPRECS
Specify the available memory                        SORTSIZE=
Force redundant sorting                             FORCE
Reduce temporary disk usage                         TAGSORT


Options

ASCII
sorts character variables using the ASCII collating sequence. You need this option only when you sort by ASCII on a system where EBCDIC is the native collating sequence.
Restriction: You can specify only one collating sequence option in a PROC SORT step.
See also: Sorting Orders for Character Variables
Default: not specified; PROC SORT uses the native collating sequence of your operating environment.

DANISH
NORWEGIAN
sort character variables according to the Danish and Norwegian national standard.

The Danish and Norwegian collating sequence is shown in National Collating Sequences of Alphanumeric Characters.

Operating Environment Information: For information about operating environment-specific behavior, see the SAS documentation for your operating environment.
Restriction: You can specify only one collating sequence option in a PROC SORT step.

DATA= SAS-data-set
identifies the input SAS data set.
Main discussion: Input Data Sets

EBCDIC
sorts character variables using the EBCDIC collating sequence. You need this option only when you sort by EBCDIC on a system where ASCII is the native collating sequence.
Restriction: You can specify only one collating sequence option in a PROC SORT step.
See also: Sorting Orders for Character Variables
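To see why the choice between ASCII and EBCDIC matters, the following Python sketch (an illustration only, not SAS code) sorts the same values by their byte codes under each encoding. It assumes that Python's built-in cp037 codec, one common EBCDIC code page, is representative; the EBCDIC code page on your host may differ. The helper name sort_by_encoding is hypothetical.

```python
def sort_by_encoding(values, encoding):
    """Sort strings by the byte values they have in the given encoding."""
    return sorted(values, key=lambda s: s.encode(encoding))


values = ["apple", "Apple", "42"]

# ASCII: digits < uppercase letters < lowercase letters
ascii_order = sort_by_encoding(values, "ascii")
# EBCDIC (cp037): lowercase letters < uppercase letters < digits
ebcdic_order = sort_by_encoding(values, "cp037")

print(ascii_order)
print(ebcdic_order)
```

The same three values come out in opposite orders, which is why a data set sorted under one native collating sequence is not necessarily in sorted order under the other.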

EQUALS | NOEQUALS
specifies the order of the observations in the output data set. For observations with identical BY-variable values, EQUALS maintains the order from the input data set in the output data set. NOEQUALS does not necessarily preserve this order in the output data set.
Default: EQUALS
Interaction: When you use NODUPRECS to remove consecutive duplicate observations in the output data set, the choice of EQUALS or NOEQUALS can have an effect on which observations are removed.
Tip: Using NOEQUALS can save CPU time and memory.
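EQUALS behaves like a stable sort. The minimal Python sketch below is a conceptual model, not SAS code: it relies on sorted(), which Python guarantees to be stable, so observations with equal BY values keep their input order. NOEQUALS corresponds to permitting any order within a BY group, which this sketch does not attempt to model. The function name proc_sort_equals is hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def proc_sort_equals(rows: List[Row], by: List[str]) -> List[Row]:
    # sorted() is stable: rows with identical BY values keep their
    # relative input order, which is what EQUALS guarantees.
    return sorted(rows, key=lambda r: tuple(r[v] for v in by))


rows = [
    {"id": 1, "region": "West", "sales": 300},
    {"id": 2, "region": "East", "sales": 100},
    {"id": 3, "region": "West", "sales": 200},
    {"id": 4, "region": "East", "sales": 400},
]

out = proc_sort_equals(rows, ["region"])
# East rows first (ids 2, 4, in input order), then West rows (ids 1, 3)
print([r["id"] for r in out])
```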

FINNISH
SWEDISH
sort character variables according to the Finnish and Swedish national standard. The Finnish and Swedish collating sequence is shown in National Collating Sequences of Alphanumeric Characters.

Operating Environment Information: For information about operating environment-specific behavior, see the SAS documentation for your operating environment.
Restriction: You can specify only one collating sequence option in a PROC SORT step.

FORCE
sorts and replaces an indexed or subsetted data set when the OUT= option is not specified. Without the FORCE option, PROC SORT does not sort and replace an indexed data set because sorting destroys user-created indexes for the data set. When you specify FORCE, PROC SORT sorts and replaces the data set and destroys all user-created indexes for the data set. Indexes that were created or required by integrity constraints are preserved.
Tip: By default, PROC SORT does not sort a data set that is already sorted by the specified BY variables. You can use FORCE to override this behavior. This might be necessary if the SAS System cannot verify the sort specification in the data set option SORTEDBY=. For information about SORTEDBY=, see the section on SAS data set options in SAS Language Reference: Dictionary.
Restriction: You cannot use PROC SORT with the FORCE option and without the OUT= option on data sets that were created with the Version 5 compatibility engine or with a sequential engine such as a tape format engine.

NATIONAL
sorts character variables using an alternate collating sequence, as defined by your installation, to reflect a country's National Use Differences. To use this option, your site must have a customized national sort sequence defined. Check with the SAS Installation Representative at your site to determine if a customized national sort sequence is available.
Restriction: You can specify only one collating sequence option in a PROC SORT step.

NODUPKEY
checks for and eliminates observations with duplicate BY values. If you specify this option, PROC SORT compares all BY values for each observation to those for the previous observation written to the output data set. If an exact match is found, the observation is not written to the output data set.

Operating Environment Information: If you use the VMS operating environment sort, the observation that is written to the output data set is not always the first observation of the BY group.
See also: NODUPRECS
Featured in: Displaying the First Observation of Each BY Group
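The NODUPKEY rule (compare each observation's BY values with those of the last observation written, and skip it on an exact match) can be modeled in a few lines of Python. This is a conceptual sketch, not SAS code; it assumes a stable sort as with the default EQUALS, so the first observation of each BY group is the one kept. The names by_key and sort_nodupkey are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def by_key(row: Row, by: List[str]) -> Tuple[object, ...]:
    """Return the BY-variable values of one observation."""
    return tuple(row[v] for v in by)


def sort_nodupkey(rows: List[Row], by: List[str]) -> List[Row]:
    # Stable sort, modeling the default EQUALS option.
    ordered = sorted(rows, key=lambda r: by_key(r, by))
    out: List[Row] = []
    for row in ordered:
        # Compare BY values with those of the last observation written;
        # on an exact match, the observation is not written.
        if out and by_key(out[-1], by) == by_key(row, by):
            continue
        out.append(row)
    return out


rows = [
    {"id": 1, "region": "West", "sales": 300},
    {"id": 2, "region": "East", "sales": 100},
    {"id": 3, "region": "West", "sales": 200},
    {"id": 4, "region": "East", "sales": 400},
]

first_per_region = sort_nodupkey(rows, ["region"])
print([r["id"] for r in first_per_region])
```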

NODUPRECS
checks for and eliminates duplicate observations. If you specify this option, PROC SORT compares all variable values for each observation to those for the previous observation that was written to the output data set. If an exact match is found, the observation is not written to the output data set.
Alias: NODUP
Interaction: When you are removing consecutive duplicate observations in the output data set with NODUPRECS, the choice of EQUALS or NOEQUALS can have an effect on which observations are removed.
Interaction: The action of NODUPRECS is directly related to the setting of the SORTDUP= system option. When SORTDUP= is set to LOGICAL, NODUPRECS compares only the variables that remain in the input data set after a DROP or KEEP operation. Setting SORTDUP=LOGICAL can increase the number of duplicate records that are removed, because variables are eliminated before the record comparisons take place. Setting SORTDUP=LOGICAL can also improve performance, because dropping variables before sorting reduces the amount of memory required to perform the sort. When SORTDUP= is set to PHYSICAL, NODUPRECS compares all variables in the data set, regardless of whether they have been kept or dropped. For more information about SORTDUP=, see SAS Language Reference: Dictionary.
Tip: Because NODUPRECS checks only consecutive observations, some nonconsecutive duplicate observations may remain in the output data set. You can remove all duplicates with this option by sorting on all variables.
See also: NODUPKEY
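The following Python sketch, a conceptual model rather than SAS code, shows why NODUPRECS can leave nonconsecutive duplicates behind. With only region as the BY variable and a stable (EQUALS-style) sort, the two identical East/A observations stay separated by East/B, so neither is removed. Sorting on all variables makes them adjacent. Under NOEQUALS, the order within the East group is not guaranteed, so they might or might not end up adjacent; that is the EQUALS/NOEQUALS interaction described above. The names sort_rows and noduprecs are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def sort_rows(rows: List[Row], by: List[str]) -> List[Row]:
    # Stable sort, modeling the default EQUALS behavior.
    return sorted(rows, key=lambda r: tuple(r[v] for v in by))


def noduprecs(ordered: List[Row]) -> List[Row]:
    out: List[Row] = []
    for row in ordered:
        # Compare every variable with the last observation written;
        # dict equality checks all keys and values.
        if out and out[-1] == row:
            continue
        out.append(row)
    return out


rows = [
    {"region": "East", "item": "A"},
    {"region": "East", "item": "B"},
    {"region": "East", "item": "A"},  # duplicate of row 1, but not adjacent to it
    {"region": "West", "item": "C"},
]

by_region_only = noduprecs(sort_rows(rows, ["region"]))
by_all_vars = noduprecs(sort_rows(rows, ["region", "item"]))
print(len(by_region_only), len(by_all_vars))
```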

NOEQUALS
See EQUALS | NOEQUALS.

NORWEGIAN
See DANISH.

OUT=SAS-data-set
names the output data set. If SAS-data-set does not exist, PROC SORT creates it.
Default: Without OUT=, PROC SORT overwrites the original data set.
Tip: You can use data set options with OUT=.
Featured in: Sorting by the Values of Multiple Variables

REVERSE
sorts character variables using a collating sequence that is reversed from the normal collating sequence.
Interaction: Using REVERSE with the DESCENDING option in the BY statement restores the sequence to the normal order.
See also: The DESCENDING option in the BY statement. The difference is that the DESCENDING option can be used with both character and numeric variables.
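The interaction between REVERSE and DESCENDING can be modeled as two independent sign flips on a comparison. In this Python sketch (a conceptual model, not SAS code, and the name proc_sort_key is hypothetical), REVERSE flips comparisons of character values only, while a per-variable DESCENDING flag flips comparisons of either type. Applying both to a character variable cancels out and restores the normal order, and REVERSE alone leaves numeric variables unaffected.

```python
from functools import cmp_to_key
from typing import Dict, List, Tuple

Row = Dict[str, object]


def proc_sort_key(by: List[Tuple[str, bool]], reverse: bool = False):
    """Build a sort key from (variable, descending) pairs.

    reverse models the REVERSE option; the per-variable flag models
    the DESCENDING option in the BY statement.
    """
    def compare(a: Row, b: Row) -> int:
        for var, descending in by:
            x, y = a[var], b[var]
            if x == y:
                continue
            result = -1 if x < y else 1
            # REVERSE flips the collating sequence for character values only.
            if reverse and isinstance(x, str):
                result = -result
            # DESCENDING flips this BY variable, character or numeric.
            if descending:
                result = -result
            return result
        return 0

    return cmp_to_key(compare)


rows = [{"name": "b", "n": 2}, {"name": "a", "n": 1}, {"name": "c", "n": 3}]

reversed_names = sorted(rows, key=proc_sort_key([("name", False)], reverse=True))
restored_names = sorted(rows, key=proc_sort_key([("name", True)], reverse=True))
numeric_unaffected = sorted(rows, key=proc_sort_key([("n", False)], reverse=True))

print([r["name"] for r in reversed_names])
print([r["name"] for r in restored_names])
print([r["n"] for r in numeric_unaffected])
```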

SORTSEQ= collating-sequence
specifies the collating sequence. The value of collating-sequence can be any one of the individual options in the PROC SORT statement that specify a collating sequence, or the value can be the name of a translation table, either a default translation table or one that you have created in the TRANTAB procedure. For an example of using PROC TRANTAB and PROC SORT with SORTSEQ=, see Using Different Translation Tables for Sorting. The available translation tables are:
Danish
Finnish
Italian
Norwegian
Spanish
Swedish

To see how the alphanumeric characters in each language will sort, refer to National Collating Sequences of Alphanumeric Characters.
Restriction: You can specify only one collating sequence, either by SORTSEQ= or by one of the individual options that are available in the PROC SORT statement.
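Conceptually, a translation table assigns each character a position in the collating sequence, and values are compared by those positions instead of by their raw character codes. The Python sketch below illustrates the idea with a hypothetical table in the style of the Swedish alphabet (A through Z, then Å, Ä, Ö); it is not the SAS SWEDISH table, and the names SEQUENCE, WEIGHT, and collation_key are assumptions made for illustration. Characters missing from the table are placed after all listed characters, an arbitrary choice for this sketch.

```python
import string

# Hypothetical collating sequence: A-Z followed by Å, Ä, Ö.
SEQUENCE = string.ascii_uppercase + "ÅÄÖ"
WEIGHT = {ch: i for i, ch in enumerate(SEQUENCE)}


def collation_key(value: str) -> list:
    # Map each character to its position in the sequence; characters
    # not in the table sort after every listed character, by code point.
    return [WEIGHT.get(ch, len(SEQUENCE) + ord(ch)) for ch in value]


names = ["ÖBERG", "ÄNGEL", "ÅBERG", "ANDERSSON", "ZETTERBERG"]

code_point_order = sorted(names)                    # Ä (U+00C4) before Å (U+00C5)
table_order = sorted(names, key=collation_key)      # Å before Ä, as in the table

print(code_point_order)
print(table_order)
```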

[Figure: National Collating Sequences of Alphanumeric Characters]

SORTSIZE=memory-specification
specifies the maximum amount of memory that is available to PROC SORT. memory-specification is one of the following:

MAX
specifies that all available memory can be used.

n
specifies the amount of memory in bytes, where n is a real number.

nK
specifies the amount of memory in kilobytes, where n is a real number.

nM
specifies the amount of memory in megabytes, where n is a real number.

nG
specifies the amount of memory in gigabytes, where n is a real number.

Specifying the SORTSIZE= option in the PROC SORT statement temporarily overrides the SAS system option SORTSIZE=. For information about the system option, see the section on SAS system options in SAS Language Reference: Dictionary.

Operating Environment Information: Some system sort utilities may treat this option differently. Refer to the SAS documentation for your operating environment.
Default: the value of the SAS system option SORTSIZE=
Tip: This option can help improve sort performance by restricting the virtual memory paging that the operating environment controls. If PROC SORT needs more memory, it uses a temporary utility file. As a general rule, the value of SORTSIZE= should not exceed the amount of physical memory that will be available to the sorting process.
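The memory-specification forms listed above can be converted to a byte count as in the following Python sketch. It is an illustration, not how SAS parses the option: it assumes binary multiples (1K = 1,024 bytes), truncates fractional bytes, does not model any host-specific limits or rounding, and returns None for MAX to mean "no explicit limit". The name parse_sortsize is hypothetical.

```python
import re
from typing import Optional

# Assumed binary multiples for the K, M, and G suffixes.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_sortsize(spec: str) -> Optional[int]:
    """Convert a memory-specification such as 64K or 1.5M to bytes.

    Returns None for MAX (all available memory).
    """
    spec = spec.strip().upper()
    if spec == "MAX":
        return None
    # n is a real number, optionally followed by K, M, or G.
    match = re.fullmatch(r"(\d+(?:\.\d*)?|\.\d+)([KMG]?)", spec)
    if match is None:
        raise ValueError(f"invalid memory specification: {spec!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


print(parse_sortsize("64K"))
print(parse_sortsize("1.5M"))
print(parse_sortsize("MAX"))
```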

SWEDISH
See FINNISH.

TAGSORT
stores only the BY variables and the observation numbers in temporary files. The BY variables and the observation numbers are called tags. At the completion of the sorting process, PROC SORT uses the tags to retrieve records from the input data set in sorted order.
Tip: When the total length of BY variables is small compared with the record length, TAGSORT reduces temporary disk usage considerably. However, processing time may be much higher.
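The idea behind TAGSORT can be sketched in Python as follows. This is a conceptual model only: the real procedure writes the tags to temporary files, whereas this sketch keeps everything in memory, and the name tagsort is hypothetical. Each tag holds the BY values plus the observation number; only the small tags are sorted, and the full records are then retrieved from the input in tag order. Because the observation number is part of each tag, ties between equal BY values resolve in input order.

```python
from typing import Dict, List

Row = Dict[str, object]


def tagsort(records: List[Row], by: List[str]) -> List[Row]:
    # Build tags: (BY values, observation number). The wide payload
    # columns are not copied into the tags.
    tags = [(tuple(rec[v] for v in by), obs) for obs, rec in enumerate(records)]
    # Sort only the tags.
    tags.sort()
    # Retrieve the full records from the input in sorted order.
    return [records[obs] for _, obs in tags]


records = [
    {"id": "C", "payload": "x" * 1000},
    {"id": "A", "payload": "y" * 1000},
    {"id": "B", "payload": "z" * 1000},
]

out = tagsort(records, ["id"])
print([r["id"] for r in out])
```

The trade-off in the tip above follows from the last step: the sort itself handles far less data, but retrieving records by observation number adds random access to the input, which can increase processing time.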



Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.