The SORT Procedure

PROC SORT Statement

PROC SORT <option(s)> <collating-sequence-option>;
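
As a minimal sketch of the statement form above, the following step sorts a small data set by one variable. The data set names WORK.STAFF and WORK.STAFF_SORTED and the variables NAME, DEPT, and SALARY are hypothetical and serve only as an illustration.

```sas
/* Hypothetical input data, created only for this illustration */
data work.staff;
   input name $ dept $ salary;
   datalines;
Lee Sales 41000
Ames Admin 38000
Cruz Sales 45000
;
run;

/* DATA= names the input data set; OUT= leaves the original unchanged */
proc sort data=work.staff out=work.staff_sorted;
   by name;
run;
```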

To do this	Use this option
Specify the input data set	DATA=
Create an output data set	OUT=
Specify the collating sequence
	Specify ASCII	ASCII
	Specify EBCDIC	EBCDIC
	Specify Danish	DANISH
	Specify Finnish	FINNISH
	Specify Norwegian	NORWEGIAN
	Specify Swedish	SWEDISH
	Specify a customized sequence	NATIONAL
	Specify any of these collating sequences: ASCII, EBCDIC, DANISH, FINNISH, ITALIAN, NORWEGIAN, SPANISH, SWEDISH	SORTSEQ=
Specify the output order
	Reverse the order for character variables	REVERSE
	Maintain the order within BY groups	EQUALS
	Allow for variation within BY groups	NOEQUALS
Eliminate duplicate observations
	Delete observations with common BY values	NODUPKEY
	Delete observations that have duplicate values	NODUPRECS
Specify the available memory	SORTSIZE=
Force redundant sorting	FORCE
Reduce temporary disk usage	TAGSORT

Options

ASCII

sorts character variables using the ASCII collating sequence. You need this option only when you sort by ASCII on a system where EBCDIC is the native collating sequence.

Restriction:	You can specify only one collating sequence option in a PROC SORT step.
See also:	Sorting Orders for Character Variables

DANISH | NORWEGIAN

sort characters according to the Danish and Norwegian national standard.

The Danish and Norwegian collating sequence is shown in National Collating Sequences of Alphanumeric Characters.

Operating Environment Information: For information about operating environment-specific behavior, see the SAS documentation for your operating environment.

Restriction: You can specify only one collating sequence option in a PROC SORT step.

DATA=SAS-data-set

identifies the input SAS data set.

Main discussion:	Input Data Sets

EBCDIC

sorts character variables using the EBCDIC collating sequence. You need this option only when you sort by EBCDIC on a system where ASCII is the native collating sequence.

Restriction:	You can specify only one collating sequence option in a PROC SORT step.
See also:	Sorting Orders for Character Variables
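
The following sketch shows one way to request a non-native collating sequence, here EBCDIC on a system assumed to be ASCII-native. The data set WORK.CODES and the variable CODE are hypothetical. In EBCDIC, lowercase letters collate before uppercase letters and digits collate after letters, so the output order would be expected to differ from a native ASCII sort of the same values.

```sas
/* Hypothetical values that sort differently under ASCII and EBCDIC */
data work.codes;
   input code $;
   datalines;
A1
a1
1A
;
run;

/* EBCDIC is needed only because the host is assumed to be ASCII-native */
proc sort data=work.codes out=work.codes_ebcdic ebcdic;
   by code;
run;
```

Replacing EBCDIC with ASCII in the PROC SORT statement would illustrate the opposite case on an EBCDIC-native system.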

EQUALS | NOEQUALS

specifies the order of the observations in the output data set. For observations with identical BY-variable values, EQUALS maintains the order from the input data set in the output data set. NOEQUALS does not necessarily preserve this order in the output data set.

Default:	EQUALS
Interaction:	When you use NODUPRECS to remove consecutive duplicate observations in the output data set, the choice of EQUALS or NOEQUALS can have an effect on which observations are removed.
Tip:	Using NOEQUALS can save CPU time and memory.
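
A short sketch of the difference, using a hypothetical data set WORK.VISITS in which SEQ records the input order:

```sas
/* Hypothetical data: SEQ records the original input order */
data work.visits;
   input clinic $ seq;
   datalines;
B 1
A 2
B 3
A 4
;
run;

/* EBCDIC-independent: EQUALS (the default) keeps input order within each CLINIC value */
proc sort data=work.visits out=work.visits_stable equals;
   by clinic;
run;

/* NOEQUALS allows observations with the same CLINIC value to come out in any order */
proc sort data=work.visits out=work.visits_any noequals;
   by clinic;
run;
```

With EQUALS, the observations for clinic A keep SEQ order 2, 4 and those for clinic B keep SEQ order 1, 3. With NOEQUALS, the order within each clinic is not guaranteed.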

FINNISH | SWEDISH

sort characters according to the Finnish and Swedish national standard. The Finnish and Swedish collating sequence is shown in National Collating Sequences of Alphanumeric Characters.

FORCE

sorts and replaces an indexed or subsetted data set when the OUT= option is not specified. Without the FORCE option, PROC SORT does not sort and replace an indexed data set because sorting destroys user-created indexes for the data set. When you specify FORCE, PROC SORT sorts and replaces the data set and destroys all user-created indexes for the data set. Indexes that were created or required by integrity constraints are preserved.

Tip:	By default, PROC SORT does not re-sort a data set that is already sorted in the requested order. You can use FORCE to override this behavior, which might be necessary if the SAS System cannot verify the sort specification in the data set option SORTEDBY=. For information about SORTEDBY=, see the section on SAS data set options in SAS Language Reference: Dictionary.
Restriction:	You cannot use PROC SORT with the FORCE option and without the OUT= option on data sets that were created with the Version 5 compatibility engine or with a sequential engine such as a tape format engine.
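
The following sketch illustrates sorting an indexed data set in place. The data set WORK.PARTS, its variables, and the simple index on PARTNO are hypothetical.

```sas
/* Hypothetical data set with a user-created simple index on PARTNO */
data work.parts(index=(partno));
   input partno descr $;
   datalines;
30 bolt
10 nut
20 washer
;
run;

/* No OUT= is given, so FORCE is required to sort and replace WORK.PARTS. */
/* The user-created index on PARTNO is destroyed by this step.            */
proc sort data=work.parts force;
   by descr;
run;
```

If the index needs to be preserved, writing the sorted observations to a separate data set with OUT= avoids the need for FORCE.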

NATIONAL

sorts character variables using an alternate collating sequence, as defined by your installation, to reflect a country's National Use Differences. To use this option, your site must have a customized national sort sequence defined. Check with the SAS Installation Representative at your site to determine if a customized national sort sequence is available.

Restriction:	You can specify only one collating sequence option in a PROC SORT step.

NODUPKEY

checks for and eliminates observations with duplicate BY values. If you specify this option, PROC SORT compares all BY values for each observation to those for the previous observation written to the output data set. If an exact match is found, the observation is not written to the output data set.

Operating Environment Information: If you use the VMS operating environment sort, the observation that is written to the output data set is not always the first observation of the BY group.

See also: NODUPRECS
Featured in: Displaying the First Observation of Each BY Group
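
As an illustration, the sketch below keeps one observation per customer. The data set WORK.ORDERS and its variables are hypothetical.

```sas
/* Hypothetical data with repeated CUSTID values */
data work.orders;
   input custid amount;
   datalines;
101 50
102 75
101 20
103 10
102 5
;
run;

/* Only the first observation written for each CUSTID value is kept */
proc sort data=work.orders out=work.first_orders nodupkey;
   by custid;
run;
```

With the default EQUALS behavior, WORK.FIRST_ORDERS would be expected to hold one observation for each of the three CUSTID values, taken from the first occurrence in the input. As noted above, some operating environment sorts do not guarantee which observation of the BY group is kept.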

NODUPRECS

checks for and eliminates duplicate observations. If you specify this option, PROC SORT compares all variable values for each observation to those for the previous observation that was written to the output data set. If an exact match is found, the observation is not written to the output data set.

Alias:	NODUP
Interaction:	When you are removing consecutive duplicate observations in the output data set with NODUPRECS, the choice of EQUALS or NOEQUALS can have an effect on which observations are removed.
Interaction:	The action of NODUPRECS depends on the setting of the SAS system option SORTDUP=. When SORTDUP= is set to LOGICAL, NODUPRECS compares only the variables that remain in the input data set after a DROP or KEEP operation. Setting SORTDUP=LOGICAL can increase the number of duplicate records that are removed because variables are eliminated before record comparisons take place. It can also improve performance because dropping variables before sorting reduces the amount of memory required to perform the sort. When SORTDUP= is set to PHYSICAL, NODUPRECS compares all variables in the data set, regardless of whether they have been kept or dropped. For more information about the system option SORTDUP=, see SAS Language Reference: Dictionary.
Tip:	Because NODUPRECS checks only consecutive observations, some nonconsecutive duplicate observations may remain in the output data set. You can remove all duplicates with this option by sorting on all variables.
See also:	NODUPKEY
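
The consecutive-only comparison can be sketched as follows. The data set WORK.READINGS and its variables are hypothetical.

```sas
/* Hypothetical data: the first and third observations are exact duplicates */
data work.readings;
   input site $ value;
   datalines;
A 1
A 2
A 1
B 3
;
run;

/* Sorting by SITE alone leaves the two "A 1" observations nonconsecutive, */
/* so NODUPRECS does not remove either of them.                            */
proc sort data=work.readings out=work.by_site noduprecs;
   by site;
run;

/* Sorting on all variables makes exact duplicates adjacent, */
/* so NODUPRECS can remove them.                             */
proc sort data=work.readings out=work.no_dups noduprecs;
   by _all_;
run;
```

With the default EQUALS behavior, WORK.BY_SITE keeps all four observations, while WORK.NO_DUPS keeps three.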

NOEQUALS

See EQUALS | NOEQUALS.

NORWEGIAN

See DANISH.

OUT=SAS-data-set

names the output data set. If SAS-data-set does not exist, PROC SORT creates it.

Default:	Without OUT=, PROC SORT overwrites the original data set.
Tip:	You can use data set options with OUT=.
Featured in:	Sorting by the Values of Multiple Variables
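
A brief sketch of OUT= combined with a data set option follows. The data set names and variables are hypothetical, and KEEP= is only one of the data set options that could be used here.

```sas
/* Hypothetical input data */
data work.staff;
   input name $ dept $ salary years;
   datalines;
Lee Sales 41000 3
Ames Admin 38000 7
Cruz Sales 45000 5
;
run;

/* WORK.STAFF is left unchanged; the KEEP= data set option */
/* drops YEARS from the sorted output data set.            */
proc sort data=work.staff out=work.staff_by_dept(keep=name dept salary);
   by dept descending salary;
run;
```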

REVERSE

sorts character variables using a collating sequence that is reversed from the normal collating sequence.

Interaction:	Using REVERSE with the DESCENDING option in the BY statement restores the sequence to the normal order.
See also:	The DESCENDING option in the BY statement. Unlike REVERSE, which affects only character variables, DESCENDING can be used with both character and numeric variables.
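
The interaction between REVERSE and DESCENDING can be sketched as follows. The data set WORK.NAMES and the variable NAME are hypothetical.

```sas
/* Hypothetical character data */
data work.names;
   input name $;
   datalines;
Cruz
Ames
Lee
;
run;

/* REVERSE sorts NAME using the reversed collating sequence */
proc sort data=work.names out=work.names_rev reverse;
   by name;
run;

/* REVERSE together with DESCENDING restores the normal order, */
/* as described in the Interaction note for REVERSE.           */
proc sort data=work.names out=work.names_normal reverse;
   by descending name;
run;
```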

SORTSEQ=collating-sequence

specifies the collating sequence. The value of collating-sequence can be any one of the individual options in the PROC SORT statement that specify a collating sequence, or the value can be the name of a translation table, either a default translation table or one that you have created in the TRANTAB procedure. For an example of using PROC TRANTAB and PROC SORT with SORTSEQ=, see Using Different Translation Tables for Sorting. The available translation tables are:

	Danish
	Finnish
	Italian
	Norwegian
	Spanish
	Swedish

To see how the alphanumeric characters in each language will sort, refer to National Collating Sequences of Alphanumeric Characters.

Restriction: You can specify only one collating sequence, either by SORTSEQ= or by one of the individual options that are available in the PROC SORT statement.
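
A minimal sketch of SORTSEQ= follows, assuming that the Swedish translation table is available at your site. The data set WORK.TOWNS and the variable TOWN are hypothetical. The sample values contain only unaccented letters, so the effect of the national sequence would be visible only for data that includes national characters.

```sas
/* Hypothetical character data */
data work.towns;
   input town $;
   datalines;
Lund
Ystad
Boden
;
run;

/* SORTSEQ= names the collating sequence; no other collating */
/* sequence option can be specified in the same step.        */
proc sort data=work.towns out=work.towns_sorted sortseq=swedish;
   by town;
run;
```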

[Figure: National Collating Sequences of Alphanumeric Characters]

SORTSIZE=memory-specification

specifies the maximum amount of memory that is available to PROC SORT. memory-specification is one of the following:

MAX: specifies that all available memory can be used.
n: specifies the amount of memory in bytes, where n is a real number.
nK: specifies the amount of memory in kilobytes, where n is a real number.
nM: specifies the amount of memory in megabytes, where n is a real number.
nG: specifies the amount of memory in gigabytes, where n is a real number.

Specifying the SORTSIZE= option in the PROC SORT statement temporarily overrides the SAS system option SORTSIZE=. For information about the system option, see the section on SAS system options in SAS Language Reference: Dictionary.

Operating Environment Information: Some system sort utilities may treat this option differently. Refer to the SAS documentation for your operating environment.

Default: the value of the SAS system option SORTSIZE=
Tip: This option can help improve sort performance by restricting the virtual memory paging that the operating environment controls. If PROC SORT needs more memory, it uses a temporary utility file. As a general rule, the value of SORTSIZE should not exceed the amount of physical memory that will be available to the sorting process.
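
The following sketch sets a memory limit for a single step. The data set WORK.BIG is hypothetical, and the value 64M is an arbitrary figure chosen for illustration, not a recommendation.

```sas
/* Hypothetical data set with keys in descending order */
data work.big;
   do key = 1000 to 1 by -1;
      output;
   end;
run;

/* SORTSIZE= here overrides the SORTSIZE= system option for this step only */
proc sort data=work.big out=work.big_sorted sortsize=64M;
   by key;
run;
```

Specifying SORTSIZE=MAX instead would allow the step to use all available memory.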

SWEDISH

See FINNISH.

TAGSORT

stores only the BY variables and the observation numbers in temporary files. The BY variables and the observation numbers are called tags. At the completion of the sorting process, PROC SORT uses the tags to retrieve records from the input data set in sorted order.

Tip:	When the total length of BY variables is small compared with the record length, TAGSORT reduces temporary disk usage considerably. However, processing time may be much higher.
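
A sketch of the case that the tip describes, a short BY variable in a long record, might look like this. The data set WORK.WIDE and its variables are hypothetical.

```sas
/* Hypothetical data set: a short key (ID) and a long character variable */
data work.wide;
   length notes $ 200;
   do id = 5 to 1 by -1;
      notes = 'long text payload';
      output;
   end;
run;

/* Only the tags (ID values and observation numbers) go to temporary files */
proc sort data=work.wide out=work.wide_sorted tagsort;
   by id;
run;
```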
