SAS Companion for the OS/2 Environment

Advanced Performance Tuning Methods

This section presents some advanced performance topics, such as improving the performance of the SORT procedure and calculating data set size. Use these methods only if you are an experienced SAS user and are familiar with the way that SAS is configured on your machine.


Improving Performance of the SORT Procedure

Consider using two options in the PROC SORT statement: SORTSIZE= and TAGSORT. These options control how much memory and temporary disk space the SORT procedure uses during a sort, and they are discussed in the next two sections. For more information about the SORT procedure, see SORT.

SORTSIZE= Option

The SORTSIZE= option specifies the amount of memory that is available for PROC SORT to use. Setting it appropriately can reduce the amount of swapping that the SAS System must do to sort the data set. A value of 2M is a reasonable choice for most memory configurations. If your machine has more than 12 MB of physical memory and you are sorting large data sets, setting this option to a value between 2M and 8M may improve performance. If PROC SORT needs more memory than you specify, it creates a temporary utility file in your WORK folder to complete the sort.

If you do not use the SORTSIZE= option, PROC SORT uses the value of the SORTSIZE system option. The default value of the SORTSIZE system option is 2M.
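
As a minimal sketch of both forms, the following statements set the memory limit for a single sort and then for the rest of the session. The data set WORK.SALES, its variables, and the 4M value are hypothetical and are used only for illustration; choose a value that suits the memory on your machine.

```sas
/* Hypothetical data set, created only so that the example is self-contained */
data work.sales;
 input region $ amount;
 datalines;
west 120
east 75
north 210
;
run;

/* Allow this one sort to use up to 4 MB of memory */
proc sort data=work.sales sortsize=4M;
 by region;
run;

/* Alternatively, change the SORTSIZE system option, which PROC SORT
   uses when SORTSIZE= is not specified in the PROC SORT statement */
options sortsize=4M;
```

The statement-level option affects only that PROC SORT step, whereas the OPTIONS statement changes the value that later sorts in the session fall back on.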

TAGSORT Option

The TAGSORT option is useful in situations where there may not be enough disk space to sort a large SAS data set. When you specify the TAGSORT option, only sort keys (that is, the variables that are specified in the BY statement) and the observation number for each observation are stored in the temporary files. The sort keys, together with the observation number, are referred to as tags. At the completion of the sorting process, the tags are used to retrieve the records from the input data set in sorted order. Thus, in cases where the total number of bytes of the sort keys is small compared with the length of the record, temporary disk use is reduced considerably. However, you should have enough disk space to hold another copy of the data (the output data set) or two copies of the tags, whichever is greater. Note that although using the TAGSORT option can reduce temporary disk use, the processing time may be much higher.
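
The following sketch shows where the TAGSORT option goes in the PROC SORT statement. The data set WORK.SURVEY and its variables are hypothetical; the long character variable stands in for a record that is much wider than its sort key, which is the case where the text above says TAGSORT helps most.

```sas
/* Hypothetical data set: a short numeric key and a wide character variable */
data work.survey;
 length comments $ 200;
 input id;
 comments = 'free-text response';
 datalines;
3
1
2
;
run;

/* Only the BY variable (id) and the observation number are written
   to the temporary files while the sort is in progress */
proc sort data=work.survey tagsort;
 by id;
run;
```

Because the full records are retrieved from the input data set only after the tags are sorted, expect this form to trade longer processing time for lower temporary disk use.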

Determining Where the Sort Occurs

How you reference the data set name, and whether you use the OUT= option in the PROC SORT statement, determines where the physical sort occurs. You may want to know where the sort occurs if you think that there may not be enough disk space available for the sort. You always need free disk space that equals three to four times the SAS data set size. For example, if your SAS data set requires 1 MB of disk space, you need 3 to 4 MB of disk space to complete the sort.
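
For reference, the two ways of invoking PROC SORT that this section distinguishes are sketched here. The data set names WORK.FRUIT and WORK.FRUIT_SORTED are hypothetical, and both reside in the WORK library only to keep the example self-contained; in practice the input and output may be in different libraries.

```sas
/* Hypothetical data set used only for illustration */
data work.fruit;
 input variety $ score;
 datalines;
navel 9
valencia 7
blood 8
;
run;

/* With OUT=: the input data set is left unchanged and the sorted
   observations are written to a separate data set */
proc sort data=work.fruit out=work.fruit_sorted;
 by variety;
run;

/* Without OUT=: the sorted data set replaces the input data set */
proc sort data=work.fruit;
 by variety;
run;
```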

When you sort a SAS data set, the data set is sorted in a temporary utility file that has a file extension of .sas7butl or .su7 (the shorter file extension is used in libraries that support only three characters in a file extension). If there is not enough memory to hold the utility file during the sort, the utility file is created in the WORK data library. If the sort completes successfully, the utility file is renamed with an extension of .sas7bdat and the original data set is deleted. This ensures data integrity. Be sure that you have space for the utility file. Use the following rules to determine where the utility file and the resulting sorted data sets are created:


Calculating Data Set Size

To estimate the amount of disk space that is needed for a SAS data set:

  1. create a dummy SAS data set that contains one observation and the variables that you need

  2. run the CONTENTS procedure using the dummy data set

  3. determine the data set size by performing simple math using information from the CONTENTS procedure output.

For example, for a data set with one character variable and four numeric variables, you would submit the following statements:

data oranges;
 input variety $ flavor texture looks;
 total=flavor+texture+looks;
 datalines;
navel 9 8 6
;
proc contents data=oranges;
 title 'Example for Calculating Data Set Size';
run;

These statements generate the output shown in Example for Calculating Data Set Size with PROC CONTENTS.

Example for Calculating Data Set Size with PROC CONTENTS
                    Example for Calculating Data Set Size                             1
                                                           
                          The CONTENTS Procedure
                  
Data Set Name: WORK.ORANGES                          Observations:         1
Member Type:   DATA                                  Variables:            5
Engine:        V8                                    Indexes:              0
Created:       x:xx Tuesday February 2, xxxx         Observation Length:   40
Last Modified: x:xx Tuesday February 2, xxxx         Deleted Observations: 0
Protection:                                          Compressed:           NO
Data Set Type:                                       Sorted:               NO
Label:

                -----Engine/Host Dependent Information-----

      Data Set Page Size:         4096
      Number of Data Set Pages:   1
      First Data Page:            1
      Max Obs per Page:           101
      Obs in First Data Page:     1
      Number of Data Set Repairs: 0
      File Name:                  E:\SASV8\SASWORK\_TD301\oranges.sas7bdat
      Release Created:            8.00.00B
      Host Created:               OS2


            -----Alphabetic List of Variables and Attributes-----

                     #    Variable    Type    Len    Pos
                     -----------------------------------
                     2    flavor      Num       8      0
                     4    looks       Num       8     16 
                     3    texture     Num       8      8
                     5    total       Num       8     24
                     1    variety     Char      8     32

The size of the resulting data set depends on the data set page size and the number of observations. You can use your PROC CONTENTS output and the following formula to estimate the data set size:
number of data pages = 1 + (floor( Observations / Max Obs per Page))
size = 256 + ( Data Set Page Size * number of data pages)
(floor represents a function that rounds the value down to the nearest integer.)

Taking the information that is shown in Example for Calculating Data Set Size with PROC CONTENTS, you can calculate the size of the example data set:
number of data pages = 1 + (floor(1/101)) = 1 + 0 = 1
size = 256 + (4096 * 1) = 4352

Thus, the example data set uses 4,352 bytes of storage space.
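
The same arithmetic can be sketched in a DATA step so that you can try other observation counts. The page size and the maximum observations per page are copied from the PROC CONTENTS output shown earlier; the variable names and the second observation count (10,000) are hypothetical and are included only to show how the estimate grows.

```sas
data _null_;
 /* Values taken from the PROC CONTENTS output for WORK.ORANGES */
 page_size = 4096;
 max_obs_per_page = 101;

 /* 1 is the example data set; 10000 is a hypothetical larger data set */
 do observations = 1, 10000;
  pages = 1 + floor(observations / max_obs_per_page);
  size = 256 + (page_size * pages);
  put observations= pages= size=;
 end;
run;
```

For one observation, the log shows pages=1 and size=4352, which matches the hand calculation. For 10,000 observations, the formula gives 100 pages and 409,856 bytes.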


Increasing the Efficiency of Interactive Processing

If you are running a SAS job by using the SAS System interactively, and the job generates numerous log messages or extensive output, consider using the AUTOSCROLL command to suppress the scrolling of windows. This makes your job run faster because the SAS System does not have to use resources to update the display of the Log and Output windows during the job. For example, issuing the command autoscroll 0 in the Log window causes the Log window not to scroll until your job is finished. (For the Output window, AUTOSCROLL is set to 0 by default.)
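
If you prefer to issue the command from your program instead of typing it in the Log window, one possible approach is the DM statement, which submits windowing commands. This is a sketch under that assumption, not something this section prescribes; the DATA step that follows is a hypothetical job that writes many log messages.

```sas
/* Issue AUTOSCROLL 0 in the Log window before the job starts */
dm log 'autoscroll 0';

/* Hypothetical job that writes many lines to the log */
data _null_;
 do i = 1 to 1000;
  put 'Log message number ' i;
 end;
run;
```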



Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.