SAS/ACCESS Software for Relational Databases: Reference

Performance and Efficient View Descriptors

When you create, update, and use view descriptors, follow these guidelines to minimize the use of SAS System resources and to reduce the time it takes the DBMS to access data.

General Information

When you create or update view descriptors, select only the columns that your program needs. Selecting unnecessary columns adds extra processing time due to data conversions.

When you use a view descriptor in a DATA step or SAS procedure, columns that you specified in the VAR and KEEP statements are passed to the DBMS for processing. Columns that you specified in DROP statements are not passed to the DBMS. Therefore, only a subset of the columns is returned to the SAS System, and performance is usually enhanced. (This approach can be applied in some of DBMSs, such as CA-OpenIngres, ORACLE, Oracle Rdb, and SYBASE.)

Sorting DBMS Data

Sorting DBMS data can be resource-intensive, whether it is done using the SORT procedure, a BY statement, or an ORDER BY clause in a view descriptor or SQL procedure. You should sort data only when sorted data are needed for your program. The following list includes guidelines and information about sorting data:

If you specify a BY statement in a DATA or PROC step that references a view descriptor, be sure that the BY variable is associated with an indexed DBMS column.
If you reference a view descriptor in a SAS program and the program includes a BY statement for a variable that corresponds to a column in the DBMS table, the SAS/ACCESS interface view engine automatically generates an ORDER BY clause for that variable. Thus, the ORDER BY clause causes the DBMS to sort the data before the SAS procedure or DATA step uses the data in a SAS program. If the DBMS table is very large, this sorting can adversely affect your performance. Use a BY variable that is based on an indexed DBMS column to help reduce this negative impact.
The outermost BY or ORDER BY clause overrides any embedded BY or ORDER BY clauses, including those specified in the selection criteria.
For example, if the view descriptor has an ORDER BY clause and you have specified a BY statement in your SAS program, the BY statement overrides the view descriptor's ORDER BY clause. If you use an ORDER BY clause in an SQL procedure statement that references a view descriptor, this ORDER BY clause also overrides the view descriptor's ORDER BY clause.

Extracting Data Using a View

In some cases, it might be more efficient to use a view descriptor to extract (that is, copy) DBMS data and place it in a SAS data file instead of using the view descriptor to read the data directly.

A DBMS table is read every time a view descriptor is referred to in a SAS program and the program is executed; the program's output reflects the latest updated level of the DBMS table. If many users are reading the same DBMS table repeatedly, DBMS performance may decrease. If you create several reports during the same SAS session, they may not be based on the same DBMS data due to updating by other users. Therefore, in the following circumstances, it is better to extract data:

Extract DBMS data if the table is large and you use the data repeatedly in SAS programs.
If a view descriptor describes a large DBMS table and you plan to use the same DBMS data in several procedures or DATA steps during the same SAS session, you might improve performance by extracting the data. Placing the data into a SAS data file requires a certain amount of disk space to store the data and I/O to write the data. However, SAS data files are organized to provide optimal performance with PROC and DATA steps. Programs that use SAS data files are often more efficient than programs that read DBMS data directly.
(Exception: If you are using a SAS WHERE statement to create small subsets of data and you are reading the subsets based on indexed columns, it may be better not to extract the data from a large DBMS table. In this case, the DBMS can probably retrieve subsets of data based on indexed columns faster.)
Extract DBMS data if you use sorted data several times in a SAS program.
If you intend to use DBMS data in a particular sorted order several times, run the SORT procedure on the view descriptor by using the OUT= option to extract the data. This OUT= option is required whenever PROC SORT references a view descriptor. PROC SORT sends an ORDER BY clause to the DBMS so that the data is returned in sorted order to the SAS data file. PROC SORT does not perform any additional sorting of the data. Extracting the data in this way is more efficient than requesting the same sort repeatedly (with an ORDER BY clause) on the DBMS data.
Extract DBMS data for added security.
If you are the owner of a DBMS table and do not want anyone else to read the data, you might want to extract the data (or a subset of the data) and not distribute information about either the access descriptor or view descriptor. Or, you might want to assign DBMS security features to your DBMS tables to prevent unauthorized reading or writing to them.
On the SAS System side, you might also want to assign SAS System passwords to your descriptors for additional security. If a view descriptor has a password assigned to it and you extract the data, the new SAS data file is automatically assigned the same password. If a view descriptor does not have a password, you can assign a password to the extracted SAS data file. You can assign a password by using PROC DATASETS on your descriptor. See the SAS Procedures Guide for more information.

Using a Subset of the DBMS Data

When you create or update view descriptors, you should specify selection criteria (where possible) to subset the number of rows that the DBMS returns to the SAS System.

As a general rule, a view descriptor's WHERE clause is passed to the DBMS for processing. How WHERE Clauses Are Processed with View Descriptors and the explanation afterwards describe this process in more detail.

How WHERE Clauses Are Processed with View Descriptors
View descriptor with a WHERE clause System that processes the WHERE clause

used in a DATA step, PROC step, or PROC SQL query

without additional WHERE clause DBMS

with additional WHERE clause DBMS: compound WHERE clause built with AND; otherwise, SAS System (table note 1)

used with a PROC SQL Pass-Through query (table note 2)

without additional WHERE clause DBMS

with additional WHERE clause SAS System. Each DBMS table in the join is processed by the DBMS and then returned to the SAS System for final processing. PROC SQL cannot optimize a join of this kind.

***How WHERE Clauses Are Processed with View Descriptors***
View descriptor with a WHERE clause	System that processes the WHERE clause
used in a DATA step, PROC step, or PROC SQL query
without additional WHERE clause	DBMS
with additional WHERE clause	DBMS: compound WHERE clause built with AND; otherwise, SAS System (table note 1)
used with a PROC SQL Pass-Through query (table note 2)
without additional WHERE clause	DBMS
with additional WHERE clause	SAS System. Each DBMS table in the join is processed by the DBMS and then returned to the SAS System for final processing. PROC SQL cannot optimize a join of this kind.

TABLE NOTE 1:

The interface view engine builds a compound WHERE clause using AND operator(s) if the clauses are valid for the DBMS. Otherwise, only the valid part of the WHERE clause is sent to the DBMS for processing and the SAS System processes the remaining part(s). See "Using Multiple WHERE Clauses" below for more information.

TABLE NOTE 2:

When a view descriptor and a Pass-Through query are used within the same PROC SQL query, they are usually joined in the FROM clause.

Note: See SQL Procedure's Interaction with SAS/ACCESS Software for more information on Pass-Through queries. [cautionend]

Using Multiple WHERE Clauses

When a view descriptor that includes a DBMS-specific SQL WHERE clause is used in a DATA step or procedure that includes a SAS WHERE statement, the SAS/ACCESS interface view engine tries to pass both WHERE conditions to the DBMS for processing. If all or part of the WHERE clause is valid for the DBMS, SAS/ACCESS software uses one or more logical AND operators to build a compound WHERE clause.

However, if a WHERE clause contains SAS enhancements or features that are not supported by the DBMS, the WHERE conditions are split up, and only the valid condition (that is, valid DBMS-specific syntax) is sent to the DBMS. The SAS System processes the remaining part(s) of the WHERE clause.

For example, a DBMS-specific SQL cannot parse a colon modifier on a comparison operator, and therefore, the first half of the following WHERE clause cannot be passed to the DBMS.

The second half of the WHERE clause is passed to the DBMS, and the DBMS returns to the SAS System all the rows in the INVOICE table for which PAIDON is on or after 01JAN94. The SAS System must then process the first half of the WHERE clause to subset the rows that were returned from the DBMS for the countries Argentina and Australia:

where country =: 'A' and paidon >= '01JAN94'd;

It is more efficient to use a LIKE operator that is valid in both the DBMS-specific SQL and the SAS System:

where country like 'A%' and paidon >= '01JAN94'd;

In this case, the DBMS can process both halves of the WHERE clause.

Note that the SAS/ACCESS interface view engine can translate certain SAS conventions, such as datetime formats or the IS MISSING operator, to their DBMS-specific SQL equivalents and pass these clauses to the DBMS.

In most cases it is more efficient for the DBMS to process the WHERE conditions. Therefore, write WHERE conditions using features (such as SQL operators) that are valid in both the SAS System and the DBMS so that the DBMS can process the entire WHERE clause.

It is also more efficient to use a SAS WHERE statement instead of a subsetting IF statement. As just described, a WHERE clause is passed to the DBMS for processing and returns a subset of rows to the SAS System for further processing. In contrast, when you use a subsetting IF statement, every row is returned to the SAS System to be evaluated by the IF statement. Therefore, using a SAS WHERE statement often improves performance.

Writing Efficient WHERE Clauses for View Descriptors

You should write WHERE clauses that enable the DBMS to use its indexes, where possible. This is a good practice whether you specify the WHERE clause in the view descriptor's selection criteria or you use a SAS WHERE statement in a DATA step that references a view descriptor. This practice is especially important when you are accessing large DBMS tables.

You cannot tell the DBMS to use an index, but you can write WHERE clauses that enable it to use its DBMS indexes effectively. Here are some guidelines for writing WHERE clauses that enable the DBMS to use indexes effectively.

Note: The guidelines for each specific DBMS may vary. [cautionend]

Avoid the NOT operator when you can use an equivalent form:

Inefficient: where zipcode not>8000

Efficient: where zipcode<=8000
Avoid the >= and <= operators when you can use the BETWEEN predicate.

Inefficient: where ZIPCODE>=70000 and ZIPCODE<=80000

Efficient: where ZIPCODE between 70000 and 80000
Avoid using LIKE predicates that begin with % or _ .

Inefficient: where COUNTRY like '%INA'

Efficient: where COUNTRY like 'A%INA'
Avoid arithmetic expressions in a predicate.

Inefficient: where SALARY>12*4000.00

Efficient: where SALARY>48000.00

For a list of operators that are generally accepted by the SQLs of most DBMSs, see the SAS Procedures Guide.

Chapter Contents
Previous
Next
Top of Page

	Inefficient: `where zipcode not>8000`
	Efficient: `where zipcode<=8000`

	Inefficient: `where ZIPCODE>=70000 and ZIPCODE<=80000`
	Efficient: `where ZIPCODE between 70000 and 80000`

	Inefficient: `where COUNTRY like '%INA'`
	Efficient: `where COUNTRY like 'A%INA'`

	Inefficient: `where SALARY>12*4000.00`
	Efficient: `where SALARY>48000.00`