SAS/ACCESS Software for Relational Databases: Reference

Using SAS/ACCESS LIBNAME and Data Set Options to Improve Performance

Many of the new SAS/ACCESS LIBNAME options improve performance when you access data on your DBMS. This section provides details on some of these options. See SAS/ACCESS LIBNAME Statement and SAS/ACCESS Data Set Options for additional information.

Note: Some of the options discussed in this section might not be available for your DBMS. See your DBMS chapter for more information. [cautionend]

Passing WHERE Clauses to the DBMS

The DBKEY= data set option should be used in contexts in which you want to join a large DBMS table and a relatively small SAS data set. If these conditions are not satisfied, the options will not improve performance and, in some cases, they might impact performance negatively.

Here is an example of how performance is improved by using this option:

data keyvalues;   
   deptno=30; output; 
   deptno=10; output; 
run;  

libname dblib oracle user=testuser password=testpass
  path='myorapath';  

proc sql; 
   select bigtab.deptno, bigtab.loc       
   from dblib.dept bigtab, 
        keyvalues smallds     
   where bigtab.deptno = smallds.deptno;
quit;

In this example, the SQL procedure retrieves all the rows in the large ORACLE table DEPT and applies the WHERE clause during SQL procedure processing in SAS. Processing can be both CPU-intensive and I/O-intensive, if MYTABLE is large. Use the DBKEY= option with the previous example and compare performance:

proc sql; 
   select bigtab.deptno, bigtab.loc       
   from  dblib.dept(dbkey=deptno) bigtab, 
         keyvalues smallds     
   where bigtab.deptno = smallds.deptno;
quit;

In this example, the DBKEY= option instructs the SQL procedure to pass the WHERE clause to the SAS/ACCESS engine in a form similar to where deptno=host-variable. The engine then passes this optimized query to the DBMS server. The host-variable is substituted, one at a time, with DEPTNO values from the observations in the SAS data file KEYVALUES. As a result, only rows that match the WHERE clause are retrieved from the DBMS. Without this option, PROC SQL retrieves all the rows from the DEPT table.

The SQL statement that is created by the SAS/ACCESS engine and passed to the DBMS is similar to the following;

select deptno, loc    
   from bigtab.deptno 
   where deptno=:hostvariable;

The host-variable takes the value of the DEPTNO variable from the SAS data file KEYVALUES. The number of SELECT statements issued is equal to the number of rows in the data file. Therefore, for improved performance, the SAS data file should contain relatively fewer rows than the DBMS table to which it is being joined.

The DBKEY= option can also be used in a SAS DATA step, with the KEY= option in the SET statement, to improve the performance of joins. Specify a value of KEY=DBKEY in this situation. The following DATA step creates a new data file by joining the data file KEYVALUES with the DBMS table MYTABLE. The variable DEPTNO is used with the DBKEY= option to cause a WHERE clause to be issued by the SAS/ACCESS engine.

data sasuser.new;    
   set sasuser.keyvalues;    
   set dblib.mytable(dbkey=deptno) key=dbkey;
run;

You can also use the DBINDEX= option instead of the DBKEY= option if you know that the DBMS table has one or more indexes that use the column(s) on which the join is being performed. Use DBINDEX=index-name if you know the name of the index, or use DBINDEX=YES if you do not know the name.

Sorting DBMS Data

Sorting DBMS data can be resource-intensive, whether it is done using the SORT procedure, a BY statement, or an ORDER BY clause in an SQL procedure's SELECT statement. You should sort data only when sorted data are needed for your program. The following list includes guidelines and information about sorting data:

Say you have a data set where the libref references a relational DBMS table, and you are using that data set in a DATA step or procedure. If you specify a BY statement in a DATA or PROC step, be sure that the BY variable is associated with an indexed DBMS column. You can specify the LIBNAME option, DBINDEX=YES to try to use indexes defined on DBMS columns.
If you reference this same data set in a SAS program and the program includes a BY statement for a variable that corresponds to a column in the DBMS table, the SAS/ACCESS LIBNAME engine automatically generates an ORDER BY clause for that variable. Thus, the ORDER BY clause causes the DBMS to sort the data before the SAS procedure or DATA step uses the data in a SAS program. If the DBMS table is very large, this sorting can adversely affect your performance. Use a BY variable that is based on an indexed DBMS column to help reduce this negative impact.
The outermost BY or ORDER BY clause overrides any embedded BY or ORDER BY clauses, including those specified in a WHERE clause. In the following example, the EXEC_EMPLOYEES data set includes a BY statement that sorts the data by the variable SENIORITY. However, when that data set is used in the following PROC SQL query, the data is ordered by the SALARY column and not by SENIORITY.
```
libname mydblib oracle user=sasha password=bee path="hrdb_023"
schema=hrdept;
data exec_employees;
   set mydblib.staff (keep lastname firstname idnum);
   by seniority;
   where salary >= 150000;
run;

proc sql;
select * from exec_employees
  order by salary;
```

Passing Joins to the DBMS

In versions prior to Version 7, an SQL query involving one or more DBMS tables (or view descriptors, in Version 6) was processed by SAS as if the DBMS tables were individual SAS data files. The SQL procedure fetched all the rows from each DBMS table and performed the join within the SAS. This algorithm performed poorly, especially if the tables were large and SAS and the DBMS communicated over a network.

A more efficient method is to let the DBMS perform the join and return only the results of the join to the client, in this case, SAS. In Version 7 and later, this is how joins are processed which provides a major performance enhancement for programs that perform joins across tables in a single DBMS.

For example, assume two large DBMS tables named TABLE1 and TABLE2 have a column named DEPTNO, and you want to retrieve the rows from an inner join of these tables where the DEPTNO value in TABLE1 is equal to the DEPTNO value in TABLE2.

proc sql;
   select tab1.deptno, dname from
      dblib.table1 tab1
      dblib.table2 tab2
      where tabl.deptno = tab2.deptno
         using libname dblib oracle user=scott
            password=tiger path='myserver';
quit;

In Version 8, this join between two tables within the same library (where the libref references a database) is detected by the SQL procedure and passed by the SAS/ACCESS engine directly to the DBMS. The DBMS processes the inner join between the two tables, and only the resulting rows are passed back to the SAS System. Both inner and outer joins between two or more DBMS tables are supported in this enhancement.

Chapter Contents
Previous
Next
Top of Page