Chapter Contents

Previous

Next
SAS/ACCESS Software for Relational Databases: Reference

Using SAS/ACCESS LIBNAME and Data Set Options to Improve Performance

Many of the new SAS/ACCESS LIBNAME options improve performance when you access data on your DBMS. This section provides details on some of these options. See SAS/ACCESS LIBNAME Statement and SAS/ACCESS Data Set Options for additional information.

Note:   Some of the options discussed in this section might not be available for your DBMS. See your DBMS chapter for more information.  [cautionend]


Passing WHERE Clauses to the DBMS

The DBKEY= data set option should be used in contexts in which you want to join a large DBMS table and a relatively small SAS data set. If these conditions are not satisfied, the options will not improve performance and, in some cases, they might impact performance negatively.

Here is an example of how performance is improved by using this option:

data keyvalues;   
   deptno=30; output; 
   deptno=10; output; 
run;  

libname dblib oracle user=testuser password=testpass
  path='myorapath';  

proc sql; 
   select bigtab.deptno, bigtab.loc       
   from dblib.dept bigtab, 
        keyvalues smallds     
   where bigtab.deptno = smallds.deptno;
quit;

In this example, the SQL procedure retrieves all the rows in the large ORACLE table DEPT and applies the WHERE clause during SQL procedure processing in SAS. Processing can be both CPU-intensive and I/O-intensive, if MYTABLE is large. Use the DBKEY= option with the previous example and compare performance:

proc sql; 
   select bigtab.deptno, bigtab.loc       
   from  dblib.dept(dbkey=deptno) bigtab, 
         keyvalues smallds     
   where bigtab.deptno = smallds.deptno;
quit;

In this example, the DBKEY= option instructs the SQL procedure to pass the WHERE clause to the SAS/ACCESS engine in a form similar to where deptno=host-variable. The engine then passes this optimized query to the DBMS server. The host-variable is substituted, one at a time, with DEPTNO values from the observations in the SAS data file KEYVALUES. As a result, only rows that match the WHERE clause are retrieved from the DBMS. Without this option, PROC SQL retrieves all the rows from the DEPT table.

The SQL statement that is created by the SAS/ACCESS engine and passed to the DBMS is similar to the following;

select deptno, loc    
   from bigtab.deptno 
   where deptno=:hostvariable;
The host-variable takes the value of the DEPTNO variable from the SAS data file KEYVALUES. The number of SELECT statements issued is equal to the number of rows in the data file. Therefore, for improved performance, the SAS data file should contain relatively fewer rows than the DBMS table to which it is being joined.

The DBKEY= option can also be used in a SAS DATA step, with the KEY= option in the SET statement, to improve the performance of joins. Specify a value of KEY=DBKEY in this situation. The following DATA step creates a new data file by joining the data file KEYVALUES with the DBMS table MYTABLE. The variable DEPTNO is used with the DBKEY= option to cause a WHERE clause to be issued by the SAS/ACCESS engine.

data sasuser.new;    
   set sasuser.keyvalues;    
   set dblib.mytable(dbkey=deptno) key=dbkey;
run;

You can also use the DBINDEX= option instead of the DBKEY= option if you know that the DBMS table has one or more indexes that use the column(s) on which the join is being performed. Use DBINDEX=index-name if you know the name of the index, or use DBINDEX=YES if you do not know the name.


Sorting DBMS Data

Sorting DBMS data can be resource-intensive, whether it is done using the SORT procedure, a BY statement, or an ORDER BY clause in an SQL procedure's SELECT statement. You should sort data only when sorted data are needed for your program. The following list includes guidelines and information about sorting data:


Passing Joins to the DBMS

In versions prior to Version 7, an SQL query involving one or more DBMS tables (or view descriptors, in Version 6) was processed by SAS as if the DBMS tables were individual SAS data files. The SQL procedure fetched all the rows from each DBMS table and performed the join within the SAS. This algorithm performed poorly, especially if the tables were large and SAS and the DBMS communicated over a network.

A more efficient method is to let the DBMS perform the join and return only the results of the join to the client, in this case, SAS. In Version 7 and later, this is how joins are processed which provides a major performance enhancement for programs that perform joins across tables in a single DBMS.

For example, assume two large DBMS tables named TABLE1 and TABLE2 have a column named DEPTNO, and you want to retrieve the rows from an inner join of these tables where the DEPTNO value in TABLE1 is equal to the DEPTNO value in TABLE2.

proc sql;
   select tab1.deptno, dname from
      dblib.table1 tab1
      dblib.table2 tab2
      where tabl.deptno = tab2.deptno
         using libname dblib oracle user=scott
            password=tiger path='myserver';
quit;

In Version 8, this join between two tables within the same library (where the libref references a database) is detected by the SQL procedure and passed by the SAS/ACCESS engine directly to the DBMS. The DBMS processes the inner join between the two tables, and only the resulting rows are passed back to the SAS System. Both inner and outer joins between two or more DBMS tables are supported in this enhancement.


Chapter Contents

Previous

Next

Top of Page

Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.