Chapter Contents

Previous

Next
SAS/ACCESS Software for PC File Formats: Reference

Selecting and Combining Data

For many of your SAS programs, you may need to combine data from more than one view descriptor or to manipulate data that are accessed by a specific view descriptor. The following sections describe how you can select and combine data


Using the WHERE Statement

Suppose you have a view descriptor VLIB.ALLINV that lists invoices for all customers; VLIB.ALLINV is based on the INVOICE.DBF file. You can use a SET statement to create a SAS data file that contains information on customers who have not paid their bills and whose bills amount to at least $300,000.

proc access dbms=dbf;
   create adlib.invoice.access;      
   /* create access descriptor   */
   path="c:\sasdemo\invoice.dbf";
   assign=yes;
   rename invoicenum = invnum
          amtbilled  = amtbilld
          amountinus = amtinus;
   format paidon date9.
          billedon date9.
          invoicenum  5.0
          billedby    6.0
          amtbilled   15.2
          amountinus  15.2;
   list all;

   create vlib.allinv.view;           
   /* create vlib.allinv view  */
   select all;
run;

data notpaid(keep=invnum billedto amtinus billedon);  
    /* example  */
   set vlib.allinv;
   where paidon is missing and amtinus>=300000;
run;

In the DATA step's WHERE statement, be sure to use SAS variable names, not PC files column names. WORK.NOTPAID Data File Created Using a SAS WHERE Statement shows the result of the new temporary SAS data file WORK.NOTPAID.

proc print data=notpaid;
   format amtinus dollar20.2;
title 'High Bills--Not Paid';
run;

WORK.NOTPAID Data File Created Using a SAS WHERE Statement
                              High Bills--Not Paid

         OBS    INVNUM    BILLEDTO                 AMTINUS    BILLEDON

          1     11271     18543489          $11,063,836.00    05OCT1998
          2     12102     18543489          $11,063,836.00    17NOV1998
          3     11286     43459747          $11,063,836.00    10OCT1998
          4     12051     39045213           $2,256,870.00    02NOV1998
          5     12471     39045213           $2,256,870.00    27DEC1998
          6     12476     38763919           $2,256,870.00    24DEC1998

The first line of the DATA step uses the KEEP= data set option. This option works with view descriptors just as it works with other SAS data sets; it specifies that you want to include only the listed variables in the new SAS data file WORK.NOTPAID. However, you can still use the other view descriptor variables in other statements within the DATA step.

The SAS WHERE statement includes two conditions to be met. First, it selects only observations that have a missing value for the PAIDON variable. Second, the SAS WHERE statement requires that the amount in each bill be higher than a certain figure. You need to be familiar with the PC files data so that you can determine reasonable values for these expressions. For information on the SAS WHERE statement, refer to SAS Language: Reference.


Using the SQL Procedure

The SQL procedure implements the Structured Query Language in the SAS System. The SQL procedure follows the SQL convention of using the terms column and row for variable and observation This section provides examples of using the SQL procedure with PC files data.

Joining Data from Various Sources

The SQL procedure provides another way to select and combine data. For example, suppose you have three data sets: two view descriptors, VLIB.CUSPHON and VLIB.CUSORDR, which are based on the CUSTOMERS.WK3 and ORDERS.XLS files, respectively, and a SAS data file, DLIB.OUTOFSTK, which contains product names and numbers that are out of stock. You can use the SQL procedure to create a view that joins the data from these three sources and displays their output. The SAS WHERE or subsetting IF statements would not be appropriate in this case because you want to compare variables from several sources, rather than simply merging or concatenating the data.

The following SAS statements select and combine data from the view descriptors and the SAS data file to create a PROC SQL view, SLIB.BADORDR. SLIB.BADORDR retrieves customer and product information that the sales department uses to notify customers of unavailable products.

proc access dbms=wk3;
   create adlib.customr.access;      
   /* create access descriptor  */
   path="c:\sasdemo\customers.wk3";
   worksheet=v;
   range='cus_phone';
   getnames=yes;
   skiprows=2;
   scantype=5;
   mixed=yes;
   list all;

   create vlib.cusphon.view;           
   /* create vlib.cusphon view */
   select customer phone name;
   rename customer = custnum;
run;

proc access dbms=xls;
   create adlib.orders.access;         
   /* create access descriptor  */
   path="c:\sasdemo\orders.xls";
   worksheet='sheet1';
   range='a1..j39';
   getnames=yes;
   skiprows=2;
   scantype=5;
   mixed=yes;
   list all;

   create vlib.cusordr.view;           
   /* create vlib.cusordr view  */
   select ordernum stocknum shipto;
   rename ordernum ordnum;
   format ordernum 5.0
          stocknum 4.0;
run;

proc sql;                                 
   /* example              */
   create view slib.badordr as
      select distinct cusphon.custnum, 
             cusphon.name, cusphon.phone,
             cusordr.stocknum, 
              outofstk.fibernam 
             as product
         from vlib.cusphon, vlib.cusordr, 
              dlib.outofstk
         where cusordr.stocknum=
           outofstk.fibernum 
              and cusphon.custnum=
                cusordr.shipto;

The CREATE VIEW statement incorporates a WHERE clause as part of its SELECT clause. The DISTINCT keyword eliminates any duplicate rows of customer numbers that occur when companies order an unavailable product more than once.

It is recommended that you not include an ORDER BY clause in a CREATE VIEW statement. Doing so causes the output data to be sorted every time the PROC SQL view is submitted, which may have a negative impact on performance. It is more efficient to add an ORDER BY clause to a SELECT statement that displays your output data, as shown below.

options linesize=120;
title 'Data Described by SLIB.BADORDR';

select * from slib.badordr
   order by custnum, product;
quit;

This SELECT statement uses the PROC SQL view SLIB.BADORDR to display joined WK3 and XLS data and SAS data in ascending order by the CUSTNUM column and then by the PRODUCT (that is, FIBERNAM) column. The data are ordered by PRODUCT because one customer may have ordered more than one product. To select all the columns from the view, use an asterisk (*) in place of column names. When an asterisk is used, the columns are displayed in the order specified in the SLIB.BADORDR view. Data Described by the PROC SQL View SLIB.BADORDR shows the data described by the SLIB.BADORDR view.

Data Described by the PROC SQL View SLIB.BADORDR
                                            Data Described by SLIB.BADORDER

        CUSTOMER  NAME                                                          PHONE         STOCKNUM  PRODUCT
        --------------------------------------------------------------------------------------------------------
        15432147  GREAT LAKES LABORATORY EQUIPMENT MANUFACTURERS                616/582-3906      4789  dacron
        18543489  LONE STAR STATE RESEARCH SUPPLIERS                            512/478-0788      8934  gold
        29834248  BRITISH MEDICAL RESEARCH AND SURGICAL SUPPLY                  (0552)715311      3478  olefin
        31548901  NATIONAL COUNCIL FOR MATERIALS RESEARCH                       406/422-3413      8934  gold
        43459747  RESEARCH OUTFITTERS                                           03/734-5111       8934  gold

Although the query uses SAS variable names like CUSTNUM, you may notice that the output uses PC files column names like CUSTOMER. By default, PROC SQL displays SAS variable labels, which default to PC files column names. (You can use the NOLABEL option to change this default.)

Creating New Columns with the GROUP BY Clause

Instead of creating a new PROC SQL view, you may want to summarize your data and create new columns in a report. Although you cannot use the ACCESS procedure to create new columns, you can easily do this by using the SQL procedure with data that are described by a view descriptor.

This example uses the SQL procedure to retrieve and manipulate data from the view descriptor VLIB.ALLEMP, which is based on the EMPLOYEE.DBF file. When this query (as a SELECT statement is often called) is submitted, it calculates and displays the average salary for each department. The query enables you to manipulate your data and display the results without creating a SAS data set.

Because this example reports on employees' salaries, the view descriptor VLIB.ALLEMP is assigned a SAS System password (MONEY) using the DATASETS procedure. Because of the READ= level of protection, the password must be specified in the PROC SQL SELECT statement before you can see the DIF file data accessed by WORK.ALLEMP.

In the following example, the DISTINCT keyword in the SELECT statement removes duplicate rows. The AVG function in the SQL procedure is equivalent to the SAS MEAN function.

options linesize=80;

proc access dbms=dbf;
   /* create access descriptor */
   create adlib.employ.access;
   path="c:\sasdemo\employee.dbf";
   assign=yes;
   format empid     6.0
          salary    dollar12.2
          jobcode   5.0
          birthdate date9.
          hiredate  date9.;
   list all;
run;

/* create work.allemp view  */
proc access dbms=dbf
            accdesc=adlib.employ;
   create work.allemp.view;
   select all;
run;

/* assign a password */
proc datasets library=work  memtype=view;
   modify allemp (read=money);
run;

/* example */
   title 'Average Salary Per ACC Department';

proc sql;
   select distinct dept,
          avg(salary) label='Average Salary' 
           format=dollar12.2
      from work.allemp(pw=money)
      where dept like 'ACC%'
      group by dept;
quit;

The columns are displayed in the order specified in the SELECT clause of the query. Data Retrieved by an SQL Procedure Query shows the result of the query.

Data Retrieved by an SQL Procedure Query
                Average Salary Per ACC Department

                                   Average
                      DEPT          Salary
                      --------------------
                      ACC013    $54,591.33
                      ACC024    $55,370.55
                      ACC043    $75,000.34

To delete a password on an access descriptor or any SAS data set, put a slash after the password:

/* delete the password */
proc datasets library=work  memtype=view;
   modify allemp (read=money/);
run;

For more information about SAS System passwords, see SAS System Passwords for SAS/ACCESS Descriptors.


Chapter Contents

Previous

Next

Top of Page

Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.