Chapter Contents
Chapter Contents
Previous
Previous
Next
Next
The DATASOURCE Procedure

Example 10.8: Annual COMPUSTAT Data Files

This example shows how to extract a subset of cross sections when the required cross sections are listed in an external file. In the case of a COMPUSTAT file, the required cross sections are a list of companies. For example, you may want to extract annual data for a list of companies whose industry classification codes (DNUM), CUSIP issuer codes (CNUM), and CUSIP issue number and check digits (CIC) are given in an external file, COMPLIST, as follows:

   2640   346377   104
   3714   017634   106
   5812   171583   107
   6025   446150   104
   8051   087851   101

When the required companies are listed in an external file, you can either use the SAS macro processor to construct your WHERE statement expression or restructure your data file and include it after the WHERE key word.

The following steps use the first approach to construct the WHERE statement expression in the macro variable WHEXPR:

   filename compfile 'host-specific-file-name' <host-options>;
   %macro whstmt( fileref );
      %global whexpr;
      data _null_;
         infile &fileref end=last;
         length cnum $ 6;
         input  dnum cnum cic;
         call symput( 'dnum'||left(_n_), left(dnum) );
         call symput( 'cnum'||left(_n_), cnum );
         call symput( 'cic' ||left(_n_), left(cic) );
         if last then call symput( 'n', left(_n_) );
      run;
      %do i = 1 %to &n;
         %let whexpr = &whexpr 
          (DNUM=&&dnum&i and CNUM="&&cnum&i" and CIC=&&cic&i);
         %if &i ^= &n %then %let whexpr = &whexpr or;
         %end;
      %mend whstmt;
   %whstmt( compfile );
   filename compustat 'host-specific-Compustat-file-name' <host-options>;
   proc datasource filetype=csaibm infile=compstat
                   outby=company  out=dataset;
      where &whexpr;
   run;

The same result can also be obtained by creating an external file, WHEXPR, from the COMPFILE and including it after the WHERE key word, as shown in the following statements:

   filename whexpr 'host-specific-WHEXPR-file-name' <host-options>;
   data _null_;
      infile compfile end=last; file   whexpr;
      length cnum $ 6;
      input  dnum cnum cic;
      put "( "  dnum=  "and CNUM='"  cnum $6.  "' and "  cic= ")"  @;
      if not last then put ' or'; else put ';' ;
   run;
   
   filename compstat 'host-specific-Compustat-file-name' <host-options>;
   proc datasource filetype=csaibm infile=compustat
                   outby=company  out=dataset;
      where %inc 'host-specific-WHEXPR-file-name';
   run;
   
   title1 'Information on Selected Companies';
   proc print data=company;
   run;

The Output 10.8.1 shows the OUTBY= data set created by the preceding statements. As you can see, the companies listed in the COMPLIST file are reported in this data set.

Output 10.8.1: Printout of the OUTBY= Data Set Listing Selected Companies

Information on Selected Companies

Obs DNUM CNUM CIC FILE ZLIST SMBL XREL STK STATE COUNTY FINC EIN byselect st_date end_date ntime nobs nseries nselect REC INAME CONAME DUP DNUM2 CNUM2 CIC2 REC2 FILE2
1 2640 346377 104 3 4 FOR 0 0 34 31 0 34-1046753 1 1968 1987 20 20 423 366 1 CONVRT,PAPRBRD PD,EX CONTAIN FORMICA CORP 0 2640 346377 104 2 3
2 3714 017634 106 1 4 ALN 0 0 36 103 0 38-0290950 1 1968 1987 20 20 423 366 1 MOTOR VEHICLE PART,ACCESSORY ALLEN GROUP 0 3714 017634 106 2 1
3 5812 171583 107 11 1 CHU 5812 0 48 29 0 74-1507270 1 1968 1987 20 20 423 366 1 EATING PLACES CHURCH'S FRIED CHICKEN INC 0 5812 171583 107 2 11
4 6025 446150 104 3 6 HBAN 0 0 39 49 0 31-0724920 1 1968 1987 20 20 423 366 1 NATL BANKS-FED RESERVE SYS HUNTINGTON BANCSHARES 0 6025 446150 104 2 3
5 8051 087851 101 11 1 BEV 8050 0 6 37 0 95-4100309 1 1968 1987 20 20 423 366 1 SKILLED NURSING CARE FAC BEVERLY ENTERPRISES 0 8051 087851 101 2 11


Note that annual COMPUSTAT data are available in either IBM 360/370 General format or the Universal Character format. The first example expects an IBM 360/370 General format file since the FILETYPE= is set to CSAIBM, while the second example uses a Universal Character format file (FILETYPE=CSAUC).

Chapter Contents
Chapter Contents
Previous
Previous
Next
Next
Top
Top

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.