Chapter Contents
Chapter Contents
Previous
Previous
Next
Next
The DATASOURCE Procedure

Example 10.9: CRSP Daily NYSE/AMEX Combined Stocks

This example reads all the data on a three-volume daily NYSE/AMEX combined character data set. Assume that the following filerefs are assigned to the calendar/indices file and security files comprising this database:

Fileref VOLSER File Type
calfileDXAA1calendar/indices file on volume 1
secfile1DXAA1security file on volume 1
secfile2DXAA2security file on volume 2
secfile3DXAA3security file on volume 3

The data set CALDATA is created by the following statements to contain the calendar/indices file:

   proc datasource filetype=crspdci infile=calfile out=caldata;
   run;

Here the FILETYPE=CRSPDCI indicates that you are reading a character format (indicated by a C in the 6th position) daily (indicated by a D in the 5th position) calendar/indices file (indicated by an I in the 7th position).

The annual data in security files can be obtained by the following statements:

   proc datasource filetype=crspdca
                   infile=( secfile1 secfile2 secfile3 )
                   out=annual;
   run;

Similarly, the data sets to contain the daily security data (the OUT= data set) and the event data (the OUTEVENT= data set) are obtained by the following statements:

   proc datasource filetype=crspdcs
                   infile=( calfile secfile1 secfile2 secfile3 )
                   out=periodic index outevent=events;
   run;

Note that the FILETYPE= has an S at the 7th position, since you are reading the security files. Also, the INFILE= option first expects the fileref of the calendar/indices file since the dating variable (CALDT) is contained in that file. Following the fileref of calendar/indices file, you give the list of security files in the order you want to read them.

The Output 10.9.1 is generated by the following statements:

   title1 'First 5 Observations in the Calendar/Indices File';
   proc print data=caldata( obs=5 );
   run;
   
   title1 'Last 5 Observations in the Calendar/Indices File';
   proc print data=caldata( firstobs=6659 ) noobs;
   run;
   
   title1 "Periodic Series for CUSIP='09523220'";
   title2 "DATE >= '22dec88'd";
   proc print data=periodic;
      where cusip='09523220' and date >= '22dec88'd;
   run;
   
   title1 "Events for CUSIP='09523220'";
   proc print data=events;
      where cusip='09523220';
   run;

Output 10.9.1: Partial Listing of the Output Data Sets

First 5 Observations in the Calendar/Indices File

Obs date VWRETD VWRETX EWRETD EWRETX TOTVAL TOTCNT USDVAL USDCNT SPINDX SPRTRN
1 02JUL1962 -99.0000 -99.0000 -99.0000 -99.0000 319043897 2036 0 0 55.86 -99.0000
2 03JUL1962 0.0113 0.0112 0.0131 0.0130 322929231 2040 319043897 2036 56.49 0.0113
3 05JUL1962 0.0060 0.0059 0.0069 0.0068 324750979 2031 322838977 2031 56.81 0.0057
4 06JUL1962 -0.0107 -0.0107 -0.0064 -0.0064 321302641 2031 324699079 2022 56.17 -0.0113
5 09JUL1962 0.0067 0.0067 0.0018 0.0018 323221296 2029 320935790 2019 56.55 0.0068


Last 5 Observations in the Calendar/Indices File

date VWRETD VWRETX EWRETD EWRETX TOTVAL TOTCNT USDVAL USDCNT SPINDX SPRTRN
23DEC1988 0.0042154 0.0028936 0.005104 0.003588 2367541510 2563 2360655540 2561 277.87 0.0036118
27DEC1988 -.0029128 -.0029624 -0.001453 -0.001585 2360680550 2565 2367496320 2562 276.83 -.0037429
28DEC1988 0.0015624 0.0015249 0.001575 0.001484 2364369540 2568 2360668370 2564 277.08 0.0009031
29DEC1988 0.0067816 0.0066433 0.005578 0.005469 2379932980 2565 2364169480 2563 279.40 0.0083724
30DEC1988 -.0027338 -.0029144 0.010736 0.010572 2362374030 2567 2379932980 2565 277.72 -.0060126


Periodic Series for CUSIP='09523220'
DATE >= '22dec88'd

Obs CUSIP PERMNO COMPNO ISSUNO HEXCD HSICCD DATE BIDLO ASKHI PRC VOL RET SXRET BXRET
3 09523220 75285 0 0 1 7361 22DEC1988 15.00 15.375 15.375 54300 0.016529 . .
4 09523220 75285 0 0 1 7361 23DEC1988 15.50 15.750 15.625 17700 0.016260 . .
5 09523220 75285 0 0 1 7361 27DEC1988 15.50 15.750 15.625 10600 0.000000 . .
6 09523220 75285 0 0 1 7361 28DEC1988 15.50 15.500 15.500 10600 -0.008000 . .
7 09523220 75285 0 0 1 7361 29DEC1988 15.25 15.500 15.375 7000 -0.008065 . .
8 09523220 75285 0 0 1 7361 30DEC1988 15.00 15.250 15.000 13700 -0.024390 . .


Events for CUSIP='09523220'

Obs CUSIP PERMNO COMPNO ISSUNO HEXCD HSICCD EVENT DATE NCUSIP TICKER COMNAM SHRCLS SHRCD EXCHCD SICCD DISTCD DIVAMT FACPR FACSHR DCLRDT RCRDDT PAYDT SHROUT SHRFLG DLSTCD NWPERM NEXTDT DLBID DLASK DLPRC DLVOL DLRET TRTSCD NMSIND MMCNT NSDINX
1 09523220 75285 0 0 1 7361 NAMES 03MAY1988 09523220 BAW BLUE ARROW PLC   3 1 7361 . . . . . . . . . . . . . . . . . . . . .
2 09523220 75285 0 0 1 7361 DIST 18JUL1988         . . . 1212 0.13376 0 0 13JUL88 22JUL88 26AUG88 . . . . . . . . . . . . . .
3 09523220 75285 0 0 1 7361 SHARES 03MAY1988         . . . . . . . . . . 72757 0 . . . . . . . . . . . .
4 09523220 75285 0 0 1 7361 SHARES 30SEP1988         . . . . . . . . . . 706842 0 . . . . . . . . . . . .
5 09523220 75285 0 0 1 7361 SHARES 30DEC1988         . . . . . . . . . . 706842 0 . . . . . . . . . . . .
6 09523220 75285 0 0 1 7361 DELIST 30DEC1988         . . . . . . . . . . . . 100 0 . . . 0 . A . . . .


This example illustrates the following points:

By default, the OUT= data set contains only the periodic data. However, you may also want to include the event-oriented data in the OUT= data set. This is accomplished by listing the event variables together with periodic variables in a KEEP statement. For example, if you want to extract the historical CUSIP (NCUSIP), number of shares outstanding (SHROUT), and dividend cash amount (DIVAMT) together with all the periodic series, use the following statements:

   proc datasource filetype=crspdcs
                   infile=( calfile secfile1 secfile2 secfile3 )
                   out=both outevent=events;
      where cusip='09523220';
      keep  bidlo askhi prc vol ret sxret bxret ncusip shrout divamt;
   run;
   
   proc datasource filetype=crspdcs
                   infile=( calfile secfile1 )
                   out=both outevent=events;
      where cusip='09523220';
      keep  bidlo askhi prc vol ret sxret bxret ncusip shrout divamt;
   run;
   
   proc datasource filetype=crspdcs
                   infile=( calfile secfile1 )
                   out=both2 outevent=events2;
      where cusip='09523220';
      keep  bidlo askhi prc vol ret sxret bxret ncusip shrout divamt;
      keepevent ncusip shrflg;
   run;
   
   title1 "Printout of the First 4 Observations";
   title2 "CUSIP = '09523220'";
   proc print data=both noobs;
      var  cusip date vol ncusip divamt shrout;
      where cusip='09523220' and date <= '08may88'd;
   run;
   
   title1 "Printout of the Observations centered Around 18jul88";
   title2 "CUSIP = '09523220'";
   proc print data=both noobs;
      var  cusip date vol ncusip divamt shrout;
      where cusip='09523220' and
            date between '14jul88'd and '20jul88'd;
   run;
   
   title1 "Printout of the Observations centered Around 30sep88";
   title2 "CUSIP = '09523220'";
   proc print data=both noobs;
      var  cusip date vol ncusip divamt shrout;
      where cusip='09523220' and
            date between '28sep88'd and '04oct88'd;
   run;

Output 10.9.2: Including Event Variables in the OUT= Data Set

Printout of the First 4 Observations
CUSIP = '09523220'

CUSIP DATE VOL NCUSIP DIVAMT SHROUT
09523220 03MAY1988 296100 09523220 . 72757
09523220 04MAY1988 139200 09523220 . 72757
09523220 05MAY1988 9000 09523220 . 72757
09523220 06MAY1988 7900 09523220 . 72757


Printout of the Observations centered Around 18jul88
CUSIP = '09523220'

CUSIP DATE VOL NCUSIP DIVAMT SHROUT
09523220 14JUL1988 62000 09523220 . 72757
09523220 15JUL1988 106800 09523220 . 72757
09523220 18JUL1988 32100 09523220 0.13376 72757
09523220 19JUL1988 8600 09523220 . 72757
09523220 20JUL1988 10700 09523220 . 72757


Printout of the Observations centered Around 30sep88
CUSIP = '09523220'

CUSIP DATE VOL NCUSIP DIVAMT SHROUT
09523220 28SEP1988 33000 09523220 . 72757
09523220 29SEP1988 55200 09523220 . 72757
09523220 30SEP1988 40700 09523220 . 706842
09523220 03OCT1988 13400 09523220 . 706842
09523220 04OCT1988 110600 09523220 . 706842


Events referring to distributions and delistings have entries only in observations whose dates match the event dates. For example, DIVAMT has a value for only 18JUL88, as shown in the second printout in Output 10.9.2. The NAME and SHARES events refer to a date of change, therefore their values are expanded such that there is a value for each observation. For example, the date of NAMES record is 03MAY88, therefore NCUSIP has the same value from that date on. The SHROUT on the other hand changes its value twice, once on 03MAY88, the other time on 30SEP88. The third listing shows how the value of SHROUT remains constant at 72757 from 03MAY88 to 30SEP88, at which date it changes to 706842.

The events occurring on days other than the trading dates are not output to the OUT= data set.

The KEEP statement in the preceding example has no effect on the event variables output to the OUTEVENT= data set. If you want to extract only a subset of event variables, you need to use the KEEPEVENT statement. For example, the following code outputs only NCUSIP and SHROUT to the OUTEVENT= data set for CUSIP='09523220':

   proc datasource filetype=crspdxc
                   infile=( calfile secfile1 secfile2 secfile3 )
                   outevent=subevts;
      where cusip='09523220';
      keepevent  ncusip shrout;
   run;
   
   proc datasource filetype=crspdxc
                   infile=( calfile secfile1)
                   outevent=subevts;
      where cusip='09523220';
      keepevent  ncusip shrout;
   run;
   
   title1 "NCUSIP and SHROUT for CUSIP='09523220'";
   proc print data=subevts noobs;
   run;

Output 10.9.3: Listing of the OUTEVENT= Data Set with a KEEPEVENT Statement

NCUSIP and SHROUT for CUSIP='09523220'

CUSIP PERMNO COMPNO ISSUNO HEXCD HSICCD EVENT DATE NCUSIP SHROUT
09523220 75285 0 0 1 7361 NAMES 03MAY1988 09523220 .
09523220 75285 0 0 1 7361 SHARES 03MAY1988   72757
09523220 75285 0 0 1 7361 SHARES 30SEP1988   706842
09523220 75285 0 0 1 7361 SHARES 30DEC1988   706842


The OUTEVENT= data set in Output 10.9.3 is missing observations for which the EVENT variable is DIST or DELIST, since these event groups do not contain any selected events.

Chapter Contents
Chapter Contents
Previous
Previous
Next
Next
Top
Top

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.