Example 10.9: CRSP Daily NYSE/AMEX Combined Stocks
This example reads all the data on a three-volume daily NYSE/AMEX combined character
data set. Assume that the following filerefs
are assigned to the calendar/indices file and security files comprising
this database:
Fileref
|
VOLSER
|
File Type
|
calfile | DXAA1 | calendar/indices file on volume 1 |
secfile1 | DXAA1 | security file on volume 1 |
secfile2 | DXAA2 | security file on volume 2 |
secfile3 | DXAA3 | security file on volume 3 |
The data set CALDATA is created by the following statements
to contain the calendar/indices file:
proc datasource filetype=crspdci infile=calfile out=caldata;
run;
Here the FILETYPE=CRSPDCI indicates that you are reading a character
format (indicated by a C in the 6th position) daily
(indicated by a D in the 5th position) calendar/indices file
(indicated by an I in the 7th position).
The annual data in security files can be obtained by the following statements:
proc datasource filetype=crspdca
infile=( secfile1 secfile2 secfile3 )
out=annual;
run;
Similarly, the data sets to contain the daily security data (the OUT=
data set) and the event data (the OUTEVENT= data set) are
obtained by the following statements:
proc datasource filetype=crspdcs
infile=( calfile secfile1 secfile2 secfile3 )
out=periodic index outevent=events;
run;
Note that the FILETYPE= has an S at the 7th position,
since you are reading the security files. Also, the INFILE=
option first expects the fileref of the calendar/indices file
since the dating variable (CALDT) is contained in that file. Following
the fileref of calendar/indices file, you give the list of security
files in the order you want to read them.
The Output 10.9.1 is generated by the following statements:
title1 'First 5 Observations in the Calendar/Indices File';
proc print data=caldata( obs=5 );
run;
title1 'Last 5 Observations in the Calendar/Indices File';
proc print data=caldata( firstobs=6659 ) noobs;
run;
title1 "Periodic Series for CUSIP='09523220'";
title2 "DATE >= '22dec88'd";
proc print data=periodic;
where cusip='09523220' and date >= '22dec88'd;
run;
title1 "Events for CUSIP='09523220'";
proc print data=events;
where cusip='09523220';
run;
Output 10.9.1: Partial Listing of the Output Data Sets
First 5 Observations in the Calendar/Indices File |
Obs |
date |
VWRETD |
VWRETX |
EWRETD |
EWRETX |
TOTVAL |
TOTCNT |
USDVAL |
USDCNT |
SPINDX |
SPRTRN |
1 |
02JUL1962 |
-99.0000 |
-99.0000 |
-99.0000 |
-99.0000 |
319043897 |
2036 |
0 |
0 |
55.86 |
-99.0000 |
2 |
03JUL1962 |
0.0113 |
0.0112 |
0.0131 |
0.0130 |
322929231 |
2040 |
319043897 |
2036 |
56.49 |
0.0113 |
3 |
05JUL1962 |
0.0060 |
0.0059 |
0.0069 |
0.0068 |
324750979 |
2031 |
322838977 |
2031 |
56.81 |
0.0057 |
4 |
06JUL1962 |
-0.0107 |
-0.0107 |
-0.0064 |
-0.0064 |
321302641 |
2031 |
324699079 |
2022 |
56.17 |
-0.0113 |
5 |
09JUL1962 |
0.0067 |
0.0067 |
0.0018 |
0.0018 |
323221296 |
2029 |
320935790 |
2019 |
56.55 |
0.0068 |
|
Last 5 Observations in the Calendar/Indices File |
date |
VWRETD |
VWRETX |
EWRETD |
EWRETX |
TOTVAL |
TOTCNT |
USDVAL |
USDCNT |
SPINDX |
SPRTRN |
23DEC1988 |
0.0042154 |
0.0028936 |
0.005104 |
0.003588 |
2367541510 |
2563 |
2360655540 |
2561 |
277.87 |
0.0036118 |
27DEC1988 |
-.0029128 |
-.0029624 |
-0.001453 |
-0.001585 |
2360680550 |
2565 |
2367496320 |
2562 |
276.83 |
-.0037429 |
28DEC1988 |
0.0015624 |
0.0015249 |
0.001575 |
0.001484 |
2364369540 |
2568 |
2360668370 |
2564 |
277.08 |
0.0009031 |
29DEC1988 |
0.0067816 |
0.0066433 |
0.005578 |
0.005469 |
2379932980 |
2565 |
2364169480 |
2563 |
279.40 |
0.0083724 |
30DEC1988 |
-.0027338 |
-.0029144 |
0.010736 |
0.010572 |
2362374030 |
2567 |
2379932980 |
2565 |
277.72 |
-.0060126 |
|
Periodic Series for CUSIP='09523220' |
DATE >= '22dec88'd |
Obs |
CUSIP |
PERMNO |
COMPNO |
ISSUNO |
HEXCD |
HSICCD |
DATE |
BIDLO |
ASKHI |
PRC |
VOL |
RET |
SXRET |
BXRET |
3 |
09523220 |
75285 |
0 |
0 |
1 |
7361 |
22DEC1988 |
15.00 |
15.375 |
15.375 |
54300 |
0.016529 |
. |
. |
4 |
09523220 |
75285 |
0 |
0 |
1 |
7361 |
23DEC1988 |
15.50 |
15.750 |
15.625 |
17700 |
0.016260 |
. |
. |
5 |
09523220 |
75285 |
0 |
0 |
1 |
7361 |
27DEC1988 |
15.50 |
15.750 |
15.625 |
10600 |
0.000000 |
. |
. |
6 |
09523220 |
75285 |
0 |
0 |
1 |
7361 |
28DEC1988 |
15.50 |
15.500 |
15.500 |
10600 |
-0.008000 |
. |
. |
7 |
09523220 |
75285 |
0 |
0 |
1 |
7361 |
29DEC1988 |
15.25 |
15.500 |
15.375 |
7000 |
-0.008065 |
. |
. |
8 |
09523220 |
75285 |
0 |
0 |
1 |
7361 |
30DEC1988 |
15.00 |
15.250 |
15.000 |
13700 |
-0.024390 |
. |
. |
|
Events for CUSIP='09523220' |
Obs |
CUSIP |
PERMNO |
COMPNO |
ISSUNO |
HEXCD |
HSICCD |
EVENT |
DATE |
NCUSIP |
TICKER |
COMNAM |
SHRCLS |
SHRCD |
EXCHCD |
SICCD |
DISTCD |
DIVAMT |
FACPR |
FACSHR |
DCLRDT |
RCRDDT |
PAYDT |
SHROUT |
SHRFLG |
DLSTCD |
NWPERM |
NEXTDT |
DLBID |
DLASK |
DLPRC |
DLVOL |
DLRET |
TRTSCD |
NMSIND |
MMCNT |
NSDINX |
1 |
09523220 |
75285 |
0 |
0 |
1 |
7361 |
NAMES |
03MAY1988 |
09523220 |
BAW |
BLUE ARROW PLC |
|
3 |
1 |
7361 |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
2 |
09523220 |
75285 |
0 |
0 |
1 |
7361 |
DIST |
18JUL1988 |
|
|
|
|
. |
. |
. |
1212 |
0.13376 |
0 |
0 |
13JUL88 |
22JUL88 |
26AUG88 |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
3 |
09523220 |
75285 |
0 |
0 |
1 |
7361 |
SHARES |
03MAY1988 |
|
|
|
|
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
72757 |
0 |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
4 |
09523220 |
75285 |
0 |
0 |
1 |
7361 |
SHARES |
30SEP1988 |
|
|
|
|
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
706842 |
0 |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
5 |
09523220 |
75285 |
0 |
0 |
1 |
7361 |
SHARES |
30DEC1988 |
|
|
|
|
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
706842 |
0 |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
6 |
09523220 |
75285 |
0 |
0 |
1 |
7361 |
DELIST |
30DEC1988 |
|
|
|
|
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
. |
100 |
0 |
. |
. |
. |
0 |
. |
A |
. |
. |
. |
. |
|
This example illustrates the following points:
- When data span more than one physical volume, the filerefs of the
security files residing on
each volume must be given following the fileref of the
calendar/indices file. The
DATASOURCE procedure reads each of these files in the order
they are specified. Therefore,
you can request that all three volumes be mounted to the same
tape drive, if you choose to do so.
- The INDEX option in the second PROC DATASOURCE run creates an index
file for the
OUT=PERIODIC data set. This index file provides random access to
the OUT= data set and
may increase the efficiency of the subsequent PROC and DATA
steps that use BY and WHERE statements.
The index variables are CUSIP, CRSP permanent number (PERMNO),
NASDAQ company number (COMPNO), NASDAQ
issue number (ISSUNO), header exchange code (HEXCD) and header
SIC code (HSICCD). Each one of these
variables forms a different key, that is, a single index. If you
want to form keys from a combination
of variables (composite indexes) or use some other variables as
indexes, you should use the INDEX=
data set option for the OUT= data set.
- The OUTEVENT=EVENTS data set is sparse. In fact, for each EVENT type,
a unique set of event variables are defined. For example, for
EVENT='SHARES', only the variables
SHROUT and SHRFLG are defined, and they have missing values for
all other EVENT types. Pictorially, this structure is similar
to the data set shown in Figure 10.8. Because of this sparse
representation, you should create the OUTEVENT= data set only when you need
a subset of securities and events.
By default, the OUT= data set contains only the periodic data.
However, you may
also want to include the event-oriented data in the OUT= data set. This is
accomplished by listing the event variables together with periodic variables
in a KEEP statement. For example, if you want to extract the historical
CUSIP (NCUSIP), number of shares outstanding (SHROUT), and
dividend cash amount (DIVAMT) together with all the periodic series, use the following
statements:
proc datasource filetype=crspdcs
infile=( calfile secfile1 secfile2 secfile3 )
out=both outevent=events;
where cusip='09523220';
keep bidlo askhi prc vol ret sxret bxret ncusip shrout divamt;
run;
proc datasource filetype=crspdcs
infile=( calfile secfile1 )
out=both outevent=events;
where cusip='09523220';
keep bidlo askhi prc vol ret sxret bxret ncusip shrout divamt;
run;
proc datasource filetype=crspdcs
infile=( calfile secfile1 )
out=both2 outevent=events2;
where cusip='09523220';
keep bidlo askhi prc vol ret sxret bxret ncusip shrout divamt;
keepevent ncusip shrflg;
run;
title1 "Printout of the First 4 Observations";
title2 "CUSIP = '09523220'";
proc print data=both noobs;
var cusip date vol ncusip divamt shrout;
where cusip='09523220' and date <= '08may88'd;
run;
title1 "Printout of the Observations centered Around 18jul88";
title2 "CUSIP = '09523220'";
proc print data=both noobs;
var cusip date vol ncusip divamt shrout;
where cusip='09523220' and
date between '14jul88'd and '20jul88'd;
run;
title1 "Printout of the Observations centered Around 30sep88";
title2 "CUSIP = '09523220'";
proc print data=both noobs;
var cusip date vol ncusip divamt shrout;
where cusip='09523220' and
date between '28sep88'd and '04oct88'd;
run;
Output 10.9.2: Including Event Variables in the OUT= Data Set
Printout of the First 4 Observations |
CUSIP = '09523220' |
CUSIP |
DATE |
VOL |
NCUSIP |
DIVAMT |
SHROUT |
09523220 |
03MAY1988 |
296100 |
09523220 |
. |
72757 |
09523220 |
04MAY1988 |
139200 |
09523220 |
. |
72757 |
09523220 |
05MAY1988 |
9000 |
09523220 |
. |
72757 |
09523220 |
06MAY1988 |
7900 |
09523220 |
. |
72757 |
|
Printout of the Observations centered Around 18jul88 |
CUSIP = '09523220' |
CUSIP |
DATE |
VOL |
NCUSIP |
DIVAMT |
SHROUT |
09523220 |
14JUL1988 |
62000 |
09523220 |
. |
72757 |
09523220 |
15JUL1988 |
106800 |
09523220 |
. |
72757 |
09523220 |
18JUL1988 |
32100 |
09523220 |
0.13376 |
72757 |
09523220 |
19JUL1988 |
8600 |
09523220 |
. |
72757 |
09523220 |
20JUL1988 |
10700 |
09523220 |
. |
72757 |
|
Printout of the Observations centered Around 30sep88 |
CUSIP = '09523220' |
CUSIP |
DATE |
VOL |
NCUSIP |
DIVAMT |
SHROUT |
09523220 |
28SEP1988 |
33000 |
09523220 |
. |
72757 |
09523220 |
29SEP1988 |
55200 |
09523220 |
. |
72757 |
09523220 |
30SEP1988 |
40700 |
09523220 |
. |
706842 |
09523220 |
03OCT1988 |
13400 |
09523220 |
. |
706842 |
09523220 |
04OCT1988 |
110600 |
09523220 |
. |
706842 |
|
Events referring to distributions and delistings have entries only in observations
whose dates match the event dates. For example, DIVAMT has a value
for only 18JUL88,
as shown in the second printout in Output 10.9.2. The NAME and SHARES events
refer to a date of change, therefore their values are expanded such that there is a
value for each observation. For example, the date of NAMES record is 03MAY88, therefore
NCUSIP has the same value from that date on. The SHROUT on the other hand changes
its value twice, once on 03MAY88, the other time on 30SEP88. The third listing shows how the value of SHROUT remains constant at 72757 from
03MAY88 to 30SEP88, at which date it changes to 706842.
The events occurring on days other than the trading dates are not output to the
OUT= data set.
The KEEP statement in the preceding example has no effect on the event
variables output to the OUTEVENT= data set. If you want to extract only
a subset of event variables, you need to use the KEEPEVENT statement.
For example, the following code outputs only NCUSIP and SHROUT to the OUTEVENT=
data set for CUSIP='09523220':
proc datasource filetype=crspdxc
infile=( calfile secfile1 secfile2 secfile3 )
outevent=subevts;
where cusip='09523220';
keepevent ncusip shrout;
run;
proc datasource filetype=crspdxc
infile=( calfile secfile1)
outevent=subevts;
where cusip='09523220';
keepevent ncusip shrout;
run;
title1 "NCUSIP and SHROUT for CUSIP='09523220'";
proc print data=subevts noobs;
run;
Output 10.9.3: Listing of the OUTEVENT= Data Set with a KEEPEVENT Statement
NCUSIP and SHROUT for CUSIP='09523220' |
CUSIP |
PERMNO |
COMPNO |
ISSUNO |
HEXCD |
HSICCD |
EVENT |
DATE |
NCUSIP |
SHROUT |
09523220 |
75285 |
0 |
0 |
1 |
7361 |
NAMES |
03MAY1988 |
09523220 |
. |
09523220 |
75285 |
0 |
0 |
1 |
7361 |
SHARES |
03MAY1988 |
|
72757 |
09523220 |
75285 |
0 |
0 |
1 |
7361 |
SHARES |
30SEP1988 |
|
706842 |
09523220 |
75285 |
0 |
0 |
1 |
7361 |
SHARES |
30DEC1988 |
|
706842 |
|
The OUTEVENT= data set in Output 10.9.3 is missing observations for which
the EVENT variable is DIST or DELIST, since these event groups do not contain any
selected events.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.