The Default Sequence of Execution in the DATA Step |
Structure of a DATA Step | Action Taken |
---|---|
DATA statement | begins the step; counts iterations |
Data-reading statements: * | |
INPUT | describes the arrangement of values in the input data record from a raw data source |
SET | reads an observation from one or more SAS data sets |
MERGE | joins observations from two or more SAS data sets into a single observation |
MODIFY | replaces, deletes, or appends observations in an existing SAS data set in place |
UPDATE | updates a master file by applying transactions |
Optional SAS programming statements, for example: | further processes the data for the current observation |
FirstQuarter=Jan+Feb+Mar; | computes the value for FirstQuarter for the current observation |
if RetailPrice < 500; | subsets by value of variable RetailPrice for the current observation |
Default actions at the end of processing an observation | |
At end of DATA step: automatic write, automatic return | writes an observation to a SAS data set; returns to the DATA statement |
At top of DATA step: automatic reset | resets values to missing in the program data vector |
* The table shows the default processing of the DATA step. You can alter the sequence of statements in the DATA step. You can code optional programming statements, such as creating or reinitializing a constant, before you code a data-reading statement.
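
The default sequence in the table can be sketched as a single DATA step. This is a minimal illustration, not an example from the original text: the data set name work.sales_q1, the variable Product, and the data lines are assumptions made up for the sketch. Only FirstQuarter, Jan, Feb, Mar, and RetailPrice come from the table.

```sas
/* Hypothetical data set and values, for illustration only */
data work.sales_q1;                       /* DATA statement: begins the step      */
   input Product $ RetailPrice Jan Feb Mar;   /* data-reading statement           */
   FirstQuarter = Jan + Feb + Mar;        /* computed for the current observation */
   if RetailPrice < 500;                  /* subsetting IF: keep only these rows  */
   /* No OUTPUT or RETURN is coded: at the bottom of the step SAS writes the
      observation automatically and returns to the DATA statement. */
   datalines;
Widget 120 10 12 9
Gadget 750 4 6 5
Gizmo 480 7 8 11
;
run;
```

With these assumed data lines, the Gadget record fails the subsetting IF, so only the Widget and Gizmo observations are written.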
Note: You can also use functions to read and process
data. For information about how statements and functions process data differently,
see Using Functions to Manipulate Files.
For specific information about SAS functions, see the SAS Functions listed
under the "SAS I/O Files" and "External Files" categories in the SAS Functions
section of SAS Language Reference: Dictionary.
Changing the Default Sequence of Execution |
When ... | you can ... |
---|---|
Reading a record | merge, modify, join data sets; read multiple records to create a single observation; randomly select records for processing; read from multiple external files; read selected fields from a record by using statement or data set options |
Processing data | use conditional logic; retain variable values |
Writing an observation | write to a SAS data set or to an external file; control when output is written to a data set; write to multiple files |
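
As a hedged sketch of two of these options, the following step retains a value across iterations and controls which data set each observation is written to. All data set names, variable names, and values here (work.monthly, work.small, work.large, Region, Sales, RunningTotal) are hypothetical.

```sas
/* Hypothetical input data: region and a sales amount */
data work.monthly;
   input Region $ Sales;
   datalines;
East 100
East 250
West 400
West 50
;
run;

/* Write to two data sets and decide explicitly where each observation goes */
data work.small work.large;
   set work.monthly;
   retain RunningTotal 0;                  /* value is kept across iterations   */
   RunningTotal = RunningTotal + Sales;
   if Sales < 200 then output work.small;  /* an explicit OUTPUT statement      */
   else output work.large;                 /* turns off the automatic write     */
run;
```

Because the step contains OUTPUT statements, SAS does not perform the automatic write at the bottom of the step; each observation is written only where an OUTPUT statement executes.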
Note: You can also use functions to read and process data. For information about how statements and functions process data differently, see Using Functions to Manipulate Files in SAS Language Reference: Concepts. For specific information about SAS functions, see the SAS Functions listed under the "SAS I/O Files" and "External Files" categories in the SAS Functions section of SAS Language Reference: Dictionary.
SAS Language Element | Function | |
---|---|---|
subsetting IF statement | stops the current iteration when a condition is false, does not write the current observation to the data set, and returns control to the top of the DATA step. | |
IF-THEN/ELSE statement | conditionally executes a SAS statement: the THEN clause runs when the condition is true, and the optional ELSE statement runs when the condition is false. | |
DO loops | cause parts of the DATA step to be executed multiple times. | |
LINK and RETURN statements | alter the flow of control, execute statements following the label specified, and return control of the program to the next statement following the LINK statement. | |
HEADER= option in the FILE statement | alters the flow of control whenever a PUT statement causes a new page of output to begin; statements following the label specified in the HEADER= option are executed until a RETURN statement is encountered, at which time control returns to the point from which the HEADER= option was activated. | |
GO TO statement | alters the flow of execution by branching to the label that is specified in the GO TO statement. SAS executes subsequent statements then returns control to the beginning of the DATA step. | |
EOF= option in an INFILE statement | alters the flow of execution when the end of the input file is reached; statements following the label that is specified in the EOF= option are executed at that time. | |
_N_ automatic variable in an IF-THEN construct | causes parts of the DATA step to execute only for particular iterations. | |
SELECT statement | conditionally executes one of a group of SAS statements. | |
OUTPUT statement in an IF-THEN construct | outputs an observation before the end of the DATA step, based on a condition; prevents automatic output at the bottom of the DATA step. | |
DELETE statement in an IF-THEN construct | deletes an observation based on a condition and causes a return to the top of the DATA step. | |
ABORT statement in an IF-THEN construct | stops execution of the DATA step and instructs SAS to resume execution with the next DATA or PROC step. It can also stop executing a SAS program altogether, depending on the options specified in the ABORT statement and on the method of operation. | |
WHERE statement or WHERE= data set option | causes SAS to read certain observations based on one or more specified criteria. |
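
A few of the elements in this table can be combined in one short step. The sketch below is illustrative only: the data set work.graded, the variables Name, Score, and Grade, and the data lines are assumptions, not part of the original text.

```sas
/* Hypothetical input: one record per student, with a possibly missing score */
data work.graded;
   input Name $ Score;
   if _N_ = 1 then put 'Processing started';  /* executes on the first iteration only */
   if Score = . then delete;     /* drops the observation and returns to the top      */
   select;                       /* executes exactly one of the branches below        */
      when (Score >= 90) Grade = 'A';
      when (Score >= 80) Grade = 'B';
      otherwise          Grade = 'C';
   end;
   datalines;
Ann 95
Bob .
Cal 82
Dee 67
;
run;
```

With these assumed data lines, the record for Bob is deleted before the SELECT group runs, so the output data set contains three observations.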
Step Boundary -- How To Know When Statements Take Effect |
    data _null_;                            /* [1] */
       set allscores(drop=score5-score7);
       title 'Student Test Scores';         /* [2] */
    data employees;                         /* [3] */
       set employee_list;
    run;
[1] The DATA statement begins a DATA step and is a step boundary.

[2] The TITLE statement is in effect for both DATA steps because it appears before the boundary of the first DATA step. (The TITLE statement is a global statement.)

[3] The DATA statement is the default boundary for the first DATA step.

The TITLE statement in this example is in effect for the first DATA step as well as for the second because the TITLE statement appears before the boundary of the first DATA step. This example uses the default step boundary (data employees;).
The following example shows an OPTIONS statement inserted after a RUN statement.
    data scores;                            /* [1] */
       set allscores(drop=score5-score7);
    run;                                    /* [2] */
    options firstobs=5 obs=55;              /* [3] */
    data test;
       set alltests;
    run;
The OPTIONS statement specifies that the first observation that is read from the input data set should be the 5th, and the last observation that is read should be the 55th. Inserting a RUN statement immediately before the OPTIONS statement causes the first DATA step to reach its boundary (run;) before SAS encounters the OPTIONS statement. In this case, the step boundary is explicit. The OPTIONS statement settings, therefore, are put into effect for the second DATA step only.
[1] The DATA statement is a step boundary.

[2] The RUN statement is the explicit boundary for the first DATA step.

[3] The OPTIONS statement affects the second DATA step only.

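
To observe the effect of an explicit boundary on a small scale, the same pattern can be run against a tiny data set. This is a sketch under stated assumptions: the data set names (work.alltests, work.first_copy, work.second_copy), the variable TestID, and the values firstobs=2 obs=3 are hypothetical and chosen only so the result is easy to check.

```sas
/* Hypothetical five-observation input data set */
data work.alltests;
   do TestID = 1 to 5;
      output;
   end;
run;

data work.first_copy;
   set work.alltests;          /* reads all five observations            */
run;                           /* explicit boundary: the step runs here  */

options firstobs=2 obs=3;      /* takes effect for later steps only      */

data work.second_copy;
   set work.alltests;          /* reads observations 2 through 3         */
run;

options firstobs=1 obs=max;    /* restore the default settings           */
```

Under these assumptions, work.first_copy contains five observations and work.second_copy contains two. The final OPTIONS statement is included so that the restricted settings do not carry over into later steps.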
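Following the statements in a DATA step with a RUN statement is the simplest way to make the step begin to execute, but a RUN statement is not always necessary. SAS recognizes several step boundaries for a SAS step; as the examples above show, both a RUN statement and the DATA statement that begins the next step act as boundaries.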
Note: For
SAS programs executed in interactive mode, a RUN statement is required to
signal the step boundary for the last step you submit.
When you submit a DATA step during interactive processing, it does not begin running until SAS encounters a step boundary. This fact enables you to submit statements as you write them while preventing a step from executing until you have entered all the statements.
What Causes a DATA Step to Stop Executing |
A DATA step that reads ... | from ... | with these statements | stops ... |
---|---|---|---|
no data | | | after only one iteration |
any data | | | when it executes STOP or ABORT; when the data is exhausted |
raw data | instream data lines | INPUT statement | after the last data line is read |
raw data | one external file | INPUT and INFILE statements | when end-of-file is reached |
raw data | multiple external files | INPUT and INFILE statements | when end-of-file is first reached on any of the files |
observations sequentially | one SAS data set | SET and MODIFY statements | after the last observation is read |
observations sequentially | multiple SAS data sets | one SET, MERGE, MODIFY, or UPDATE statement | when all input data sets are exhausted |
observations sequentially | multiple SAS data sets | multiple SET, MERGE, MODIFY, or UPDATE statements | when end-of-file is reached by any of the data-reading statements |
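
The last row of the table can be illustrated with two SET statements that read data sets of different lengths. The data set names (work.a, work.b, work.both) and the variables X and Y are hypothetical and exist only for this sketch.

```sas
/* Hypothetical inputs: work.a has 3 observations, work.b has 5 */
data work.a;
   do X = 1 to 3;
      output;
   end;
run;

data work.b;
   do Y = 1 to 5;
      output;
   end;
run;

/* Two separate data-reading statements in one step */
data work.both;
   set work.a;    /* reaches end-of-file on the fourth iteration */
   set work.b;
run;
```

On the fourth iteration the first SET statement finds no more observations in work.a, so the step stops. With these inputs work.both contains three observations, even though work.b still has unread observations.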
A DATA step that reads observations from a SAS data set with a SET statement that uses the POINT= option has no way to detect the end of the input SAS data set. (This method is called direct or random access.) Such a DATA step usually requires a STOP statement.
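
A minimal sketch of direct access with an explicit STOP statement follows. The data set names (work.scores, work.every_third) and the variables ID, Score, ObsNum, and TotalObs are assumptions introduced for illustration.

```sas
/* Hypothetical ten-observation input data set */
data work.scores;
   do ID = 1 to 10;
      Score = ID * 10;
      output;
   end;
run;

/* Read every third observation directly by observation number */
data work.every_third;
   do ObsNum = 3 to TotalObs by 3;
      set work.scores point=ObsNum nobs=TotalObs;  /* NOBS= is assigned at compile time */
      output;
   end;
   stop;    /* required: with POINT= the step never detects end-of-file */
run;
```

Without the STOP statement this step would iterate indefinitely, because the SET statement with POINT= never signals that the input is exhausted. The variables named in POINT= and NOBS= are temporary and are not written to the output data set.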
A DATA step also stops when it executes a STOP or an ABORT statement. Some system options and data set options, such as OBS=, can cause a DATA step to stop earlier than it would otherwise.
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.