Chapter Contents

Previous

Next

Processing BY-Groups in the DATA Step


Overview

The most common use of BY-group processing in the DATA step is to combine two or more SAS data sets using a BY statement with a SET, MERGE, MODIFY, or UPDATE statement. When processing these statements, SAS reads one observation at a time into the program data vector. With BY-group processing, SAS selects the observations from the data sets according to the values of the BY variable or variables. After processing all the observations from one BY group, SAS expects the next observation to be from the next BY group.

The BY statement modifies the action of the SET, MERGE, MODIFY, or UPDATE statement by controlling when the values in the program data vector are set to missing. During BY-group processing, SAS retains the values of variables until it has copied the last observation it finds for that BY group in any of the data sets. Without the BY statement, the SET statement sets variables to missing when it reads the last observation from any data set, and the MERGE statement does not set variables to missing after the DATA step starts reading observations into the program data vector.

You can process observations conditionally by using the subsetting IF or IF-THEN statements, or the SELECT statement, with the temporary variables FIRST.variable and LAST.variable (set up during BY-group processing). For example, you can use them to perform calculations for each BY group and to write an observation when the first or the last observation of a BY group has been read into the program data vector.

The following example computes annual payroll by department. It uses IF-THEN statements and the values of FIRST.variable and LAST.variable automatic variables to reset the value of PAYROLL to 0 at the beginning of each BY group and to write an observation after the last observation in a BY group is processed.

options pageno=1 nodate linesize=80 pagesize=60;  

proc sort data=salaries out=temp;    
   by Department;
data budget (keep=Department Payroll);   
   set temp;       
      by Department; 
   if WageCategory='Salaried' then YearlyWage=WageRate*12;    
   else if WageCategory='Hourly' then YearlyWage=WageRate*2000;  
      /* Set FIRST.variable to 1 if this is a new department */
      /* in the BY group.                                    */
   if first.Department then Payroll=0;    
   Payroll+YearlyWage;
      /* Set LAST.variable to 1 if this is the last department */
      /* in the current BY group.                              */
   if last.Department; 
run;

proc print data=budget;    
   format Payroll dollar10.;
   title 'Annual Payroll by Department'; 
run;

Output from Conditional BY-Group Processing
                          Annual Payroll by Department                         1

                        Obs    Department       Payroll

                         1        BAD          $952,000
                         2        DDG          $448,000
                         3        PPD          $522,000
                         4        STD          $496,000


Data Grouped by Ascending Order

This example reads data that is in ASCENDING order and adds 13 to the inventory number. The input data must be sorted in ascending order by the values of the BY variable.

options pageno=1 nodate linesize=80 pagesize=60;  

data current_inventory;
   set inventory;    
   by Name; 
   Number=Number+13;    
run; 

proc print data=current_inventory;    
   title 'Current Inventory'; 
run;    

Output in Ascending Order of BY Variable
                               Current Inventory                               1

                             Obs    Name     Number

                              1     Fern       63  
                              2     Hosta      31  
                              3     Ivy        24  
                              4     Moss       21  
                              5     Rose       19  
                              6     Vinca      16  


Data Grouped by Descending Order

options pageno=1 nodate linesize=80 pagesize=60;  

data current_inventory;
   set inventory;    
   by descending Name; 
   Number=Number+33;    
run; 

proc print data=current_inventory;    
   title 'Number of Plants on Order'; 
run;    

Output in Decreasing Order of BY Variable
                           Number of Plants on Order                           1

                             Obs    Name     Number

                              1     Vinca      36  
                              2     Rose       39  
                              3     Moss       41  
                              4     Ivy        44  
                              5     Hosta      51  
                              6     Fern       83  


Data Not in Alphabetic or Numeric Order

In BY-group processing, you can use data arranged in an order other than alphabetic or numeric, such as by calendar month or by category. To do this, use the NOTSORTED option in a BY statement when you use a SET statement. The NOTSORTED option in the BY statement tells SAS that the data is not in alphabetic or numeric order, but that it is arranged in groups by the values of the BY variable. You cannot use the NOTSORTED option with the MERGE statement, the UPDATE statement, or when the SET statement lists more than one data set.

This example assumes that the data is grouped by MONTH. The subsetting IF statement conditionally writes an observation, based on the value of LAST.month. This DATA step writes an observation only after processing the last observation in each BY group.

data total_sale(drop=sales);    
   set region.sales
   by month notsorted;    
   total+sales;    
   if last.month; 
run;


Data Grouped by Formatted Values

Use the GROUPFORMAT option in the BY statement to ensure that

The GROUPFORMAT option is valid only in the DATA step that creates the SAS data set. It is particularly useful with user-defined formats.

This example uses the FORMAT procedure, the GROUPFORMAT option, and the FORMAT statement to create and print a simple data set. The input TEST data set is sorted by ascending values. The NEWTEST data set is arranged by the formatted values of the variable Score.

options pageno=1 nodate linesize=80 pagesize=60;     

   /* Create a user-defined format */ 
proc format;    
   value Range 1-2='Low'                 
               3-4='Medium' 
               5-6='High';
run;     

   /* Create the SAS data set */ 
data newtest;    
   set test;   
   by groupformat Score; 
   format Score Range.;    
run;  

   /* Print using formatted values */ 
proc print data=newtest;    
   title 'Score Categories';    
   var Name Score;    
   by Score;    
run;

Grouping Observations By Using Formatted Values
                                Score Categories                               1

---------------------------------- Score=Low -----------------------------------

                              Obs    Name    Score

                               1     Jon      Low 


--------------------------------- Score=Medium ---------------------------------

                            Obs     Name      Score

                             2     Anthony    Medium
                             3     Miguel     Medium
                             4     Joseph     Medium


---------------------------------- Score=High ----------------------------------

                              Obs    Name    Score

                               5     Ian     High 
                               6     Jan     High 


Chapter Contents

Previous

Next

Top of Page

Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.