
Generation Data Sets


Definition of Generation Data Sets

Generation data sets are historical copies of a SAS data set. Beginning with Version 7, you can keep multiple copies of a SAS data set by requesting the generations feature. The copies are versions of the same data set: each time the data set is replaced, the replaced copy is archived. Together, the copies form a generation group, which is a collection of data sets that share the same root member name but have different version numbers. A generation group consists of a base version, which is the most recent version, plus a set of historical versions.

You can request generations for both SAS data files and SAS data views; however, there are differences:

Note: Generation data sets provide historical versions of a data set; they do not track observation updates for an individual data set.


Terminology

The following terms are relevant to generation data sets:

base version
is the most recently created version of a data set. Its name does not have the four-character suffix for the generation number.

oldest version
is the oldest version in a generation group.

generation group
is a group of data sets that represent a series of replacements to the original data set. The generation group consists of the base version and a set of historical versions.

GENMAX=
is an output data set option that specifies how many versions (including the base version and all historical versions) to keep for a given data set.

GENNUM=
is an input data set option that specifies which version of a data set to open. Positive numbers are absolute references to a historical version by its generation number. Negative numbers are relative references to historical versions. For example, GENNUM=-1 refers to the youngest version, and GENNUM=0 refers to the current (base) version.

generation number
is a number that increases by one with each replacement (until it rolls over from 999 to 000) and identifies one of the historical versions in a generation group. For example, the data set named AIR#272 has a generation number of 272.

historical versions
are the older copies of the base version of a data set. Names for historical versions have a four-character suffix for the generation number, such as #003.

rolling over
is the process by which the generation number wraps from 999 to 000: when the generation number reaches 999, its next value is 000.

shift down
is the demotion of the base version to become the youngest version, together with the deletion of the oldest version, if applicable. This typically happens when you create a new base version.

shift up
is the promotion of the youngest version to become the base version. This typically happens when you delete the base version.

youngest version
is the version that is chronologically closest to the base version.


Invoking Generation Data Sets

To invoke generation data sets and to specify the number of versions to maintain, include the output data set option GENMAX= when creating or replacing a data set. For example, the following DATA step creates a new data set and requests that up to four copies be kept (one base version and three historical versions):

data a(genmax=4);
   x=1;
   output;
run;

Once the generations feature is in effect, the data set member name is limited to 28 characters (rather than 32), because the last four characters are reserved for a version number. When generations are not in effect (that is, GENMAX=0), the member name can be up to 32 characters. See the GENMAX= data set option in SAS Language Reference: Dictionary.

If a password is assigned, all files within a generation group must have the same password. SAS automatically applies any password that you assign to the base version to all of the versions in the group.


Maintaining a Generation Group

The first time a data set with generations in effect is replaced, SAS keeps the replaced data set and appends a four-character version number to its member name, consisting of # and a three-digit number. That is, for a data set named A, the replaced data set becomes A#001. When the data set is replaced for the second time, the replaced data set becomes A#002; that is, A#002 is the version that is chronologically closest to the base version. After three replacements, the result is:
  • A (base, current version)
  • A#003 (most recent, or youngest, historical version)
  • A#002 (second most recent historical version)
  • A#001 (oldest historical version)
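As a minimal sketch, assuming a SAS session with the WORK library available, the following steps reproduce the three replacements described above. The variable X is hypothetical; it only records which step created each version, so you can confirm where each copy ended up.

```sas
/* Create A with generations enabled (up to four versions). */
data a(genmax=4);
   x=1;
run;

/* Each replacement demotes the current base to the next #nnn name. */
data a; x=2; run;   /* the x=1 copy becomes A#001 */
data a; x=3; run;   /* the x=2 copy becomes A#002 */
data a; x=4; run;   /* the x=3 copy becomes A#003 */

/* Print the oldest historical version (A#001); it holds x=1. */
proc print data=a(gennum=1);
run;
```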

With GENMAX=4, a fourth replacement deletes the oldest version, A#001. As further replacements occur, SAS maintains at most four copies. For example, after ten replacements, the result is:
  • A (base, current version)
  • A#010 (most recent, or youngest, historical version)
  • A#009 (second most recent historical version)
  • A#008 (oldest historical version)
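To see the GENMAX=4 limit at work, the sketch below performs ten replacements in a macro loop, assuming the WORK library. REPLACE_N is a hypothetical helper macro written for this illustration, and X again records which step created each copy, so after the loop X equals the generation number of every surviving version.

```sas
/* Hypothetical helper: replace WORK.A n times, storing a counter in X. */
%macro replace_n(n);
   %local i;
   %do i=1 %to &n;
      data a; x=%eval(&i+1); run;
   %end;
%mend replace_n;

data a(genmax=4);   /* original version, x=1 */
   x=1;
run;

%replace_n(10)

/* Only A, A#010, A#009, and A#008 remain; A#001 through A#007 were deleted. */
proc print data=a(gennum=-3);   /* oldest remaining version, A#008 */
run;
```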

The limit for version numbers that SAS can append is #999. That is, after 999 replacements, the youngest version is #999. After 1,000 replacements, SAS rolls over the youngest version number to #000. After 1,001 replacements, the youngest version number is #001. For example, using data set A with GENMAX=4, the results would be:
999 replacements
  • A (current)

  • A#999 (most recent)

  • A#998 (2nd most recent)

  • A#997 (oldest)

1,000 replacements
  • A (current)

  • A#000 (most recent)

  • A#999 (2nd most recent)

  • A#998 (oldest)

1,001 replacements
  • A (current)

  • A#001 (most recent)

  • A#000 (2nd most recent)

  • A#999 (oldest)
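The rollover rule amounts to modular arithmetic: after N replacements, the K-th most recent historical version (K=1 is the youngest) carries the suffix MOD(N-K+1, 1000), written as three digits. The DATA step below is only a model of that naming rule, not SAS internals; it creates no generation data sets, assumes GENMAX=4 as in the lists above, and SUFFIXES, N, and K are illustrative names.

```sas
/* Model of the naming rule: which #nnn suffixes exist after N replacements. */
data suffixes;
   genmax = 4;                               /* assumed, as in the lists above */
   do n = 999, 1000, 1001;                   /* number of replacements */
      do k = 1 to min(n, genmax - 1);        /* k=1 is the youngest version */
         suffix = cats('#', put(mod(n - k + 1, 1000), z3.));
         output;
      end;
   end;
run;

proc print data=suffixes noobs;
run;
```

For N=1000 the rows read #000, #999, and #998, which matches the 1,000-replacements list.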

The following table shows how names are assigned to generation data sets:

Naming Generation Group Data Sets

Time  SAS Code               Data Set(s)  Absolute GENNUM=  Relative GENNUM=
1     data air (genmax=3);   AIR          1                 0
      AIR data set created at time 1, and three generations requested.

2     data air;              AIR          2                 0
                             AIR#001      1                 -1
      New AIR is created at time 2. AIR from time 1 is renamed AIR#001.

3     data air;              AIR          3                 0
                             AIR#002      2                 -1
                             AIR#001      1                 -2
      New AIR is created at time 3. AIR from time 2 is renamed AIR#002.

4     data air;              AIR          4                 0
                             AIR#003      3                 -1
                             AIR#002      2                 -2
      New AIR is created at time 4. AIR from time 3 is renamed AIR#003. AIR#001 from time 1, which is the oldest, is deleted.

5     data air (genmax=2);   AIR          5                 0
                             AIR#004      4                 -1
      New AIR is created at time 5, and the number of generations is changed to two. AIR from time 4 is renamed AIR#004. The two oldest versions are deleted.
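The following steps replay the five times in the table in the WORK library, which may help when checking the absolute and relative GENNUM= columns. The TIME variable is a hypothetical marker that records which step created each version.

```sas
data air (genmax=3); time=1; run;   /* time 1 */
data air;            time=2; run;   /* time 2: AIR#001 holds time=1 */
data air;            time=3; run;   /* time 3: AIR#002 holds time=2 */
data air;            time=4; run;   /* time 4: AIR#003 holds time=3; AIR#001 deleted */
data air (genmax=2); time=5; run;   /* time 5: AIR#004 holds time=4; AIR#003, AIR#002 deleted */

/* The absolute and the relative reference name the same version, AIR#004. */
proc print data=air(gennum=4);  run;
proc print data=air(gennum=-1); run;
```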


Processing Specific Versions of a Generation Group

Once a generation group exists, SAS processes the base version by default. For example, the following PRINT procedure prints the base version:

proc print data=a;
run; 

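To request a specific version from a generation group, use the GENNUM= input data set option. There are two methods that you can use: an absolute reference, which is a positive generation number, or a relative reference, which is zero or a negative offset from the base version: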

Requesting Specific Generation Data Sets

This SAS statement ...               produces this result ...
proc print data=air(gennum=0);
proc print data=air;                 Either statement prints the current (base) version of the AIR data set.
proc print data=air(gennum=-2);      Prints the version two generations back from the current version.
proc print data=air(gennum=3);       Prints the file AIR#003.
proc print data=air(gennum=1000);    After 1,000 replacements, prints the file AIR#000, which is the file that is created after AIR#999.


Managing Generation Data Sets

Displaying Data Set Information

A variety of statements in PROC DATASETS process a specific historical version. For example, you can display data set version numbers for historical copies using the

In addition, you can display the contents for an individual historical version.
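As a hedged sketch, assuming that the CONTENTS statement of PROC DATASETS accepts GENNUM= on its DATA= data set (as the paragraph above indicates for historical versions), the following steps build a small two-version group in WORK and then display the library directory and the contents of the youngest historical version. The TIME variable is hypothetical.

```sas
data air(genmax=3); time=1; run;
data air;           time=2; run;      /* AIR#001 now holds time=1 */

proc datasets library=work;           /* prints the WORK directory listing */
   contents data=air(gennum=-1);      /* assumption: contents of AIR#001 */
run;
quit;
```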

Copying and Appending Generation Data Sets

You can use the COPY statement in PROC DATASETS or the COPY procedure to copy a generation group. For example, the following DATASETS procedure uses the COPY statement to copy the generation group MYGEN1 from library MYLIB1 to library MYLIB2.

libname mylib1 'SAS-data-library1';
libname mylib2 'SAS-data-library2';

proc datasets;
   copy in=mylib1 out=mylib2;
   select mygen1;
run;
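A self-contained variant of the COPY example, assuming SAS 9.3 or later for the DLCREATEDIR system option. Instead of the placeholder paths above, it creates a hypothetical target library MYLIB2 as a subfolder of the WORK location and copies a two-version group MYGEN1 out of WORK.

```sas
options dlcreatedir;                               /* let LIBNAME create the folder */
libname mylib2 "%sysfunc(pathname(work))/mylib2";  /* hypothetical target library */

data mygen1(genmax=3); v=1; run;
data mygen1;           v=2; run;                   /* MYGEN1#001 holds v=1 */

proc datasets library=work nolist;
   copy out=mylib2;       /* IN= defaults to the procedure library, WORK */
   select mygen1;
run;
quit;
```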

You can use the GENNUM= data set option to append a specific historical version. For example, the following DATASETS procedure uses the APPEND statement to append historical version B#002 (GENNUM=2) of data set B to data set A. Note that by default, SAS uses the base version for the BASE= data set.

proc datasets; 
   append base=a data=b(gennum=2);
run;
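The sketch below makes the APPEND example self-contained. It assumes the WORK library and a hypothetical variable Y, and builds B with two historical versions so that GENNUM=2 resolves to B#002.

```sas
data b(genmax=3); y=10; run;   /* later becomes B#001 */
data b;           y=20; run;   /* later becomes B#002 */
data b;           y=30; run;   /* current base version of B */

data a; y=0; run;

proc datasets library=work nolist;
   append base=a data=b(gennum=2);   /* appends B#002, which holds y=20 */
run;
quit;
```

After the step, A has two observations: its original row (y=0) followed by the row from B#002 (y=20).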

Modifying the Number of Generations

When modifying the attributes of a data set, you can increase or decrease the number of copies for an existing generation group. If you decrease the number of versions, SAS deletes the oldest versions so as not to exceed the new maximum. For example, the following statement can be used in a DATA step to change the number of copies maintained for data set A to three:

modify a(genmax=3);

You can also use the MODIFY statement of the DATASETS procedure to modify the number of generations on an existing file:

libname mylib 'SAS-data-library';
proc datasets lib=mylib;
   modify air(genmax=4);
run;

The previous statements set the number of generations for MYLIB.AIR to 4. If the modification reduces the number of generations, SAS deletes the oldest versions that exceed the new limit.
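As a minimal sketch in the WORK library (the variable T is hypothetical), the following steps build a four-version group and then lower GENMAX= to 2 with the MODIFY statement, so only the base version and the youngest historical version should remain.

```sas
data air(genmax=4); t=1; run;
data air;           t=2; run;
data air;           t=3; run;
data air;           t=4; run;   /* AIR, AIR#003, AIR#002, AIR#001 */

proc datasets library=work nolist;
   modify air(genmax=2);        /* keeps AIR and AIR#003 only */
run;
quit;
```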

Deleting Versions of Generation Data Sets

When deleting data sets, you can delete a specific version or an entire generation group. The following table shows the types of delete operations and their effects on a generation group. For each statement, assume that the base version AIR and two historical versions (AIR#001 and AIR#002) already exist.

Deleting Generation Data Sets

These SAS statements in PROC DATASETS ...   produce this result ...
delete air;
delete air(gennum=0);                       Either statement deletes the base version and shifts up historical versions. AIR#002 is renamed to AIR and becomes the new base version.
delete air(gennum=2);                       Deletes AIR#002.
delete air(gennum=-2);                      Deletes the second youngest version (AIR#001). If the referenced file does not exist, this causes an error.
delete air(gennum=all);                     Deletes all data sets in the generation group, including the base file.
delete air(gennum=hist);                    Deletes all data sets in the generation group, except the base file.

A complete set of GENNUM= specifications is listed under the DATASETS procedure, DELETE statement, in the SAS Language Reference: Dictionary.
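The following sketch, assuming the WORK library and a hypothetical variable T, sets up the three-version group that the table assumes and then applies the GENNUM=HIST form, which should leave only the base version.

```sas
data air(genmax=3); t=1; run;
data air;           t=2; run;
data air;           t=3; run;   /* AIR (t=3), AIR#002 (t=2), AIR#001 (t=1) */

proc datasets library=work nolist;
   delete air(gennum=hist);     /* removes AIR#001 and AIR#002; AIR stays */
run;
quit;
```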

Renaming Versions of Generation Data Sets

You can use the CHANGE statement in PROC DATASETS to rename an entire generation group:

change a=newa;

Or you can rename a single version. Note that if the single version is the base version (GENNUM=0), the youngest historical version automatically becomes the base.

change a(gennum=2)=newa;
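As a minimal sketch in the WORK library (the variable X is hypothetical), the following steps create a three-version group A and then rename only the historical version A#002, leaving the base version A in place.

```sas
data a(genmax=3); x=1; run;
data a;           x=2; run;
data a;           x=3; run;    /* A (x=3), A#002 (x=2), A#001 (x=1) */

proc datasets library=work nolist;
   change a(gennum=2)=newa;    /* renames only A#002 to NEWA */
run;
quit;
```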



Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.