SAS Companion for the OpenVMS Operating Environment

Using the CONCUR Engine

The concurrency (CONCUR) engine allows concurrent read and write access to data sets. Note that the concurrency engine supports only SAS data sets. It does not support SAS files of member types other than DATA, such as INDEX or CATALOG.

In contrast to the V8 engine, the CONCUR engine does not support indexing or compression of observations. The CONCUR engine can access files only within a single machine or OpenVMS cluster; access to SAS data sets on other operating environments and concurrent read/write access to SAS data sets across DECnet are provided by SAS/SHARE software. For more information about using SAS/SHARE software, refer to SAS/SHARE User's Guide.

The CONCUR engine is optimized for random concurrent access, while the V8 engine is better suited to sequential access. For example, if you intend to use the FSEDIT procedure or the POINT= option in the SET statement to access your data randomly, the CONCUR engine may be the best choice, even if you do not need any of its concurrent access capabilities.

Version 8 of the SAS System introduces support for several new features related to data sets. The CONCUR engine supports many of these features: member names with lengths up to 32 characters; variable names with lengths up to 32 characters; and member or variable labels with lengths up to 256 characters. Note that while the CONCUR engine supports the creation and access of Version 6 format files, the long character strings are not allowed when accessing or creating a Version 6 concurrency engine file. For more information about support for these longer character strings, see SAS Language Reference: Concepts.


How to Select the CONCUR Engine

There are three ways to select the CONCUR engine:

The CONCUR engine creates and accesses SAS data sets in a format that allows record locking and file sharing.

CAUTION:
SAS data sets that are created with the CONCUR engine are not interchangeable with SAS data sets that are created or accessed with any other engine. If you plan to share a particular SAS data set, create it using the CONCUR engine.

If you have a SAS data set that you want to share after it is created, you can copy it, using the CONCUR engine as the output engine. Then it will be in the correct format for sharing. For example, if you want shared update access to a data set that was created using the V8 engine, you can use the following statements to convert it:

libname inlib v8 '[mydir.base]';
libname outlib concur '[mydir.share]';
proc copy in=inlib out=outlib;
run;

After you run this SAS program, all SAS data sets that are created with the V8 engine in the data library that is referenced by INLIB are copied to the data library referenced by OUTLIB using the CONCUR engine. To create data sets using the CONCUR engine, your directory must have a version limit greater than 1.


Member Types Supported

The CONCUR engine supports the Version 8 member type DATA.


Engine/Host Options for the CONCUR Engine

Several concurrency engine options control the creation and access of SAS data sets. Most of these options have direct correlation to options available through OpenVMS Record Management Services (RMS). The CONCUR engine creates relative organization files with record-locking enabled.

Note:   Data sets created with the CONCUR engine have a maximum observation length of 32K (32,767 bytes).

You can use the following engine/host options with the CONCUR engine:

ALQ=
specifies the number of OpenVMS disk blocks to allocate initially to a data set when it is created. The value can range from 0 to 2,147,483,647. If the value is 0, the minimum number of blocks required for a sequential file is used. The ALQ= option defaults to the bucket size. OpenVMS RMS always rounds the value up to the next disk cluster boundary.

The ALQ= option (allocation quantity) corresponds to the FAB$L_ALQ field in OpenVMS RMS. For additional details, see the data set option ALQ= and Guide to OpenVMS File Applications.

BKS=
specifies the number of OpenVMS disk blocks in each bucket of the file. The value can range from 0 to 63. If the value is 0, the bucket size used is the minimum number of blocks needed to contain a single observation. The default value is 32.

When deciding on the bucket size to use, consider whether the file is usually accessed randomly (small bucket size), sequentially (large bucket size), or both (medium bucket size). The bucket size is a permanent attribute of the file, so this option applies to output files only.

The BKS= option (bucket size) corresponds to the FAB$B_BKS field in OpenVMS RMS or the FILE BUCKET_SIZE attribute when using File Definition Language (FDL). For additional details, see the data set option BKS= and Guide to OpenVMS File Applications.

DEQ=
specifies the number of OpenVMS disk blocks to add each time OpenVMS RMS automatically extends a data set during a write operation. The value can range from 0 to 65,535. OpenVMS RMS always rounds the value up to the next disk cluster boundary. A large value can result in fewer file extensions over the life of the file; a small value results in numerous file extensions over the life of the file. Numerous file extensions, which may be noncontiguous, slow record access.

If the value specified is 0, OpenVMS RMS uses the default value for the process. The DEQ= option defaults to the bucket size.

The DEQ= option (default file extension quantity) corresponds to the FAB$W_DEQ field in OpenVMS RMS. For additional details, see the data set option DEQ= and Guide to OpenVMS File Applications.

FILEFMT=
specifies the file format, or version of the engine, to use. Allowed values are 606, 607, 701, and 801. The default value is 801. There was an internal file format change between Release 6.06 and Release 6.07, and again between Version 6 and Version 7. The concurrency (CONCUR) engine can create and access all versions of the file format. When you access a file for input or update, the CONCUR engine detects the correct version of the existing file. When you create a new file, the CONCUR engine defaults to creating a Version 8 format file unless overridden by the FILEFMT= option.

The following example shows how to create a file in Release 6.07 format:

libname clib concur '[]';
data clib.v607 (filefmt=607);
. . . more SAS statements . . .
run;

HOSTFMT=
specifies the host platform format for a data set. The concurrency (CONCUR) engine can create and access files for both OpenVMS Alpha and OpenVMS VAX. Valid values are ALPHA or VAX, respectively. By default the data set is created in the native format of the platform on which SAS is running. You may use the HOSTFMT= option to specify that the data set should be created in a different representation. This is similar to using the Version 8 data set option OUTREP= to specify a data representation in a non-native format. The use of HOSTFMT= and OUTREP= options is equivalent. HOSTFMT= is supported for compatibility with Version 6.

In the following example, the two data steps produce the same results:

data clib.vaxfile (hostfmt=vax);
. . . more SAS statements . . .
run;
data clib.vaxfile (outrep=vax_vms);
. . . more SAS statements . . .
run;

For more information about the OUTREP= data set option, see SAS Language Reference: Dictionary.

MBF=
specifies the number of I/O buffers that you want OpenVMS RMS to allocate for a particular file. The value can range from 0 to 127. By default, this option is set to 2 for files opened for update and 1 for files opened for input or output. If you specify 0, the default value for the process is used.

The MBF= option (multibuffer count) corresponds to the RAB$B_MBF field in OpenVMS RMS or the CONNECT MULTIBUFFER_COUNT attribute when using FDL. For additional details, see the data set option MBF= and Guide to OpenVMS File Applications.
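
As a hedged illustration, the engine/host options described above might be combined in a single LIBNAME statement so that they apply to the data sets created through that libref. This sketch assumes that engine/host options follow the physical path in the LIBNAME statement. The libref SHARELIB, the directory, the input data set WORK.ORDERS, and the option values are hypothetical and were chosen only to fall inside the documented ranges.

```sas
/* Hypothetical libref, directory, and values; adjust for your site.  */
/* ALQ=: initial allocation in disk blocks (0 to 2,147,483,647)       */
/* DEQ=: extension quantity in disk blocks (0 to 65,535)              */
/* BKS=: bucket size in disk blocks        (0 to 63)                  */
/* MBF=: number of I/O buffers             (0 to 127)                 */
libname sharelib concur '[mydir.share]' alq=2000 deq=500 bks=8 mbf=4;

/* WORK.ORDERS is an assumed input data set. */
data sharelib.orders;
   set work.orders;
run;
```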


Data Set Options Supported by the CONCUR Engine

The CONCUR engine recognizes all data set options that are documented in SAS Language Reference: Dictionary except the FILECLOSE=, COMPRESS=, and REUSE= options. Of special importance to the CONCUR engine is the portable data set option CNTLLEV=. (For details, see CNTLLEV=.) Other data set options that are likely to be useful include LOCKREAD= and LOCKWAIT=. (For details, see LOCKREAD= and LOCKWAIT=.) For more information, refer to SAS Language Reference: Dictionary.

The engine/host options that are discussed in Engine/Host Options for the CONCUR Engine can also be used as data set options when you use the CONCUR engine. For details, see Specifying Data Set Options.
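
The following sketch shows one way the same options could be written as data set options on an individual data set. The libref CLIB matches the earlier examples in this section; the member name LOOKUP, the input data set WORK.LOOKUP, and the option values are assumptions made for illustration.

```sas
libname clib concur '[]';

/* BKS=, ALQ=, and DEQ= are given here at creation time.            */
/* The bucket size is a permanent attribute of the file, so BKS=    */
/* has an effect only when the file is created.                     */
data clib.lookup (bks=4 alq=500 deq=100);
   set work.lookup;   /* assumed input data set */
run;
```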


System Option Values Used by the CONCUR Engine

The CONCUR engine does not use the values of any SAS system options.


DECnet Access

The CONCUR engine supports both creation and reading of files across DECnet, but not the updating of files across DECnet. You are allowed to create and read files because the engine uses multistreaming only when the file is opened for update. Support of DECnet access means you can now specify a node name in the physical pathname of your SAS data library, as long as you do not plan to update the data sets stored in the data library. The following is an example:

libname mylib concur 'mynode::bldgc:[testdata]';
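
Because files cannot be updated across DECnet, one possible approach is to copy a remote data set into a local CONCUR library and update the local copy. The sketch below reuses the node and directory from the example above; the local directory and the librefs REMOTE and LOCAL are hypothetical.

```sas
/* Remote library: read and create only, no update across DECnet. */
libname remote concur 'mynode::bldgc:[testdata]';

/* Local library in which shared update access is possible. */
libname local concur '[mydir.share]';

/* Copy the remote data sets to the local library. */
proc copy in=remote out=local;
run;
```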


Passwords

The CONCUR engine supports SAS System passwords. The syntax and behavior are the same as those of passwords used with the V8 (BASE) engine.


Internals of a Concurrency Engine Data Set

This section describes the internal structure of a concurrency engine data set. If you are familiar with OpenVMS RMS, it may be helpful to know the internal file format of a concurrency engine data set.

A concurrency engine data set is a relative format file. The record length is determined by the length of one observation, with a minimum length of 8 bytes. Because the data set is a relative format file, the maximum observation length of a concurrency engine data set is 32,767 bytes. The first portion of the file contains header records that provide information to the engine concerning the number of observations in the file, the number of variables, some positioning information to optimize access, the date and time, SAS System release, operating environment the data set was created on, and so forth.

Following the header information is information pertaining to each individual variable in the file. A NAMESTR is stored for each variable on the data set. The NAMESTR includes the variable name, type, label, and size. Multiple NAMESTRs are stored in a single record, up to the maximum number of NAMESTRs that the record length accommodates.

After the NAMESTRs, the observations begin. There is always one observation per record. The record length equals the observation length, with one exception: if the observation length is less than 8 bytes, the record length is 8 bytes. If you delete a record in a relative format file, the record still exists in the file, but it is marked as deleted.

Note:   In a concurrency engine data set, a data set of deleted observations takes the same amount of disk space as a data set of valid observations. To remove the deleted observations, you must use the COPY procedure and copy the data set to a new data set type, such as a data set created with the V8 or V8TAPE engine.
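
A minimal sketch of the copy described in the note, assuming a CONCUR library in [mydir.share] and a V8 library in [mydir.base]. The librefs and the member name ORDERS are hypothetical.

```sas
libname clib   concur '[mydir.share]';
libname packed v8     '[mydir.base]';

/* Copy one member to a V8 engine library; the copy contains only */
/* the observations that are not marked as deleted.               */
proc copy in=clib out=packed;
   select orders;   /* hypothetical member name */
run;
```

If the data set must remain sharable, it could then be copied again with a CONCUR library as the output library.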

Although all record-locking capabilities are provided through the use of OpenVMS RMS features, some file-sharing capabilities are provided by OpenVMS RMS and some are provided by the engine itself. The engine can correctly set the share options of a file when the file is opened for input or update, because the SAS System uses the name of the existing data set directly. However, output data sets are created with a temporary name and then renamed to the actual data set name after the data set is closed. This ensures the integrity of existing data sets of the same name in case an error occurs during creation of the new data set. Therefore, the engine must handle all file-sharing issues that disallow sharing of output files. This is done through the locking of specific filenames, which is why your directory must have a version limit of at least 2 to create concurrency engine data sets.
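
If the directory's version limit is 1, it could be raised before the library is assigned. The sketch below assumes that the X statement is available in your SAS session to issue DCL commands and that you have the privileges needed to modify the directory. The directory name and the limit of 3 are illustrative.

```sas
/* Assumption: the X statement passes this DCL command to OpenVMS. */
x 'set directory/version_limit=3 [mydir.share]';

/* With a version limit of at least 2, CONCUR data sets can be created here. */
libname outlib concur '[mydir.share]';
```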


Optimizing the Performance of the CONCUR Engine

Engine performance is often a trade-off between various factors. This section provides you with the necessary information so that you can optimize the performance of the CONCUR engine in your operating environment. By controlling the size and number of buffers, you can specify how the SAS System accesses your data. By specifying the data set options, you can control the level and amount of data that are accessed. The amount of disk space available for these operations also affects engine performance.

Controlling the Size and Number of Buffers

Depending on the type of record access your SAS application performs, you need to consider both the size of buffers (bucket size) and the number of buffers (multibuffer count). For complete details about specifying the size and number of buffers, see the descriptions of the BKS= and MBF= data set options.

The two extremes of record access are records that are accessed completely sequentially or completely randomly. For example, many SAS procedures typically access data sets sequentially, processing the records from first to last. On the other hand, you may access observations in a completely random order when using the FSEDIT procedure to edit or browse observations in a data set.

There are also cases in which records are accessed randomly but may be reaccessed frequently. One example is an application that uses a data set in which particular observations contain information that is referred to frequently. Again, using the FSEDIT procedure as an example, the data set can be designed in such a way that you must access the first observation followed by observation 200, then the first observation again followed by observation 300, and so on.

Finally, there are cases in which records are accessed randomly, but then adjacent records are likely to be accessed. An application can use the POINT= option in a SET statement to selectively input the first 10 observations out of every 100 observations.

Most often, an application accesses a data set by a combination of several of these methods. The following list gives suggestions for the number of buffers and bucket size you should use for each method:

completely sequential or random access
is most efficient with a single buffer. However, the bucket size differs:

random access
is more efficient with a smaller bucket size.

sequential access
is more efficient with a larger bucket size.

random access with reaccessed records
is most efficient with multiple buffers to keep the reaccessed records in the buffer cache. You should use a small bucket size in this instance.

random access with subsequent adjacent access
is most efficient with a single buffer. However, use a larger bucket size so that more records are stored in the buffer cache. This increases the probability that the required records have been read into memory with a single I/O.

If your program accesses the data set by several methods, you must find a compromise between the number of buffers and bucket sizes. This is what the SAS System attempts to do with the defaults, because the intended use of the file is unknown. Because you know the intended use of your CONCUR engine data sets, you can improve the CONCUR engine's performance by optimizing the buffer settings.
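
The suggestions above can be sketched as follows. The member names, input data sets, and specific BKS= and MBF= values are assumptions chosen to illustrate "small", "large", and "multiple"; they are not measured recommendations. Because the bucket size is a permanent attribute of the file, BKS= appears on the step that creates each data set.

```sas
libname clib concur '[mydir.share]';

/* Random access with reaccessed records:                 */
/* small bucket size at creation, several buffers on use. */
data clib.lookup (bks=2);
   set work.lookup;      /* assumed input data set */
run;
proc fsedit data=clib.lookup (mbf=8);
run;

/* Sequential access: large bucket size, single buffer. */
data clib.history (bks=32);
   set work.history;     /* assumed input data set */
run;
proc means data=clib.history (mbf=1);
run;

/* Random access followed by adjacent access:                */
/* single buffer, but a larger bucket so neighboring records */
/* arrive in the same I/O.                                   */
data clib.batches (bks=16);
   set work.batches;     /* assumed input data set */
run;
```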

Using Portable Data Set Options

Several data set options are portable options that are available for all engines, but they are particularly useful in conjunction with the concurrency engine.

CNTLLEV=
specifies the level of access (control level) to the data set, whether concurrent or exclusive. If you decide to create a concurrency engine data set to take advantage of its random access optimizations, but you do not need to provide for concurrent access at this time, you can use the CNTLLEV= data set option to further improve performance. By default, when using the concurrency engine, data sets that are opened for input allow shared read access, data sets that are opened for output allow no sharing, and data sets that are opened for update allow shared update access. When sharing is allowed, record-level locking is enabled. When you do not need this feature, you can reduce the overhead of record locking by using CNTLLEV=MEM to disable the sharing.

The CNTLLEV= data set option takes one of two values:

MEM
specifies that the application requires exclusive access to the data set. Member-level control restricts any other application from accessing the data set until the step has completed.

REC
specifies that concurrent access is allowed and OpenVMS RMS record-level locking is enabled. This option entails more processing overhead and should be used only when necessary.

Each SAS procedure specifies a required control level to the engine, depending on the intended access of the observations. If you use CNTLLEV=REC and the SAS procedure requires member-level control to ensure the integrity of the data during processing, a warning is written to the SAS log indicating that inaccurate or unpredictable results can occur if the data set is being updated by another process during the analysis.

A common example of improving performance by overriding the default control level of a procedure involves the FSEDIT procedure, which uses a default of CNTLLEV=REC. A session using the FSEDIT procedure with a concurrency engine data set does not need to incur the overhead of record-level locking if concurrent access is not required. By using the data set option CNTLLEV=MEM, the application tells the engine to override the control level specification of the procedure because exclusive access at the member level is desired. This disables record-level locking, decreases the overhead for processing the data set, and improves performance. In tests using the SET statement to input a concurrency engine data set, a step using the CNTLLEV=MEM option ran in one-third of the CPU time of the same step using the CNTLLEV=REC option.

For syntax and usage examples for the CNTLLEV= data set option, see CNTLLEV= and SAS Language Reference: Dictionary.
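
As a brief sketch, the two control levels might be requested as shown below. The libref CLIB, its directory, and the member name ORDERS are hypothetical.

```sas
libname clib concur '[mydir.share]';

/* Exclusive, member-level access: record-level locking is disabled, */
/* so no other application can access the data set during the step.  */
proc fsedit data=clib.orders (cntllev=mem);
run;

/* Record-level control, for sessions in which other users may */
/* update the same data set concurrently.                      */
proc fsedit data=clib.orders (cntllev=rec);
run;
```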

FIRSTOBS= and OBS=
specify a beginning and ending observation to subset your data set.

The value of the FIRSTOBS= data set option specifies the first observation that should be included for processing in the SAS DATA step. Some engines have to read the records sequentially, discarding them until the requested observation is reached. Because a concurrency engine data set is a relative format file, the engine can directly access the beginning observation without having to first read any other observations in the file.

Using the OBS= data set option to specify the last observation that you want to process can improve performance by terminating the input of observations without having to read records until the end-of-file character is reached.

For more information about the FIRSTOBS= and OBS= data set options, see SAS Language Reference: Dictionary.
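
For instance, the following sketch processes only observations 1001 through 2000 of a concurrency engine data set. The libref, the member name, and the observation numbers are illustrative assumptions.

```sas
libname clib concur '[mydir.share]';

/* FIRSTOBS= names the first observation to process; */
/* OBS= names the last observation to process.       */
data work.slice;
   set clib.orders (firstobs=1001 obs=2000);
run;
```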


Using the POINT= Option

You can use the POINT= option in a SET statement to access contiguous ranges of observations. For example, with the POINT= option, the SAS program can read observations 10 through 50, then observations 90 through 150, and so on. Reading only the records that you actually need improves performance by decreasing the number of records you must access. Due to the physical format of a concurrency engine data set, the engine can access the required records directly.
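
The ranges mentioned above could be read with a DATA step such as the following sketch. The libref, the member name ORDERS, and the variable name OBSNUM are hypothetical. The _ERROR_ check is a defensive assumption for the case in which a requested observation number cannot be read.

```sas
libname clib concur '[mydir.share]';

data work.ranges;
   /* Two contiguous ranges: observations 10-50 and 90-150. */
   do obsnum = 10 to 50, 90 to 150;
      set clib.orders point=obsnum;
      if _error_ then do;
         /* The requested observation could not be read: */
         /* reset the flag and skip it.                  */
         _error_ = 0;
         continue;
      end;
      output;
   end;
   stop;   /* required with POINT=, because end of file is never detected */
run;
```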

Disk Space Usage

For most data sets, the disk space that is required for a CONCUR engine data set is comparable to that required for a V8 engine data set. However, for data sets in which the number of observations is greater than the number of variables, concurrency engine data sets are usually smaller. Conversely, in a concurrency engine data set that has many variables and only a few observations, space may be wasted.

However, there is a file format for both uncompressed and compressed data sets that makes the V8 engine disk space usage more efficient.

Performance Comparisons

Performance is a main concern for many applications, so it is useful to know how the CONCUR engine compares to the V8 engine when various features of the SAS System are used:

Creating data sets
When you compare the creation and sequential input of data sets using each engine, the V8 engine tends to be faster when the data sets are small. However, as the size of the data set increases, the V8 and CONCUR engines are comparable in CPU time used. In all cases, the CONCUR engine incurs substantially fewer page faults than the V8 engine.

Accessing existing data sets
When you compare random access of an existing file using both engines, the concurrency engine is much faster. When you use a large bucket size in the concurrency engine, with a comparable page size in the V8 engine, the concurrency engine takes approximately one-half as much CPU time. When the bucket size and page size are small, the concurrency engine takes about one-third as much CPU time. Again, the concurrency engine incurs substantially fewer page faults.



Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.