Chapter Contents

Previous

Next
SAS/ACCESS Software for PC File Formats: Reference

ACCESS Procedure: XLS Specifics

ACCESS Procedure Reference describes the generic options and procedure statements that enable you to create access descriptors, view descriptors, and SAS data files from PC file format data. The following section describes the PC file-specific statements you use in the SAS/ACCESS interface to XLS data.


ACCESS Procedure Statements for XLS

To create an access descriptor, you use the DBMS=XLS option and six database-description statements: PATH=, GETNAMES, RANGE, SCANTYPE, SKIPROWS, and WORKSHEET. These database-description statements supply XLS-specific information to the SAS System, and must immediately follow the CREATE statement that specifies the access descriptor to be created. In addition to the database-description statements, you can use editing statements when you create an access descriptor. These editing statements must follow the database-description statements.

Database-description statements are only required when you create access descriptors. Because XLS information is stored in an access descriptor, you do not need to repeat this information when you create view descriptors.

The SAS/ACCESS interface to XLS uses the following procedure statements in batch mode:

PROC ACCESS DBMS=XLS | EXCEL;
CREATE libref.member-name.ACCESS | VIEW;
UPDATE libref.member-name.ACCESS | VIEW;
GETNAMES <=> YES | NO | Y | N;
PATH= 'path-and-filename'<.XLS> '| <'>filename<'> | fileref;
RANGE <=> <'>range-name<'> | 'range-address';
SCANTYPE <=> YES | NO | Y | N | <number-of-rows>;
SKIPROWS <=> number-of-rows-to-skip;
WORKSHEET <=> worksheet-name;
ASSIGN <=> YES | NO | Y | N ;
DROP <'>column-identifier-1<'> <...<'>column-identifier-n<'>>;
FORMAT <'>column-identifier-1<'> <=> SAS-format-name-1
<...<'>column-identifier-n<'> <=> SAS-format-name-n> ;
LIST <ALL | VIEW | <'>column-identifier<'>> ;
MIXED <=> YES | NO | Y | N;
RENAME <'>column-identifier-1<'> <=> SAS-variable-name-1
<...<'>column-identifier-n<'> <=> SAS-variable-name-n> ;
RESET ALL | <'>column-identifier-1<'> <...<'>column-identifier-n<'>> ;
SELECT ALL | <'>column-identifier-1<'> <...<'>column-identifier-n<'>> ;
SUBSET selection-criteria ;
TYPE column-identifier-1 <=> C | N <... column-identifier-n <=> C | N>;
UNIQUE <=> YES | NO | Y | N ;
RUN;

Note:   By default, PROC ACCESS uses Excel 5.0. Excel 5.0 files have the identical format to Excel 95 (Version 7) files.   [cautionend]

The QUIT statement is also available in PROC ACCESS. However, its use causes the procedure to terminate. QUIT is used most often in the interactive line and noninteractive modes to exit the procedure without exiting SAS.

GETNAMES <=> YES | NO | Y | N;
determines whether SAS variable names are generated from column names in the first row of the range when an access descriptor is created. When you update a descriptor, you are not allowed to specify the GETNAMES statement.

The GETNAMES statement is optional. If you omit it, the default value GETNAMES=NO is used, and the XLS interface generates the SAS variable names VAR0, VAR1, VAR2, and so on. If you specify GETNAMES=YES, SAS variable names are generated from column names in the first row of the range. GETNAMES=YES also sets the SKIPROWS value to 1.

You can change the default value from NO to YES by setting the SS_NAMES environment variable. See Setting Environment Variables for more information on setting and changing environment variables.

The GETNAMES statement is a database-description statement. It must follow the CREATE statement and precede any editing statements when you create a descriptor.

MIXED <=> YES | NO | Y | N;
determines whether to convert Microsoft Excel numeric data values in a column to their character representation when the corresponding SAS variable is expecting a character value.

The MIXED statement is optional. Use the MIXED statement if you have both Microsoft Excel numeric and character data in a column. Specifying YES allows both numeric and character data to be displayed as SAS character data. NO, the default, treats any data in a column that does not match the specified data type as missing values.

You can change the default value to YES by setting the SS_MIXED environment variable. See Setting Environment Variables for more information on setting and changing environment variables.

The MIXED statement is an editing statement, and it must follow any database descriptions when you create an access descriptor.

RANGE <=> <'>range-name<'> | 'range-address';
subsets a specified section of an XLS file worksheet. The range-name is the name that is assigned to a range address within the worksheet. Range names can be up to 15 characters long and are not case-sensitive.

The range-address is identified by the top left cell that begins the range and the bottom right cell that ends the range within the XLS worksheet file. The beginning and ending cells are separated by two periods; for example, the range address C9..F12 indicates a cell range that begins at cell C9, ends at cell F12 and includes all cells in between.

The RANGE statement is optional. If you omit RANGE, the entire worksheet is accessed as the default range.

The RANGE is a database-description statement. It must follow the CREATE statement and precede any editing statements when you create a descriptor.

SCANTYPE <=> YES | NO | Y | N | <number-of-rows>;
finds the most common Excel data type and format for each column in a specified number of rows in an XLS worksheet to generate the SAS format. By default, SAS variable formats are generated from the Excel formats found in the first row of the entire worksheet or in the first row of a range (if specified) in the worksheet.

The SCANTYPE statement is optional, and its default value is NO. If you specify YES, the ACCESS procedure scans the data types and formats of all rows in each column of the worksheet or range and uses the most common one to generate the default SAS format for each column. If you specify a number of rows, PROC ACCESS scans the specified number of rows only and returns the most common format.

If you specify the SKIPROWS statement, the ACCESS procedure skips the specified rows and starts scanning from the next row. For example, if you specify SKIPROWS=3, PROC ACCESS skips the first three rows and begins scanning the data type and format on the fourth row.

You can change the default value to YES by setting the SS_SCAN environment variable. See Setting Environment Variables for more information on setting and changing environment variables.

Specifying SCANTYPE=0 is equivalent to specifying SCANTYPE=NO.

The SCANTYPE statement is a database-description statement. It must follow the CREATE statement and precede any editing statements when you create a descriptor.

SKIPROWS <=> number-of-rows-to-skip;
specifies the number rows, beginning at the top of the range in the XLS file, to ignore when reading data from the XLS file. The default value for SKIPROWS is 0. The skipped (or ignored) rows often contain information such as column labels or names or underscores rather than input data.

If GETNAMES=YES, the default value of SKIPROWS automatically changes to 1. The first row of data and formats after SKIPROWS in a range is used to generate the SAS variable types and formats. However, you can use the SCANTYPE statement to scan the formats of a specified number of rows and to use the most common data type and format to generate the default SAS variable types and formats. See Setting Environment Variables for more information on setting and changing environment variables.

The SKIPROWS statement is a database-description statement. It must follow the CREATE statement and precede any editing statements when you create a descriptor.

TYPE column-identifier-1 <=> C | N < . . . column-identifier-n <=> C | N >;
changes the expected data types of SAS variables. SAS data sets have two data types: character (C) and numeric (N). Spreadsheet files have the same two data types: character (for labels and formula strings) and numeric (for numbers and formulas). Changing the default data type of a SAS variable in a descriptor file also changes its associated default format in the loaded file.

If you omit the TYPE statement, the database field types are generated from the PC file data types. You can change as many database field types as you want in one TYPE statement.

WORKSHEET <=> <'>worksheet-name<'>;
identifies one worksheet from a group of worksheets while you are reading from an XLS file. The worksheet-name is a 31-character name and is not case-sensitive. For example, specifying WORKSHEET=SHEET2 identifies worksheet 2 from a group of worksheets

The WORKSHEET statement is optional. For Excel 4 files, there is only one worksheet identifier, WORKSHEET1; therefore, the WORKSHEET statement is ignored. Under Excel 5, the default value is SHEET1. If you change the default worksheet from within Excel, you can either supply the new worksheet name or supply the worksheet's value (such as Sheet5 ).

The WORKSHEET statement is a database-description statement. It must follow the CREATE statement and precede any editing statements when you create an access descriptor.


Chapter Contents

Previous

Next

Top of Page

Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.