SAS/SPECTRAVIEW Software User's Guide

Resolving Data Loading Problems

The following topics suggest ways to resolve data loading problems. If a data set fails to load, SAS/SPECTRAVIEW displays an error message in the text window.

Loading a Data Set with Only Three Variables

SAS/SPECTRAVIEW requires four variables (X, Y, and Z axis variables plus a response variable) in order to load a SAS data set. However, you can load a data set that has only three variables by using the following procedure, which adds a dummy Z variable.

Create a temporary SAS data set with the following DATA step code:

data temp;
   set yourdatasetname;
   /* Write each observation twice, once in each of two Z planes */
   dummy = 1;
   output;
   dummy = 2;
   output;
run;

Load the temporary data set TEMP into SAS/SPECTRAVIEW.
Select the X and Y axis variables that you are interested in, then select DUMMY as the Z variable.
Select the Response variable that you want.
Select [Read data].
Use the data for your analysis.

The resulting volume has two identical Z planes (DUMMY=1 and DUMMY=2). You can ignore the second one.

Changing Axis Variables

Because of memory constraints, a data set might load with one choice of axis and response variables but fail to load with another. Choose as axis variables the ones that build the most complete volume grid from actual data points; that is, avoid axis variables that are sparsely valued or that contain continuous data.

For example, the sample data set MORTGAGE loads without problems if YEARS, RATE, and AMOUNT are specified as the axis variables. However, if you specify PAYMENT as an axis variable and YEARS, RATE, or AMOUNT as the response variable, the data may not load, because PAYMENT has 16,400 unique values. If a data set fails to load, the error message in the text window reports the number of unique values found for each axis.

See Specifying SAS/SPECTRAVIEW Variables for details on specifying variables and determining which variables are best.
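Before choosing axis variables, you can count the unique values of each candidate yourself; the product of the axis counts is the number of data points in the volume grid. The following is a minimal sketch using PROC SQL's COUNT(DISTINCT ...). It runs against a small hypothetical data set named SAMPLE whose variable names mirror the MORTGAGE example; the values are invented for illustration and are not the shipped MORTGAGE sample, and the names AXIS_COUNTS, N_YEARS, and so on are likewise hypothetical.

```sas
/* Hypothetical stand-in for a loan data set; NOT the shipped MORTGAGE sample. */
data sample;
   do years = 10 to 30 by 5;                    /* 5 unique values            */
      do rate = 5 to 10;                        /* 6 unique values (percent)  */
         do amount = 50000 to 150000 by 25000;  /* 5 unique values            */
            r = rate / 1200;                    /* monthly interest rate      */
            n = years * 12;                     /* number of monthly payments */
            /* Fixed-rate loan payment: a continuous-valued variable */
            payment = amount * r / (1 - (1 + r)**(-n));
            output;
         end;
      end;
   end;
   drop r n;
run;

/* Count unique values per candidate axis variable and the grid size */
/* that results from using YEARS, RATE, and AMOUNT as the axes.      */
proc sql;
   create table axis_counts as
   select count(distinct years)   as n_years,
          count(distinct rate)    as n_rate,
          count(distinct amount)  as n_amount,
          count(distinct payment) as n_payment,
          calculated n_years * calculated n_rate * calculated n_amount
             as grid_points
   from sample;
quit;

proc print data=axis_counts noobs;
run;
```

Here the three discrete axes give a grid of 5 x 6 x 5 = 150 points. Using PAYMENT as an axis instead of AMOUNT would multiply the grid size by N_PAYMENT rather than by N_AMOUNT, which is why continuous variables make poor axes.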

Categorizing Data

A common reason that a data set will not load is that the data does not represent a complete grid, which most often happens with random data or when the axis values are continuous rather than discrete. Such a data set can fail to load because of memory constraints even when a larger data set loads successfully: what matters is the number of data points in the resulting volume grid, not the number of observations.

The memory a data set requires depends on the numbers of unique X, Y, and Z values, whose product determines the number of data points that are created. If that number becomes large, the data set may fail to load unless more memory is available. Only large grids cause loading problems: for example, a 21x21x21 grid of 9,261 data points loads easily, whereas a 100x100x100 grid of 1,000,000 data points may not.

To make the data clearer and easier to use in SAS/SPECTRAVIEW, you can categorize it. Categorizing groups numeric values into distinct ranges (called categories) for each axis, which reduces the number of unique axis values and therefore the size of the grid. Instructions on how to categorize data are in Categorizing Data.
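If you prefer to prepare the data before loading it, a rough manual alternative is to bin continuous axis values in a DATA step. The following is a sketch only, not the SAS/SPECTRAVIEW categorize feature: it assumes a bin width of 0.25 and uses hypothetical data set names (RAW, BINNED, BIN_COUNTS). Each value is replaced by the lower edge of its bin with the FLOOR function, so values in (-1, 1) collapse to at most 8 categories per axis.

```sas
/* Hypothetical random data: X, Y, and Z are continuous values in (-1, 1). */
data raw;
   seed = 12345;                      /* fixed seed for a repeatable run */
   do i = 1 to 1000;
      x = 2*ranuni(seed) - 1;
      y = 2*ranuni(seed) - 1;
      z = 2*ranuni(seed) - 1;
      response = x**2 + y**2 + z**2;
      output;
   end;
   drop seed i;
run;

/* Replace each axis value with the lower edge of a 0.25-wide bin.        */
/* Values in (-1, 1) map to at most 8 categories per axis: -1.0 to 0.75. */
data binned;
   set raw;
   width = 0.25;                      /* assumed bin width */
   xc = width * floor(x / width);
   yc = width * floor(y / width);
   zc = width * floor(z / width);
   drop width;
run;

/* Count the unique binned values per axis and the resulting grid size. */
proc sql;
   create table bin_counts as
   select count(distinct xc) as n_x,
          count(distinct yc) as n_y,
          count(distinct zc) as n_z,
          calculated n_x * calculated n_y * calculated n_z as grid_points
   from binned;
quit;
```

Without binning, 1,000 random points would give roughly 1,000 unique values per axis; after binning the grid has at most 8 x 8 x 8 = 512 points. Because many observations now share each binned location, the Duplicate Values setting determines how their responses are combined.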

Changing Duplicate Values Handling

The way you specify duplicate-value handling can itself prevent data from loading. For example, suppose you select either [Count] or [Nmiss] under the label Duplicate Values, and the data you want to load is a complete grid, with no missing x,y,z locations and no duplicate observations at any x,y,z location. The data would then fail to load:

With [Count] specified, the response value for every data point would be 1.
With [Nmiss] specified, the response value for every data point would be 0.

In both cases the data fails to load because [Count] and [Nmiss] require at least two different response values.

Instructions for specifying how the software handles duplicate values are in Handling Duplicate Values.
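Before selecting [Count] or [Nmiss], you can check whether your data has duplicate locations and whether it forms a complete grid. The following sketch uses PROC SQL on a small hypothetical data set named GRID (a complete 3x3x3 grid built for illustration); substitute your own data set and axis variable names. If DUP_LOCATIONS is empty and N_OBS equals GRID_POINTS, every location holds exactly one observation, which is the case in which [Count] and [Nmiss] produce a constant response.

```sas
/* Hypothetical complete 3x3x3 grid with one observation per location. */
data grid;
   do x = 1 to 3;
      do y = 1 to 3;
         do z = 1 to 3;
            response = x + y + z;
            output;
         end;
      end;
   end;
run;

proc sql;
   /* Any x,y,z location that has more than one observation */
   create table dup_locations as
   select x, y, z, count(*) as n_obs
   from grid
   group by x, y, z
   having count(*) > 1;

   /* Observation count versus the number of points in a complete grid */
   create table grid_check as
   select count(*) as n_obs,
          count(distinct x) * count(distinct y) * count(distinct z)
             as grid_points
   from grid;
quit;
```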

Removing BY Variable Specification

Removing the BY variable specification reduces the storage required by a factor equal to the number of BY groups in the data set.

To calculate the storage required when a BY variable is specified, multiply the number of grid data points (the product of the numbers of unique values for the axis variables) by the number of BY groups. For example, if you have five BY groups, you need five times as much storage, because a separate grid is created for each value of the BY variable.
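Written as a formula, and assuming the approximate figure of four bytes per data point given in Calculating Volume Grid Storage Requirements, the estimate is as follows. The symbols are chosen here for illustration: n_X, n_Y, and n_Z are the numbers of unique values for the axis variables, and n_BY is the number of BY groups.

```latex
\begin{aligned}
S &\approx 4 \times n_X \times n_Y \times n_Z \times n_{BY}\ \text{bytes} \\
n_X = n_Y = n_Z = 21,\ n_{BY} = 5:\quad
S &\approx 4 \times 21^3 \times 5 = 4 \times 9{,}261 \times 5 = 185{,}220\ \text{bytes} \approx 181\ \text{KB}
\end{aligned}
```

With a single group (n_BY = 1), the same grid needs 37,044 bytes, so the five BY groups account for the entire fivefold increase.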

More information on BY variable processing is in Grouping Observations with a BY Variable.

Using the G4GRID Procedure to Create a Complete Grid

You can run the G4GRID procedure on your data to create a new data set that represents a complete grid. If your data is random in nature, for example, PROC G4GRID may be a good choice. The new data set is derived from the original data rather than copied from it. The time needed to produce it depends on the number of observations in the input data set and the size of the requested output grid.

PROC G4GRID makes it possible to load a data set that could not otherwise be loaded because of memory constraints. With PROC G4GRID, you can fill in missing values with interpolated values or resize the data set as required. PROC G4GRID is useful when

the response values were sampled at discrete locations, for example, measurements of air pollution.
the response data is functionally related to the axis variables, that is, the response is analytically or physically a function of the axis variables. Air pollution measurements are a function of the locations identified by the axis values, but a stock's price is not a function of the stock's name: a high price for Granny's Kitchen stock does not imply a high price for Gerry's Garage stock, even though the two fall next to each other in the grid. Smoothing with PROC G4GRID would lower Granny's price and raise Gerry's, because neighboring values are assumed to influence each other.
you want a complete grid of values and can accept some changes from your original values.

Complete documentation for PROC G4GRID is in Appendix 1, "The G4GRID Procedure."

Calculating Volume Grid Storage Requirements

To understand how to calculate storage requirements, compare the following two DATA step examples.

The first example produces 9,261 observations and loads without problems; in fact, it is a relatively small data set by SAS/SPECTRAVIEW standards. Each axis has 21 unique values, which results in a grid of 21x21x21 = 9,261 data points. Each data point requires approximately four bytes of storage on most machines, so the grid requires about 4 x 9,261 = 37,044 bytes, or roughly 36KB.

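data load;
  drop a b c i j k;
  a=0.3;
  b=0.2;
  c=0.1;
  /* Integer loop indices guarantee exactly 21 values per axis   */
  /* (-1.0, -0.9, ..., 1.0) without floating-point accumulation. */
  do i = -10 to 10;
    x = i / 10;
    do j = -10 to 10;
      y = j / 10;
      do k = -10 to 10;
        z = k / 10;
        /* Ellipsoidal response on a complete 21x21x21 grid */
        response = x**2/a**2 + y**2/b**2 + z**2/c**2;
        output;
      end;
    end;
  end;
run;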

The second example, however, may not load, even though it has only 100 observations. The number of unique X, Y, and Z values is not known in advance, but because the values come from the RANUNI random-number function, you can expect close to 100 unique values for each variable. The grid therefore requires 100x100x100 = 1,000,000 data points: about 108 times the storage of the first example, or about 4MB at four bytes per data point.

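data noload;
   drop seed i a b c;
   seed = -1;   /* seed <= 0: RANUNI seeds from the clock, so values differ per run */
   a = 0.3;
   b = 0.2;
   c = 0.1;
   /* Only 100 observations, but X, Y, and Z are continuous random values */
   /* in (-1, 1), so each axis has about 100 unique values.              */
   do i = 1 to 100;
      x = 2.0*ranuni(seed) - 1.0;
      y = 2.0*ranuni(seed) - 1.0;
      z = 2.0*ranuni(seed) - 1.0;
      response = x**2/a**2 + y**2/b**2 + z**2/c**2;
      output;
   end;
run;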

Specifying Larger Memory Size

To make more memory available, specify a larger value for the MEMSIZE system option, which controls how much memory the SAS System can use, when you invoke the SAS System. For example:

-memsize 100m

Note that SAS/SPECTRAVIEW also requires additional memory for overhead, some of which is proportional to the size of the data set. Even when there is enough memory to build the grid, another memory allocation can fail and prevent the SAS data set from loading.
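To confirm which MEMSIZE value is in effect for the current session, you can query the option from SAS itself. The following is a minimal sketch using PROC OPTIONS and the GETOPTION function; the macro variable name CURRENT_MEMSIZE is hypothetical, and the exact format of the reported value may vary by host.

```sas
/* Write the current MEMSIZE setting and its description to the SAS log. */
proc options option=memsize;
run;

/* Alternatively, capture the value in a macro variable for later use. */
%let current_memsize = %sysfunc(getoption(memsize));
%put NOTE: MEMSIZE is currently &current_memsize;
```

Because MEMSIZE is set when the SAS System is invoked, check the value after restarting with the larger setting to make sure it took effect.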
