Using Spatial Data with SAS/GIS Software

SAS Data Sets

A SAS data set is a collection of data values and their associated descriptive information, arranged and presented in a form that the SAS System can recognize and process. SAS data sets can be data files or views. A SAS data file contains the following elements:

descriptor information
data values

A SAS view contains the following elements:

descriptor information
the information that is required to retrieve data values that are stored in other files

A third element of the SAS data set is one or more indexes. A SAS index contains the data values of the key variables, each paired with a location identifier for the observation that contains the value. The value/identifier pairs are ordered in a B-tree structure that enables the engine to search by value.

SAS data sets can be indexed by one or more variables, known as key variables. SAS indexes are classified as simple or composite, according to the number of key variables whose values make up the index.
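The idea of value/identifier pairs can be illustrated with a minimal Python sketch. It is not how the SAS engine is implemented: a sorted list with binary search stands in for the B-tree, and the function names, variable names, and data values are hypothetical.

```python
from bisect import bisect_left

def build_index(rows, key_vars):
    """Pair each key value with its observation number, then sort by value.

    One key variable gives a simple index; several give a composite index.
    """
    pairs = [
        (tuple(row[v] for v in key_vars), obs)
        for obs, row in enumerate(rows, start=1)
    ]
    pairs.sort()
    return pairs

def lookup(index, key):
    """Return the observation numbers whose key variables equal `key`."""
    key = tuple(key)
    pos = bisect_left(index, (key,))  # first pair whose value is >= key
    found = []
    while pos < len(index) and index[pos][0] == key:
        found.append(index[pos][1])
        pos += 1
    return found

# Made-up observations for illustration only.
rows = [
    {"STATE": "NC", "COUNTY": "183"},
    {"STATE": "OR", "COUNTY": "051"},
    {"STATE": "NC", "COUNTY": "063"},
    {"STATE": "NC", "COUNTY": "183"},
]

simple = build_index(rows, ["STATE"])               # simple index
composite = build_index(rows, ["STATE", "COUNTY"])  # composite index

print(lookup(simple, ["NC"]))            # [1, 3, 4]
print(lookup(composite, ["NC", "183"]))  # [1, 4]
```

Because the pairs are kept in value order, a lookup does not have to read every observation, which is the benefit that the B-tree provides to the engine.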


SAS/GIS Data Sets

As a component of the SAS System, SAS/GIS stores all of its data in SAS data sets. The data sets for a SAS/GIS spatial database work together as one logical entity, although they are physically separated into multiple data sets. SAS/GIS separates the data into the following data sets:

Chains data set
Contains coordinates for the polylines that are used to form line and polygon features. A polyline consists of either a single line segment or a series of connected line segments. A chain is a sequence of two or more points in the coordinate space. The end points, the first and last points of the chain, must be nodes. Each chain has a direction, from the first point toward the last point. The first point in the chain is the from-node, and the last point is the to-node. Relative to its direction, a chain has a left side and a right side. Points between the from-node and the to-node are detail points, which serve to trace the curvature of the feature that is represented by the chain. Detail points are not nodes.

The chains data set also lists the from-node and to-node row numbers in the nodes data set, as well as the number of detail points and the corresponding details data set row number. The left and right side attribute values, for example, ZIP codes and FIPS codes, are also stored in the chains data set.

Nodes data set
Contains the coordinates of the end points for the chains in the chains data set and the linkage information that is necessary to attach chains to the correct nodes. A node is a point in the spatial data with connections to one or more chains. Nodes can be discrete points or the end points of chains. A node definition may span multiple records in the nodes data set, so only the starting record number for a node is a node feature ID.

Details data set
Stores curvature points of a chain between the two end nodes, which are also called the from-node and the to-node. That is, the details data set contains all the coordinates between the intersection points of the chain. The node coordinates are not duplicated in the details data set. Details data sets also contain the chains data set row number of the associated chain.

Polygonal index data set
Contains one observation for each polygon that was successfully closed during the index creation process. It is called a polygonal index because each observation is literally an index to each polygon in the chains data set. That is, it points to the starting chain in the chains data set for each of the polygons.

Label data set
Defines the attributes of labels to be displayed on the map. The attributes include all of the information that is applicable for each label, such as location, color, size, source of the text for a text label, as well as other behavioral and graphical attributes.
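The relationship among the chains, nodes, and details data sets described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual SAS/GIS storage layout: the field names are hypothetical, zero-based list positions stand in for data set row numbers, and the coordinates and ZIP codes are made up.

```python
from dataclasses import dataclass

@dataclass
class Chain:
    from_node: int   # position of the from-node in the nodes list
    to_node: int     # position of the to-node in the nodes list
    detail_row: int  # position of the first detail point in the details list
    n_details: int   # number of detail points that belong to this chain
    left: str        # left-side attribute value, for example a ZIP code
    right: str       # right-side attribute value

def chain_points(chain, nodes, details):
    """Rebuild the full polyline: from-node, detail points, then to-node."""
    middle = details[chain.detail_row:chain.detail_row + chain.n_details]
    return [nodes[chain.from_node], *middle, nodes[chain.to_node]]

# Node coordinates are stored once and are not repeated in the details.
nodes = [(0.0, 0.0), (4.0, 0.0)]
details = [(1.0, 0.5), (2.0, 1.0), (3.0, 0.5)]
chain = Chain(from_node=0, to_node=1, detail_row=0, n_details=3,
              left="27511", right="27513")

print(chain_points(chain, nodes, details))
# [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (3.0, 0.5), (4.0, 0.0)]
```

Keeping the detail points separate from the nodes means that a chain with no curvature needs no detail records at all, only its two node references.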


Managing Data Set Sizes

By their nature, spatial databases tend to be rather large. Users of spatial data want as much detail in the maps as they can get, which increases the demands on storage and processing capacity. Spatial data that are not carefully managed can become too large for easy use.

Following are five actions that you can take to manage the size of your spatial data sets. You need to perform most of these actions before importing your data into SAS/GIS.

Reduce the spatial extent of the data.
Do not store a larger area than you need. If you need a map containing one state, do not store a map containing all the states for a region. For example, if you need to work with a map of Oregon, do not store a map containing all of the Pacific Northwest.

Store only the features that you need.
If you do not need features such as rivers and lakes, do not store these features in your spatial data.

Limit the amount of detail to what is necessary for your application.
If you are using a map for which you don't require highly detailed boundaries, reduce the detail level and save storage space. If you are using SAS/GRAPH data sets, you can use PROC GREDUCE to reduce the detail level. If you are using a data set from another source, you'll have to reduce the level of detail before importing the data set into SAS/GIS.

Reduce the number of attributes that are stored with the spatial data.
If you don't need an attribute, and don't think you will ever need it again, delete it from your spatial data.

Reduce the size of variables that are stored in the spatial data.
Examine how each variable is stored and determine whether you can safely reduce its length.

For example, if a numeric variable contains a code of at most two digits, it might be better to store the code in a two-byte character variable rather than in an eight-byte numeric variable. Change the variables' defined types or lengths in a DATA step after you complete the import.
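The saving from shortening a variable can be estimated with simple arithmetic. The following Python sketch assumes a hypothetical data set of one million observations and ignores any per-observation overhead or compression, so treat the result as a rough estimate only.

```python
def bytes_saved(n_obs, old_length, new_length):
    """Estimate the storage saved by shortening one variable."""
    return n_obs * (old_length - new_length)

# Eight-byte numeric replaced by a two-byte character variable.
saved = bytes_saved(n_obs=1_000_000, old_length=8, new_length=2)
print(f"{saved:,} bytes")  # 6,000,000 bytes
```

The saving applies to each variable that you shorten, so it grows quickly in data sets that carry many left-side and right-side attribute variables.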

Of the five actions, reducing the number of attributes is the easiest to perform. To remove unneeded composite variables from your data set as it is imported, use the Import window, which you access by selecting [Modify Composites] from the GIS Spatial Data Importing window.


Data Set Variables

The following sections describe the variables that are specific to each SAS/GIS data set.

Import Type Specific Variables

The following tables describe the composites and variables that are created for each of the import types. All of the variables are located in the chains data set except for the X and Y variables, which are in the nodes data set.

Composites and Variables Specific to the ARC/INFO Import Type
| Composite | Variable 1 | Variable 2 | Variable 3 | Variable 4 | Type (table note 1) | Description |
|---|---|---|---|---|---|---|
| ARCID | ARCIDL | ARCIDR | | | A or C | ARCID from the ARC/INFO coverage. Maps made from line and point coverages will not have left and right variables. |
| ARCNUM | | | | | C | ARCNUM from the coverage. |
| 'COVERAGE' | 'COVERAGE'_L | 'COVERAGE'_R | | | A or C | This variable is derived from the input filename. It's the last word preceding the file extension. For example, /local/gisdata/montana.e00 would have a 'COVERAGE' (table note 2) name of montana. The left variable would be montanal, the right variable would be montanar, and the composite type would be Area. Line and point coverages do not have left- and right-side variables, and the composite type would be Classification. |
| AREA | AREAL | AREAR | | | A | AREA from the coverage. |
| PERIMETE | PERIML | PERIMR | | | A | PERIMETER from the coverage. |
| 'ATTRIB' | 'ATTRIB'L | 'ATTRIB'R | | | | All variables in the polygon, line, or point attribute tables are saved as composite variables. In the case of the polygon coverages, an L or an R is added to the end of the first five characters of the actual variable name. |
| _COVER_ | _COVEL | _COVER | | | A or C | This variable contains the name stored in the 'COVERAGE' variable. |
| _SRC_ | _SRCL | _SRCR | | | C | Contains the string 'ARC'. |
| X | X | | | | X | X coordinate. |
| Y | Y | | | | Y | Y coordinate. |


TABLE NOTE 1:  
A = Area
C = Classification
X = X coordinate
Y = Y coordinate
TABLE NOTE 2:  

Names in single quotation marks, such as 'COVERAGE' and 'ATTRIB', are GIS composite names.
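The naming rules in the table can be illustrated with a short Python sketch. It is not the SAS/GIS import code: the function names are hypothetical, and the sketch assumes that the derived names are short enough to be valid variable names. The printed names are lowercased only to match the spelling used in the table.

```python
import posixpath

def coverage_name(path):
    """Return the last word preceding the file extension of the input file."""
    stem, _extension = posixpath.splitext(posixpath.basename(path))
    return stem

def side_variables(name, prefix_len=None):
    """Return the (left, right) variable names for an area composite.

    When prefix_len is given, only that many leading characters of the
    name are kept before the L or R suffix is added.
    """
    stem = name if prefix_len is None else name[:prefix_len]
    return stem + "L", stem + "R"

cover = coverage_name("/local/gisdata/montana.e00")
left, right = side_variables(cover)
print(cover, left.lower(), right.lower())  # montana montanal montanar

# Attribute variables keep the first five characters of the actual name.
print(side_variables("PERIMETER", prefix_len=5))  # ('PERIML', 'PERIMR')
```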

Composites and Variables Specific to the Digital Line Graph (DLG) Import Type
| Composite | Variable 1 | Variable 2 | Variable 3 | Variable 4 | Type (table note 1) | Description |
|---|---|---|---|---|---|---|
| LMAJOR(n) | LMAJOR(n) | | | | C | Major line attribute code. |
| LMINOR(n) | LMINOR(n) | | | | C | Minor line attribute code. |
| NMAJOR(n) | NMAJOR(n) | | | | C | Major node attribute code. |
| NMINOR(n) | NMINOR(n) | | | | C | Minor node attribute code. |
| MAJOR(n) | AMAJORR(n) | AMAJORL(n) | | | A | Major area attribute code. |
| MINOR(n) | AMINORL(n) | AMINORR(n) | | | A | Minor area attribute code. |
| X | X | | | | X | X coordinate. |
| Y | Y | | | | Y | Y coordinate. |


TABLE NOTE 1:  
A = Area
C = Classification
X = X coordinate
Y = Y coordinate

Composites and Variables Specific to the DXF Import Type
| Composite | Variable 1 | Variable 2 | Variable 3 | Variable 4 | Type (table note 1) | Description |
|---|---|---|---|---|---|---|
| 'ATTRIB' | 'ATTRIB'L | 'ATTRIB'R | | | A or C | All polygon, line, or point attributes are saved as composite variables. In the case of polygon maps, an 'L' or 'R' is added to the end of the first seven characters of the actual variable name. |


TABLE NOTE 1:  
A = Area
C = Classification

Composites and Variables Specific to the Genline Import Type
| Composite | Variable 1 | Variable 2 | Variable 3 | Variable 4 | Type (table note 1) | Description |
|---|---|---|---|---|---|---|
| ID | ID | | | | C | The ID variable from the data set. |
| 'ATTRIB' | 'ATTRIB' | 'ATTRIB' | | | C | Any other variable in the data set is saved as a classification composite. |
| X | X | | | | X | X coordinate. |
| Y | Y | | | | Y | Y coordinate. |


TABLE NOTE 1:  
C = Classification
X = X coordinate
Y = Y coordinate

Composites and Variables Specific to the Genpoint Import Type
| Composite | Variable 1 | Variable 2 | Variable 3 | Variable 4 | Type (table note 1) | Description |
|---|---|---|---|---|---|---|
| ID | ID | | | | C | The ID variable from the data set. |
| 'ATTRIB' | 'ATTRIB' | 'ATTRIB' | | | C | Any other variable in the data set is saved as a classification composite. |
| X | X | | | | X | X coordinate. |
| Y | Y | | | | Y | Y coordinate. |


TABLE NOTE 1:  
C = Classification
X = X coordinate
Y = Y coordinate

Composites and Variables Specific to the MapInfo Import Type
| Composite | Variable 1 | Variable 2 | Variable 3 | Variable 4 | Type (table note 1) | Description |
|---|---|---|---|---|---|---|
| 'ATTRIB' | 'ATTRIB'L | 'ATTRIB'R | | | A or C | All polygon, line, or point attributes are saved as composite variables. In the case of polygon maps, an 'L' or 'R' is added to the end of the first seven characters of the actual variable name. |
| LINELYR | | | | | C | This variable is derived from the input filename. It's the last word preceding the file extension. For example, /local/gisdata/montana.mif would have a LINELYR name of montana. |
| PTLYR | | | | | C | This variable is derived from the input filename. It's the last word preceding the file extension. For example, /local/gisdata/montana.mif would have a PTLYR name of montana. |
| POLYLYR | | | | | A | This variable is derived from the input filename. It's the last word preceding the file extension. For example, /local/gisdata/montana.mif would have a POLYLYR name of montana. |
| 'MAP' | 'MAP' | | | | A or C | This variable is derived from the input filename. It's the last word preceding the file extension. For example, /local/gisdata/usa.mif would have a 'MAP' name of usa. The left variable would be usal, the right variable would be usar, and, in this case, the composite type would be Area. Line and point maps do not have left- and right-side variables, and the composite type would be Classification. |


TABLE NOTE 1:  
A = Area
C = Classification

Composites and Variables Specific to the SAS/GRAPH and Genpoly Import Types
| Composite | Variable 1 | Variable 2 | Variable 3 | Variable 4 | Type (table note 1) | Description |
|---|---|---|---|---|---|---|
| 'IDVAR'n | 'IDVAR'L | 'IDVAR'R | | | A | An area composite variable is created for each ID variable (IDVAR) selected by the user in the ID vars list box. In the case of polygon maps, an 'L' or 'R' is added to the end of the first seven characters of the actual variable name. |


TABLE NOTE 1:  
A = Area

Composites and Variables Specific to the TIGER and DYNAMAP Import Types
| Composite | Variable 1 | Variable 2 | Variable 3 | Variable 4 | Type (table note 1) | Description |
|---|---|---|---|---|---|---|
| ADDR | FRADDL | FRADDR | TOADDL | TOADDR | ADDR | Address range. |
| BLOCK | BLOCKL | BLOCKR | | | A | Block number. |
| CFCC | CFCC | | | | C | Feature classification code. |
| COUNTY | COUNTYL | COUNTYR | | | A | County FIPS code. |
| DIRPRE | DIRPRE | | | | ADDRP | Feature direction prefix. |
| DIRSUF | DIRSUF | | | | ADDRS | Feature direction suffix. |
| FEANAME | FEANAME | | | | C | Feature name. |
| MCD | MCDL | MCDR | | | A | Minor civil division. |
| PLACE | PLACEL | PLACER | | | A | Incorporated place code. |
| RECTYPE | RECTYPE | | | | C | Record type. |
| STATE | STATEL | STATER | | | A | State FIPS code. |
| TRACT | TRACTL | TRACTR | | | A | Census tract. |
| ZIP | ZIPL | ZIPR | | | A | ZIP code. |
| BG | BGL | BGR | | | A | Block group. |
| LONGITUDE | X | | | | X | Longitude. |
| LATITUDE | Y | | | | Y | Latitude. |


TABLE NOTE 1:  
A = Area
C = Classification
ADDR = Address
ADDRP = Address Prefix
ADDRS = Address Suffix
X = Longitude
Y = Latitude



Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.