The GMAP Procedure

Concepts

The GMAP procedure requires a map data set and a response data set. These two data sets must contain the required variables or the procedure stops with an error message. You can use the same data set as both the map data set and the response data set, as long as the requirements are met. If a different data set is used as the response data set, it must contain an ID variable that is identical to the ID variable in the map data set.

About Map Data Sets

A map data set is a SAS data set that contains coordinates that define the boundaries of map areas, such as states or counties. A map data set must contain at least these variables:

a numeric variable named X that contains the horizontal coordinates of the boundary points. The value of this variable could be either projected or unprojected. If unprojected, X represents longitude.
a numeric variable named Y that contains the vertical coordinates of the boundary points. The value of this variable could be either projected or unprojected. If unprojected, Y represents latitude.
one or more variables that uniquely identify the areas in the map. Map area identification variables can be either character or numeric and are indicated in the ID statement.

The X and Y variable values in the map data set do not have to be in any specific units because they are rescaled by the GMAP procedure based on the minimum and maximum values in the data set. The minimum X and Y values are in the lower-left corner of the map, and the maximum X and Y values are in the upper-right corner.

Map data sets in which the X and Y variables contain longitude and latitude should be projected before you use them with PROC GMAP. See The GPROJECT Procedure for details.
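
As a rough sketch of this projection step, the following code projects an unprojected map and then draws it. It assumes that the MAPS libref is assigned and that MAPS.STATES stores unprojected longitude and latitude (in radians) in X and Y with STATE as the ID variable; check your installation before relying on this. The output data set name WORK.STATES_PROJ is invented for illustration.

```sas
/* Assumption: MAPS.STATES holds unprojected X and Y in radians.     */
/* FIPS codes 2, 15, and 72 (Alaska, Hawaii, Puerto Rico) are        */
/* excluded so that the projection covers the contiguous states.     */
proc gproject data=maps.states out=work.states_proj;
   where state not in (2, 15, 72);
   id state;
run;

/* Draw the projected map; the map data set doubles as the           */
/* response data set, so each state is its own response level.       */
proc gmap map=work.states_proj data=work.states_proj;
   id state;
   choro state / discrete nolegend;
run;
quit;
```

If the coordinates are stored in degrees rather than radians, the DEGREES option in the PROC GPROJECT statement is the usual adjustment.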

Optionally, the map data set also can contain a variable named SEGMENT to identify map areas that comprise noncontiguous polygons. Each unique value of the SEGMENT variable within a single map area defines a distinct polygon. If the SEGMENT variable is not present, each map area is drawn as a separate closed polygon that indicates a single segment.

The observations for each segment of a map area in the map data set must occur in the order in which the points are to be joined. The GMAP procedure forms map area outlines by connecting the boundary points of each segment in the order in which they appear in the data set, eventually joining the last point to the first point to complete the polygon.

Any variables in the map data set other than the ones mentioned above are ignored for the purpose of determining map boundaries.
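
The requirements above can be sketched with a small hand-built map. The data set WORK.TOYMAP and the ID variable REGION are hypothetical names chosen for this illustration; only X, Y, and SEGMENT are names that the procedure expects. Region 1 is a single square, and region 2 consists of two noncontiguous squares that are distinguished by SEGMENT. The points of each polygon are listed in the order in which they are to be joined.

```sas
/* Hypothetical map data set: REGION identifies the map area,        */
/* SEGMENT separates the noncontiguous polygons of region 2.         */
data work.toymap;
   input region segment x y;
   datalines;
1 1 0 0
1 1 0 1
1 1 1 1
1 1 1 0
2 1 2 0
2 1 2 1
2 1 3 1
2 1 3 0
2 2 4 0
2 2 4 1
2 2 5 1
2 2 5 0
;
run;

/* The same data set is used as the map and the response data set.   */
proc gmap map=work.toymap data=work.toymap;
   id region;
   choro region / discrete;
run;
quit;
```

Because the coordinates are rescaled by the procedure, the arbitrary units used here (0 through 5) are acceptable.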

About SAS/GRAPH Map Data Sets

In addition to the variables described in About Map Data Sets, the SAS/GRAPH map data sets may also contain the following variables:

a numeric variable named LONG containing the unprojected longitude in radians of the boundary points.
a numeric variable named LAT containing the unprojected latitude in radians of the boundary points.

The GMAP procedure uses the values of the X and Y variables to draw the map. Therefore, if you want to produce an unprojected map by using the values in LONG and LAT, you would have to rename LONG and LAT to X and Y first.
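
One way to perform this renaming is with data set options in a DATA step. The sketch below first builds a tiny stand-in data set, WORK.MYMAP, with made-up coordinate values so that the example runs on its own; in practice you would start from a map data set that already contains X, Y, LONG, and LAT. The names WORK.MYMAP, WORK.UNPROJECTED, and AREA are hypothetical.

```sas
/* Stand-in map with both projected (X, Y) and unprojected           */
/* (LONG, LAT) coordinates; all values are invented.                 */
data work.mymap;
   input area x y long lat;
   datalines;
1 0.10 0.20 1.40 0.60
1 0.10 0.30 1.40 0.62
1 0.20 0.30 1.38 0.62
1 0.20 0.20 1.38 0.60
;
run;

/* DROP= is applied before RENAME=, so the projected X and Y are     */
/* discarded first and LONG and LAT then take over their names.      */
data work.unprojected;
   set work.mymap(drop=x y rename=(long=x lat=y));
run;
```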

SAS/GRAPH includes a number of predefined map data sets. These data sets are described in SAS/GRAPH Map Data Sets.

Map Data Sets Containing X, Y, LONG, and LAT

Most Institute-supplied map data sets contain four coordinate variables (X, Y, LONG, and LAT). In these data sets, X and Y always contain projected values, which the SAS/GRAPH procedures use by default. To use the unprojected values in LONG and LAT instead, rename LONG and LAT to X and Y, because the GMAP procedure always reads X and Y. See Input Map Data Sets that Contain Both Projected and Unprojected Values for more details.

Map Data Sets Containing Only X and Y

The Institute-supplied map data sets that contain X and Y variables (and no LONG and LAT variables) are usually projected maps. However, a few map data sets for the US and Canada contain X and Y values that are unprojected longitude and latitude. For these data sets, use the GPROJECT procedure to project the map (see The GPROJECT Procedure).

Note: You can determine whether a SAS map data set is projected or unprojected by looking at the description of each variable that is displayed when you use the CONTENTS procedure or by browsing the MAPS.METAMAPS data set.
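
The check described in the note might look like the following. This sketch assumes that the MAPS libref is assigned at your site; MAPS.US is used only as an example target, and the OBS= limit of 20 is arbitrary.

```sas
/* List the variables of a map data set together with their labels.  */
proc contents data=maps.us;
run;

/* Browse the first few entries of the metadata data set.            */
proc print data=maps.metamaps(obs=20);
run;
```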

Specialty Map Data Sets

There are several map data sets available with SAS/GRAPH that allow you to easily label maps:

MAPS.USCENTER: contains the X and Y coordinates of the visual center of each state in the U.S. and Washington, D.C., as well as points in the ocean for states that are too small to contain a label. You can use MAPS.USCENTER with the MAPS.US, MAPS.USCOUNTY, MAPS.COUNTIES, and MAPS.COUNTY data sets.
MAPS.USCITY: contains the X and Y coordinates of selected cities in the U.S. Many city names occur in more than one state, so you may have to subset by state to avoid duplication. You can use MAPS.USCITY with the MAPS.US, MAPS.USCOUNTY, MAPS.COUNTIES, and MAPS.COUNTY data sets.
MAPS.CANCENS: contains the names of the Canadian census divisions. You can use MAPS.CANCENS with the MAPS.CANADA and MAPS.CANADA3 data sets.
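
As an illustration of subsetting MAPS.USCITY by state, the following DATA step keeps only the cities in one state. It assumes that MAPS.USCITY contains the variables STATE (a numeric FIPS code) and CITY in addition to X and Y; confirm the variable names with PROC CONTENTS before using it. The output name WORK.NC_CITIES is hypothetical.

```sas
/* Assumption: STATE holds the numeric FIPS code of each city.       */
data work.nc_cities;
   set maps.uscity(keep=state city x y);
   where state = 37;   /* 37 is the FIPS code for North Carolina */
run;
```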

See the MAPS.METAMAPS data set for details on each of the Institute-supplied map data sets.

About Response Data Sets

A response data set is a SAS data set that contains the following:

one or more response variables that contain data values that are associated with map areas. Each value of the response variable is associated with a map area in the map data set.
identification variables that identify the map area to which a response value belongs. These variables must be the same as those that are contained in the map data set.

The response data set can contain other variables in addition to these required variables.

The values of the map area identification variables in the response data set determine the map areas to be included on the map unless you use the ALL option in the PROC GMAP statement. That is, unless you use ALL in the PROC GMAP statement, only the map areas with response values are shown on the map. As a result, you do not need to subset your map data set if you are mapping only a small section of the map. However, if you map the same small section frequently, create a subset of the map data set for efficiency.
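
The effect of the ALL option can be sketched as follows. The response data set WORK.SALES and its SALES values are invented; the example assumes that MAPS.US is available and that its ID variable STATE holds numeric FIPS codes.

```sas
/* Hypothetical response data for three states                       */
/* (37 = North Carolina, 45 = South Carolina, 51 = Virginia).        */
data work.sales;
   input state sales;
   datalines;
37 120
45 80
51 150
;
run;

/* Without ALL: only the three states in WORK.SALES are drawn.       */
proc gmap map=maps.us data=work.sales;
   id state;
   choro sales;
run;
quit;

/* With ALL: every state in MAPS.US is drawn, and the states that    */
/* have no response data are left empty.                             */
proc gmap map=maps.us data=work.sales all;
   id state;
   choro sales;
run;
quit;
```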

For choropleth, block, and prism maps, the response variables can be either character or numeric. For surface maps, the response variables must be numeric with only positive values.

About Response Variables

The GMAP procedure can produce block, choropleth, and prism maps for both numeric and character response variables. Numeric variables fall into two categories: discrete and continuous.

Discrete variables contain a finite number of specific numeric values that are to be represented on the map. For example, a variable that contains only the values 1989 or 1990 is a discrete variable.
Continuous variables contain a range of numeric values that are to be represented on the map. For example, a variable that contains any real value between 0 and 100 is a continuous variable.

Numeric response variables are always treated as continuous variables unless the DISCRETE option is used in the action statement.
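
For example, a numeric variable that holds only the values 1989 and 1990 is probably best mapped with DISCRETE. The sketch below uses an invented response data set, WORK.JOINED, and assumes that MAPS.US is available with STATE as a numeric FIPS code.

```sas
/* Hypothetical response data: one of two year values per state.     */
data work.joined;
   input state year;
   datalines;
13 1989
37 1990
45 1989
51 1990
;
run;

/* DISCRETE gives each distinct value of YEAR its own response       */
/* level instead of grouping the values into ranges.                 */
proc gmap map=maps.us data=work.joined;
   id state;
   choro year / discrete;
run;
quit;
```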

About Response Levels

Response levels are the values that identify categories of data on the graph. The categories that are shown on the graph are based on the values of the response variable. Based on the type of the response variable, a response level can represent these values:

a specific character value. If the response variable is character type, the GMAP procedure treats each unique value of the variable as a response level. For example, if the response variable contains the names of ten regions, each region will be a response level, resulting in ten response levels.
The exception to this is that the MIDPOINTS= option chooses specific response level values. Any response variable values that do not match one of the specified response level values are ignored. For example, if the response variable contains the names of ten regions and you specify these midpoints, only the observations for Midwest, Northeast, and Northwest are included on the map:
```
midpoints='Midwest' 'Northeast' 'Northwest'
```

a range of numeric values. If the response variable is numeric, the GMAP procedure determines the number of response levels for the response variable. Each response level then represents the median of a range of values.
These options are exceptions to this:
- The LEVELS= option specifies the number of response levels to be used on the map.
- The DISCRETE option causes the numeric variable to be treated as a discrete variable.
- The MIDPOINTS= option chooses specific response level values as medians of the value ranges.
If the response variable values are continuous, the GMAP procedure assigns response level intervals automatically unless you specify otherwise. The response levels represent a range of values rather than a single value.
a specific numeric value. If the response variable is numeric and you use the DISCRETE option, the GMAP procedure treats the variable much the same way as it treats a character response variable. That is, the procedure creates a response level for each unique value of the response variable. If you use DISCRETE with a numeric response variable that has an associated format, each formatted value is represented by a different response level. Formatted values are truncated to 16 characters.
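
The numeric cases above can be sketched with the LEVELS= and MIDPOINTS= options. The response data set WORK.RATES and its values are invented for illustration, and MAPS.US is assumed to be available with STATE as a numeric FIPS code.

```sas
/* Hypothetical continuous response values for six states.           */
data work.rates;
   input state rate;
   datalines;
12 18
13 42
37 55
45 67
47 31
51 90
;
run;

proc gmap map=maps.us data=work.rates;
   id state;
   /* Ask for three response levels; the procedure picks the ranges. */
   choro rate / levels=3;
   /* Name the response level values explicitly.                     */
   choro rate / midpoints=25 50 75;
run;
quit;
```

Each CHORO statement produces its own map, so the two results can be compared side by side.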

The BLOCK, CHORO, and PRISM statements assign patterns to response levels. In CHORO and PRISM maps, response levels are shown as map areas. However, in BLOCK maps, response levels are shown as blocks. The default fill pattern for the response level is solid.

PATTERN statements can define the fill patterns and colors for both blocks and map areas. PATTERN definitions that define valid block patterns are applied to the blocks (response levels), and PATTERN definitions that define valid map patterns are applied to map areas.

See PATTERN Statement for more information on fill pattern values and default pattern rotation.
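
A minimal sketch of PATTERN statements applied to map areas follows. The colors and the response data set WORK.YEARS are arbitrary choices for this illustration; MSOLID is used as the map fill value, and MAPS.US is assumed to be available.

```sas
/* Clear any PATTERN definitions left over from earlier steps.       */
goptions reset=pattern;

/* One PATTERN definition per response level, in order.              */
pattern1 value=msolid color=cx99CCFF;
pattern2 value=msolid color=cx003399;

/* Hypothetical response data with two discrete values.              */
data work.years;
   input state year;
   datalines;
13 1989
37 1990
45 1989
51 1990
;
run;

proc gmap map=maps.us data=work.years;
   id state;
   choro year / discrete;
run;
quit;
```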

About Identification Variables

Identification (ID) variables are common to both the map data set and the response data set. They identify the map areas (for example, counties, states, or provinces) that make up the map. A unit area or map area is a group of observations with the same ID value. The GMAP procedure matches the value of the response variables for each map area in the response data set to the corresponding map area in the map data set to create the output graphs.

Displaying Map Areas and Response Data

Whether the GMAP procedure draws a map area and whether it displays patterns for response values depends on the contents of the response data set and on the ALL and MISSING options. Displaying Map Areas and Response Data describes the conditions under which the procedure does or does not display map areas and response data.

***Displaying Map Areas and Response Data***
If the response data set...	And if...	Then the procedure...
includes the map area	the map area has a response value	draws the map area and displays the response data
includes the map area	the map area has no response value (that is, the value is missing)	draws the map area but leaves it empty
includes the map area	the map area has no response value and the MISSING option is used in the map statement	draws the map area and displays a response level for the missing value
does not include the map area	the ALL option is used in the PROC GMAP statement	draws the map area but leaves it empty
does not include the map area	the ALL option is not used	does not draw the map area
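
The row of the table that involves the MISSING option can be sketched as follows. The response data set WORK.SALES_MISS is invented; South Carolina (FIPS code 45) is present in the data set but has a missing response value. MAPS.US is assumed to be available.

```sas
/* Hypothetical response data with one missing SALES value.          */
data work.sales_miss;
   input state sales;
   datalines;
37 120
45 .
51 150
;
run;

/* Without MISSING, state 45 would be drawn but left empty.          */
/* With MISSING, the missing value gets its own response level.      */
proc gmap map=maps.us data=work.sales_miss;
   id state;
   choro sales / missing;
run;
quit;
```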

Summary of Use

To use the GMAP procedure, you must do the following:

If necessary, issue a LIBNAME statement for the SAS data library that contains the map data set that you want to display.
Determine what processing needs to be done to the map data set before it is displayed. Use the GPROJECT, GREDUCE, and GREMOVE procedures or a DATA step to perform the necessary processing.
Issue a LIBNAME statement for the SAS data library that contains the response data set, or use a DATA step to create a response data set.
Use the PROC GMAP statement to identify the map and response data sets.
Use the ID statement to name the identification variable(s).
Use a BLOCK, CHORO, PRISM, or SURFACE statement to identify the response variable and generate the map.
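
The steps above can be sketched end to end as follows. All data set names in WORK, the UNITS variable, and its values are hypothetical, and the example assumes that the MAPS libref is already assigned, so the LIBNAME step appears only as a comment with a placeholder path.

```sas
/* Step 1: assign a libref if the map library is not yet available.  */
/* The path is a placeholder, so the statement is commented out.     */
/* libname maps 'path-to-your-map-library'; */

/* Step 2: preprocess the map; here a DATA step keeps four states    */
/* (13 = Georgia, 37 = North Carolina, 45 = South Carolina,          */
/* 51 = Virginia).                                                   */
data work.se_map;
   set maps.us;
   where state in (13, 37, 45, 51);
run;

/* Step 3: create the response data set with a DATA step.            */
data work.se_units;
   input state units;
   datalines;
13 410
37 365
45 220
51 390
;
run;

/* Steps 4 through 6: name both data sets, name the ID variable,     */
/* and request a block map of the response variable.                 */
proc gmap map=work.se_map data=work.se_units;
   id state;
   block units;
run;
quit;
```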
