The data chosen for this project
come from the 1996 census: average annual household income for Vancouver,
BC. The census subdivision unit selected is the Enumeration Area (EA),
since EAs are the smallest census unit and therefore show the most local
variation in the attribute. The map is a vector polygon coverage, which
is then converted into a raster grid. Another reason for using
enumeration areas rather than another subdivision unit (say, census
tracts) is that they provide more polygons within a given area. However,
there is one problem in using enumeration areas: some polygons have
missing data. These polygons either truly contain no population
(e.g. a school) or are new enumeration units added since the last census.
Whatever the reason, the portion of Vancouver was chosen so that it does
not contain any polygon with missing data. This results in an area of
2500 m by 2500 m in the southeast corner of Vancouver. (This area is
shown on the map page.) A square area is used because the grid cells of
the raster data model are also square. The selected area includes 44
enumeration areas.
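For the measurement of spatial autocorrelation, the most commonly
used statistics are Moran’s I coefficient and Geary’s c coefficient.
In their standard forms, the equations for Moran’s I and Geary’s c are:
$$I = \frac{n}{\sum_{i}\sum_{j} w_{ij}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i}(x_i - \bar{x})^2}$$
$$c = \frac{n - 1}{2\sum_{i}\sum_{j} w_{ij}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}(x_i - x_j)^2}{\sum_{i}(x_i - \bar{x})^2}$$
where $x_i$ and $x_j$ are the values of the variable of interest at
locations i and j respectively, $\bar{x}$ is the mean of the variable
over the study area, $w_{ij}$ is the weighting function, and n is the
number of locations.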
Both the Moran and Geary coefficients measure the same thing. When there
is no spatial autocorrelation in the data, the expected value of I is
–1/(n – 1) and that of c is 1. For positive autocorrelation, where similar
values are clustered together, I approaches 1 and c approaches 0. For
negative autocorrelation, I approaches –1 and c rises above 1. The two
statistics do not differ much in how they detect autocorrelation in spatial data,
but Cliff and Ord (1973, 10) have pointed out that “the variance of I is
less affected by the distribution of the sample data than is the differences
squared form used in Geary’s c”, and Upton and Fingleton (1985, 170) demonstrate
that “[Geary’s] c is more sensitive to the absolute differences between
pairs of values…”, as the different forms of the two equations suggest.
Geary’s statistic is therefore much more affected by the nature
of the data, and Cliff and Ord (1981) have shown that Moran’s I is more
powerful than Geary’s c. Regardless of which statistic is better,
both are used in this project, with the emphasis on
describing the results from Moran’s I coefficient.
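To make the behaviour of the two coefficients concrete, the following is a minimal sketch in Python (not the S-Plus routine used in this project) that computes I and c in their standard forms for a toy example. The function names, the four-cell chain of neighbours and the data values are assumptions made purely for illustration; none of them come from the Vancouver data.

```python
def moran_i(x, w):
    """Moran's I for values x and an n-by-n weight matrix w (nested lists)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


def geary_c(x, w):
    """Geary's c for values x and an n-by-n weight matrix w (nested lists)."""
    n = len(x)
    mean = sum(x) / n
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((v - mean) ** 2 for v in x)
    return ((n - 1) / (2 * s0)) * (num / den)


# Hypothetical example: four cells in a row, each linked to its
# immediate neighbour with weight 1 (0-1, 1-2, 2-3).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [1, 1, 5, 5]    # similar values sit next to each other
alternating = [1, 5, 1, 5]  # every neighbour pair differs

for name, values in (("clustered", clustered), ("alternating", alternating)):
    print(name, moran_i(values, chain), geary_c(values, chain))
```

For the clustered values I is about 0.33 and c is 0.5, while for the alternating values I is –1 and c is 1.5, which agrees with the direction of the interpretation given above.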
Software used for this project includes
ArcView, ARC/INFO, SPSS and S-Plus. In ArcView, the shapefile coverage
of Vancouver is zoomed in to the selected 2500 m by 2500 m area.
The ‘convert to grid’ function is used to convert the EA polygons
from the vector model into the raster model as a grid coverage. Different
cell sizes are specified so that each resulting grid is n rows by n columns,
where n is an integer. (Cell sizes for which n is not an integer, so that
the resulting grid covers less than 2500 m by 2500 m, are not
included in the analysis.) During conversion, when more than one polygon
falls within a cell, the cell is assigned the attribute value of the
polygon at the cell centre.
The grid coverages are then exported from ArcView and converted to ASCII
format with the grid-to-ASCII function in ARC/INFO ArcToolbox, so that
they can be imported into S-Plus
for the calculation of the Moran and Geary statistics. In S-Plus, a spatial
neighbour structure is first defined on each grid using the spatial
module. The neighbours are created from the number of rows and columns
of the original grid, and the first-order
neighbour type is chosen so that the neighbours of a cell are the cells
that share a border with it directly. The weighting
function is set equal to 1 across the whole study area, so
that every neighbour receives the same weight; in other words, all neighbours
have the same influence on the cell value. The spatial correlation
function is then used to calculate the Moran and Geary statistics, based on
the spatial neighbour structure defined and on non-free sampling. Non-free
sampling, according to Cliff and Ord (1981), is sampling without replacement,
in which each cell has the same probability.
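The grid set-up described above can be sketched in Python as follows. This is an illustrative sketch only, not the ArcView or S-Plus procedure itself: the function names are hypothetical, the candidate cell sizes are made up (the cell sizes actually used are not listed here), and "first order neighbours" is assumed to mean cells sharing an edge.

```python
def valid_cell_sizes(extent_m, candidates):
    """Keep the candidate cell sizes (in metres) that divide the extent exactly,
    so that the grid has a whole number of rows and columns."""
    return [size for size in candidates if extent_m % size == 0]


def rook_neighbours(n_rows, n_cols):
    """Map each cell (row-major index) to the cells that share an edge with it."""
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    adjacent.append(rr * n_cols + cc)
            neighbours[r * n_cols + c] = adjacent
    return neighbours


def equal_weights(neighbours):
    """Build a weight matrix with 1 for every neighbour pair and 0 elsewhere."""
    n = len(neighbours)
    w = [[0] * n for _ in range(n)]
    for i, adjacent in neighbours.items():
        for j in adjacent:
            w[i][j] = 1
    return w


# Hypothetical candidate cell sizes for the 2500 m by 2500 m study area.
sizes = valid_cell_sizes(2500, [50, 100, 125, 200, 250, 300, 500])
print(sizes)  # 200 and 300 are dropped because 2500 is not a multiple of them

# A 500 m cell size gives a 5-by-5 grid.
n = 2500 // 500
nb = rook_neighbours(n, n)
w = equal_weights(nb)
print(len(nb), sum(map(sum, w)))  # number of cells, total weight
```

Corner cells end up with two neighbours, edge cells with three and interior cells with four, so equal weighting still gives interior cells more neighbours in total than cells on the boundary of the study area.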