Data and Method

The data for this project are average annual household incomes for Vancouver, BC, from the 1996 census. The census subdivision unit selected is the Enumeration Area (EA). EAs are the smallest census subdivision, so they show more local variation in the attribute. The EA map is a vector polygon coverage, which is then converted into a raster grid. Another reason for using enumeration areas rather than a larger subdivision unit (such as census tracts) is that they provide more polygons within a given area. However, enumeration areas have one problem: some polygons have missing data. These polygons either genuinely contain no population (e.g. a school) or are new enumeration units added since the last census. In either case, the portion of Vancouver studied was chosen so that it contains no polygons with missing data. The result is a 2500 m by 2500 m area in the southeast corner of Vancouver (shown on the map page). A square area is used because the grid cells of the raster data model are also square. Forty-four enumeration areas fall within this area.
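For the measurement of spatial autocorrelation, the most commonly used statistics are Moran’s I coefficient and Geary’s c coefficient. They are defined as:

$$
I = \frac{n}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}
$$

$$
c = \frac{n - 1}{2\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij}(x_i - x_j)^2}{\sum_i (x_i - \bar{x})^2}
$$

where x_i and x_j are the values of the variable under study at locations i and j, x̄ is the mean of the variable over the study area, w_ij is the weighting function (the spatial weight between locations i and j), and n is the number of locations.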
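The two formulas can be checked with a minimal pure-Python sketch. It assumes, for illustration only, that the weights are supplied as a full n × n list of lists `w`, and `moran_geary` is a hypothetical helper name; it is not the S-Plus routine used in this project. Each line follows one term of the definitions of I and c.

```python
def moran_geary(x, w):
    """Return (Moran's I, Geary's c) for values x and an n x n weight matrix w.

    Hypothetical helper: weights as a dense list of lists is an assumption.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]                 # x_i - x-bar
    s0 = sum(sum(row) for row in w)               # sum of all weights w_ij
    ss = sum(d * d for d in dev)                  # sum of squared deviations
    cross = sum(w[i][j] * dev[i] * dev[j]         # Moran numerator
                for i in range(n) for j in range(n))
    sqdiff = sum(w[i][j] * (x[i] - x[j]) ** 2     # Geary numerator
                 for i in range(n) for j in range(n))
    moran_i = (n / s0) * cross / ss
    geary_c = ((n - 1) / (2 * s0)) * sqdiff / ss
    return moran_i, geary_c


if __name__ == "__main__":
    # 2 x 2 checkerboard, cells in row order; neighbours share a border
    x = [1, 0, 0, 1]
    w = [[0, 1, 1, 0],
         [1, 0, 0, 1],
         [1, 0, 0, 1],
         [0, 1, 1, 0]]
    print(moran_geary(x, w))  # (-1.0, 1.5)
```

Because every neighbouring pair in this toy grid has opposite values, I reaches its lower limit of –1, while c is above 1.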
Both coefficients measure the same property. When no spatial autocorrelation exists in the data, the expected value of I is –1/(n – 1) and that of c is 1. For positive autocorrelation, where similar values are clustered together, I approaches 1 and c falls below 1 towards 0. For negative autocorrelation, where dissimilar values are adjacent, I approaches –1 and c rises above 1. The two statistics usually lead to the same conclusion about whether spatial data are autocorrelated, but they differ in sensitivity. Cliff and Ord (1973, 10) have pointed out that “the variance of I is less affected by the distribution of the sample data than is the differences squared form used in Geary’s c”, and Upton and Fingleton (1985, 170) demonstrate that “[Geary’s] c is more sensitive to the absolute differences between pairs of values…”. The difference follows from the forms of the two equations: I is built from cross-products of deviations from the mean, whereas c is built from squared differences between pairs of values. Geary’s c is therefore more affected by the nature of the data, and Cliff and Ord (1981) have shown that Moran’s I is more powerful than Geary’s c. Both statistics are nevertheless used in this project, with the emphasis placed on the results from Moran’s I.
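A worked case shows why a perfectly alternating pattern sits at the lower limit of I. Assume, purely as an illustration and not as data from this study, a k × k grid with k even (so n = k² cells), filled as a checkerboard of 0s and 1s, with first-order neighbours that share a border and binary weights, and write S_0 for the sum of all weights. Every neighbouring pair then has one 0 and one 1, and substituting into the definitions of I and c gives:

$$
\begin{aligned}
\bar{x} &= \tfrac{1}{2}, \qquad \sum_i (x_i - \bar{x})^2 = \tfrac{n}{4}, \qquad S_0 = \sum_i \sum_j w_{ij}, \\
\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x}) &= -\tfrac{1}{4} S_0, \qquad \sum_i \sum_j w_{ij}(x_i - x_j)^2 = S_0, \\
I &= \frac{n}{S_0} \cdot \frac{-\tfrac{1}{4} S_0}{\tfrac{n}{4}} = -1, \\
c &= \frac{n - 1}{2 S_0} \cdot \frac{S_0}{\tfrac{n}{4}} = \frac{2(n - 1)}{n}.
\end{aligned}
$$

So the checkerboard reaches I = –1 exactly, while c exceeds 1 and approaches 2 as the grid grows (for a 2 × 2 grid, c = 1.5).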

The software used for this project includes ArcView, ARC/INFO, SPSS and S-Plus. In ArcView, the shapefile coverage of Vancouver is zoomed in to the selected 2500 m by 2500 m area, and the ‘Convert to Grid’ function is used to convert the EA polygons from the vector model into a raster grid coverage. Several cell sizes are specified, chosen so that each resulting grid is n rows by n columns with n an integer. Grids for which n would not be an integer, and which would therefore cover less than the full 2500 m by 2500 m, are not included in the analysis. When more than one polygon falls within a cell, the cell is assigned the attribute value of the polygon at the cell centre. The grid coverages are then exported from ArcView and converted to ASCII format with the grid-to-ASCII conversion function in the ARC/INFO ArcToolbox, so that they can be imported into S-Plus for the calculation of the Moran and Geary statistics.

In S-Plus, a spatial neighbour structure is first defined for each grid coverage using the spatial module. The neighbour structure is given the same number of rows and columns as the original grid, and the first-order neighbour type is chosen, so that the neighbours of each cell are the cells that share a border with it directly. The weights are set equal across the whole study area (w_ij = 1 for every pair of neighbours and 0 otherwise), so every neighbour has the same influence on a cell’s value. The spatial correlation function is then used to calculate the Moran and Geary statistics from this neighbour structure under non-free sampling. Non-free sampling, according to Cliff and Ord (1981), is sampling without replacement in which each cell has the same probability of being selected.
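The grid-side steps can be sketched in Python. This is an illustration under stated assumptions, not a reproduction of the ArcView, ARC/INFO or S-Plus tools: the helper names (`valid_cell_sizes`, `read_ascii_grid`, `rook_neighbours`, `moran_i`, `permutation_test`) are hypothetical, cell sizes are assumed to be whole metres, the input is assumed to follow the ESRI ASCII grid layout (an `ncols`/`nrows`/`xllcorner`/`yllcorner`/`cellsize`/`NODATA_value` header followed by rows of values), and first-order neighbours are taken to be the up to four cells sharing an edge. In place of the analytical non-free-sampling result that S-Plus computes, the sketch swaps in a Monte Carlo permutation test: the observed cell values are repeatedly reshuffled across the grid, which is sampling without replacement with every arrangement equally likely.

```python
import random

# Sketch only: hypothetical helpers. Whole-metre cell sizes, the ESRI ASCII
# header layout and edge-sharing (rook) neighbours are all assumptions.


def valid_cell_sizes(extent=2500):
    """Whole-metre cell sizes that tile the square study area as n x n cells."""
    return [s for s in range(1, extent + 1) if extent % s == 0]


def read_ascii_grid(text):
    """Parse an ESRI ASCII grid into (nrows, ncols, row-major list of values)."""
    header, values = {}, []
    for line in text.strip().splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0][0].isalpha():              # header line such as "ncols 4"
            header[parts[0].lower()] = float(parts[1])
        else:                                  # a row of cell values
            values.extend(float(v) for v in parts)
    nrows, ncols = int(header["nrows"]), int(header["ncols"])
    nodata = header.get("nodata_value")
    if len(values) != nrows * ncols:
        raise ValueError("cell count does not match header")
    if nodata is not None and nodata in values:
        raise ValueError("grid contains NODATA cells")
    return nrows, ncols, values


def rook_neighbours(nrows, ncols):
    """First-order neighbours: indices of cells sharing an edge, row-major order."""
    nbrs = []
    for r in range(nrows):
        for c in range(ncols):
            cell = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    cell.append(rr * ncols + cc)
            nbrs.append(cell)
    return nbrs


def moran_i(x, nbrs):
    """Moran's I with w_ij = 1 for neighbours and 0 otherwise."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(len(js) for js in nbrs)           # total weight = neighbour links
    cross = sum(dev[i] * dev[j] for i, js in enumerate(nbrs) for j in js)
    return (n / s0) * cross / sum(d * d for d in dev)


def permutation_test(x, nbrs, n_perm=999, seed=1):
    """Observed I and a one-sided pseudo p-value from random permutations."""
    rng = random.Random(seed)
    observed = moran_i(x, nbrs)
    shuffled = list(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                  # reassign values without replacement
        if moran_i(shuffled, nbrs) >= observed:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up 4 x 4 grid (625 m cells), not the Vancouver income data
    demo = "\n".join([
        "ncols 4", "nrows 4", "xllcorner 0", "yllcorner 0",
        "cellsize 625", "NODATA_value -9999",
        "10 10 40 40", "10 10 40 40", "20 20 30 30", "20 20 30 30",
    ])
    nrows, ncols, x = read_ascii_grid(demo)
    print(valid_cell_sizes(2500))
    print(permutation_test(x, rook_neighbours(nrows, ncols)))
```

The permutation test only approximates the analytical randomization result, and its pseudo p-value varies with `n_perm` and `seed`; the point of the sketch is to make the neighbour structure, the equal weights and the without-replacement resampling concrete.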
