Problems with this project, and a discussion of issues around Data Integration

Background

My intent upon starting this project was to create a Multi-Criteria Evaluation analysis to determine the feasibility of building a road linking the Sunshine Coast to the Lower Mainland, and to determine the most logical path for the road based on specified constraints and factors.

The area of interest for my project required me to merge together five separate topographic maps. These were digital versions of the Canadian Topographic Map series delivered in an ArcView Shapefile format.

data merged to show area of interest

The five topographic maps merged together with only the contour layer active.

In order to do the MCE analysis for my project I needed to develop individual maps identifying the constraints and factors. There was no problem with the constraints: Boolean maps were created for each constraint and were then overlaid using a logical 'OR' operation in order to determine the possible areas where a road could be located. (Click here for information regarding the creation of the Boolean maps.) My problem came when creating the factor maps necessary to apply a weighted linear combination (WLC).
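The Boolean overlay described above can be sketched in a few lines. This is a minimal illustration, not the actual IDRISI workflow: the rasters are tiny invented grids, with 1 marking a cell ruled out by that constraint, combined cell-by-cell with a logical OR as the text describes.

```python
# Minimal sketch of a Boolean constraint overlay (invented 3x3 rasters).
# A value of 1 marks a cell excluded by that constraint; the logical OR
# flags every cell excluded by at least one constraint.

def overlay_or(*rasters):
    """Cell-wise logical OR of equally sized Boolean rasters."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    return [[int(any(r[i][j] for r in rasters)) for j in range(cols)]
            for i in range(rows)]

slope_constraint = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # e.g. slope too steep
water_constraint = [[0, 0, 1], [0, 0, 0], [1, 1, 0]]  # e.g. open water

excluded = overlay_or(slope_constraint, water_constraint)
print(excluded)  # → [[0, 1, 1], [0, 0, 1], [1, 1, 0]]
```

In a real MCE these grids would come from rasterized constraint layers; the combination logic itself is this simple.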

The contour layers proved problematic to manipulate into a format that IDRISI could read. When importing just one of the contour layers (before merging the five contour layers together) into IDRISI and then converting it to a raster file, I discovered that the contour elevations were not maintained; instead, each contour was identified by its ID number. This was useless for the purposes of my project.
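The kind of repair needed here is a simple attribute join: the raster cells hold feature IDs, and the shapefile's attribute table still knows which elevation each ID represents. The sketch below is hypothetical (the raster and table values are invented), but it shows the lookup that would recover the elevations.

```python
# Hypothetical sketch: after vector-to-raster conversion the cells held
# contour feature IDs rather than elevations. Joining the IDs back to
# the shapefile's attribute table recovers the contour values.

def ids_to_elevations(id_raster, attributes):
    """Replace each feature-ID cell with its elevation from the table."""
    return [[attributes[cell] for cell in row] for row in id_raster]

id_raster = [[3, 3, 7], [3, 7, 7]]     # cell values = contour IDs (invented)
attributes = {3: 100.0, 7: 200.0}      # ID -> elevation (invented)

elev_raster = ids_to_elevations(id_raster, attributes)
print(elev_raster)  # → [[100.0, 100.0, 200.0], [100.0, 200.0, 200.0]]
```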

Many methods of data manipulation were attempted. First the contour layers were merged within ArcMap, but I was unable to dissolve the resulting layer as it was too big. Using the 3D Analyst I converted the new contour layer into a TIN (triangulated irregular network) format, and then converted the TIN to a raster format. The data did not look correct: there was a distinct difference at the boundaries of each map, as if they had not merged correctly. After consultation with Jasper (SFU's resident GIS expert) I decided to try creating a TIN, and then a raster, for each individual map before merging them together using ArcInfo's GRID module. Again the result was a map with distinct differences at the boundary of each individual map.

DEM of area of interest

The raster map of the area of interest, showing clearly that there are problems at the boundaries between the merged maps.

After countless hours (over a period of a couple of weeks) of manipulating and experimenting with the contour data in numerous ways, I discovered when I zoomed into the boundary areas between maps that not only did the contour lines fail to meet up in many places, but the contour values on the two different maps were vastly different. What I had assumed was a data manipulation problem was in fact a problem with the data itself.

contour lines not meeting up

A small section of the border between maps 092G11 and 092G14, zoomed in to show clearly the contours not meeting. In addition, most of the contours that do meet do not have the same contour value.

The metadata file that came with the Shapefiles had no information about the projection used; in fact there was very little information at all. There were certainly no clues as to why the contour lines did not match. Eventually my TA (Rob Feidler, thank you!) realised that the difference in measurements between maps was consistent: always approximately a factor of 3. To convert between metres and feet a factor of 3.28 is used; perhaps this was relevant? Rob spent time researching on the internet and eventually found metadata for the Canadian Topographic Map series at the Natural Resources Canada website. (Click on "NTS Number Query" then type in "092G".)
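The factor-of-3 clue can be checked numerically. The sketch below uses invented contour values at a shared map boundary and tests whether paired readings consistently differ by the metre-to-foot factor (~3.28), which would point to a units mismatch rather than a digitizing error.

```python
# Hedged sketch with invented values: do paired contour readings at a
# shared map boundary differ by the metre-to-foot factor (~3.28)?

FT_PER_M = 3.28084

def looks_like_unit_mismatch(pairs, tolerance=0.05):
    """True if the ratio between paired readings is consistently ~3.28."""
    ratios = [max(a, b) / min(a, b) for a, b in pairs if a and b]
    return all(abs(r - FT_PER_M) / FT_PER_M < tolerance for r in ratios)

# Contour values where lines from a feet-based map meet a metres-based map:
boundary_pairs = [(800, 244), (1000, 305), (1200, 366)]
print(looks_like_unit_mismatch(boundary_pairs))  # → True: suspect ft vs m
```

A consistent ratio like this is the signature of a systematic units problem; random digitizing errors would not produce it.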

The online metadata showed that the maps had different datums (NAD83, NAD27) and contour intervals (100 ft, 20 m, 40 m), and that some maps were created in feet and some in metres. It was also discovered that these digital maps had been created by scanning and digitizing the original hard copies, a method notorious for introducing inaccuracies.

I converted the maps all to feet and to the same datum. The resulting raster file looked right, with no discernible differences between maps, but when I attempted to create a slope image within IDRISI the error message stated that there was a problem with the data. Upon close observation of the boundary area again, it was noted that STILL some of the contour lines did not match. Essentially, after countless hours of trial and error I found that my data was pretty well useless!
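The unit conversion step itself is straightforward; the sketch below (with assumed values, converting to feet as in the text) shows the normalization applied to each map's contour values before merging. The datum shift is a separate, more involved transformation not shown here.

```python
# Sketch with assumed values: normalizing contour elevations from maps
# that mix feet and metres into a single unit (feet, as in the text).

FT_PER_M = 3.28084

def to_feet(values, source_unit):
    """Convert a list of contour elevations to feet."""
    if source_unit == "ft":
        return list(values)
    if source_unit == "m":
        return [round(v * FT_PER_M, 1) for v in values]
    raise ValueError(f"unknown unit: {source_unit}")

# A 20 m contour interval expressed in feet:
print(to_feet([20, 40, 60], "m"))  # → [65.6, 131.2, 196.9]
```

Note that even after this conversion the intervals do not line up: a 20 m interval (65.6 ft) never coincides exactly with a 100 ft interval, which is one reason contours can still fail to match at the boundaries.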

the map after converting

After converting the data to consistent datums and measurements the resulting map looks better, but IDRISI still would not build a slope image from this data. Note how the TIN process created additional data in the NW corner.

Data Integration Issues

"Maps of the same theme for the same area will not be the same"

The above quote, from Michael Goodchild's presentation "Models for Uncertainty in Area-Class Maps", summarizes his belief that because of numerous issues such as different levels of generalization, variation among observers, distortion, and different classifiers, a map is never completely accurate. There are always inadequacies with data. As Goodchild goes on to say in his presentation, "if there is known variation, the results of a single analysis cannot be claimed to be accurate". Of course Goodchild is speaking from the perspective of an analysis that is assumed to have 'worked'. However, the disappointing results (or lack of results) with my project only serve to reinforce Goodchild's belief. If it had not been necessary to use the contour layers in my project I may never have discovered just how inaccurate or inappropriate my data was for making a serious analysis of the area.

Metadata: Any GIS analysis project is only as "persuasive as the data that underlies it" (Schuurman 1). It is crucial to understand the quality of the data you are using for your project, and this understanding can only be gained by having access to good metadata. Metadata (data about the data) answers questions about the origins, quality and applicability of data, such as "what quality measures were taken; how was the data classified; and what mapping units are associated with the data" (Schuurman 11). Information such as this would have saved me countless wasted hours and grief.

Standardization and Classification: "The process through which disparate terms for similar - but rarely identical - entities or attributes are homogenized in order to use multiple sources of data" (Schuurman 16). It is clear from this definition that there was no standardization process in place when the Canadian Topographic Map Series was converted into digital form. It would appear that no thought was given to the possible complications for a GIS user combining maps. Undoubtedly, numerous technicians would have been employed to scan and digitize the maps. Each technician would have had a different level of expertise with the digitizing process and would have interpreted the data based on their own individual knowledge and perceptions.

Going back a step further, the data collection process itself, when the hardcopy topographic maps were originally created, would again have involved numerous people: cartographers and surveyors with differing levels of experience. With no specific standardization or classification process in place, each person would have created maps based on their own understanding of what was necessary and their own knowledge base. An example of how troublesome this is: in the guidelines for creating contour lines, a cartographer has a choice, depending on the terrain or scale of the map, as to what contour interval to use. For hilly terrain at 1:50,000 the guideline suggests an interval of 20 m or 100 ft. It is left to the cartographer to determine the definition of 'hilly'; what one person classifies as hilly another may classify as flat, or even mountainous, depending on their perspective. In addition, there is no crisp boundary between contour levels; depending on the number of physical readings taken, many of the contours are approximated or interpolated between readings. There are countless ways in which the data from different sources could have been classified inconsistently.

Until data from multiple sources are standardized in a realistic manner, GIS analysis cannot be taken as 100% reliable. It is crucial for GIS users to take this lack of standardization into account when assessing the accuracy of their work.

GIS experts are well aware of the need for standardization and, in conjunction with international standards organizations, are working towards a viable system that is agreed upon as the industry standard. Papers such as Nadine Schuurman's "Flexible Standardization: Making Interoperability Accessible to Agencies with Limited Resources" offer one way forward by presenting an application that could help to standardize simple spatial entities. Michael Goodchild's paper "Finding the Mainstream" presents an interesting discussion of the lack of metadata standards and the impact that the growth in popularity of GIS has on the GIS community as a whole.