4.2 Problems and Errors
Probably the most significant problem that we encountered in this project was referring to the ISO tags in the XML files with Metadata Explorer. Metadata Explorer draws upon the information stored in the XML files using JavaServer Pages (JSP) files.
It is easy, in theory, to modify these files to produce a simplified metadata scheme. Some of these simplified products could be one sheet showing "basic" metadata, another showing "details," one showing only technical metadata (geographic bounds, projection, datum, etc.), and so on. Because the XML files contain both FGDC and ISO metadata tags, a separate metadata wizard exists for each metadata scheme. Although Metadata Explorer appears to support ISO metadata, it is not easily configurable for use with ISO metadata. In fact, when we attempted to modify the JSP file "…/Include_details_output.JSP," we encountered a severe problem regarding the structure of the ISO tags in the XML files. The problem is that there are many duplicate XML tags (i.e., several tags with the same name), many with several child elements and tags. ArcIMS version 4.0 has bugs that do not allow it to recognize higher-level XPath statements (XPath being the path language used to declare XML tag locations). This causes confusion within the metadata sheet generation process, for there is no way to tell the program which tag to draw data from.
The ISO metadata structure is not devoid of an intended method for sorting the different tags. In every case of duplicate XML tags, there is a child tag with a "value" attribute, and each tag is assigned a different value. The intended use of this attribute is as a unique identifier for each tag, such that tags can have the same name but be referred to as different classes of the same tag type. An example of this would be a set of tags named keywords: there may be thematic keywords, place keywords, context keywords, and others. This naming convention causes problems with XPath statements. In order to refer to the appropriate XML tags in the JSP file, an XPath statement needs to be written for each element. XPath statements are very easy to write: it is not difficult to write an expression that selects a tag from a group of tags with the same name, provided that there is a unique identifier embedded within the tag's data. Figure 4-2-1 shows the principal problem, and Figure 4-2-2 shows a simple selection based on the value of the attribute "ID."
<AAA>
<BBB>
<CCC ID="1">
<DDD>####</DDD>
</CCC>
<CCC ID="2">
<DDD>####</DDD>
</CCC>
</BBB>
</AAA>
Figure 4-2-1. The selection resulting from the string /AAA/BBB/CCC.
<AAA>
<BBB>
<CCC ID="1">
<DDD>####</DDD>
</CCC>
<CCC ID="2">
<DDD>####</DDD>
</CCC>
</BBB>
</AAA>
Figure 4-2-2. The selection resulting from the string /AAA/BBB/CCC[@ID = 1]/DDD.
Metadata Explorer, however, does not understand an XPath statement like this one. XPath Visualizer can be found at:
http://www.vbxml.com/xpathvisualizer/default.asp
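The behaviour shown in the two figures can be reproduced with a short script. This is only an illustrative sketch using Python's standard xml.etree.ElementTree library as a stand-in XPath engine (it is not part of Metadata Explorer or our project code), with placeholder element content:

```python
# Hypothetical reproduction of the structure in Figures 4-2-1 and 4-2-2,
# showing how an XPath predicate on the ID attribute selects one of two
# duplicate tags with the same name.
import xml.etree.ElementTree as ET

doc = """
<AAA>
  <BBB>
    <CCC ID="1"><DDD>first</DDD></CCC>
    <CCC ID="2"><DDD>second</DDD></CCC>
  </BBB>
</AAA>
"""
root = ET.fromstring(doc)

# /AAA/BBB/CCC -- the bare path matches BOTH duplicate tags (Figure 4-2-1).
print(len(root.findall("./BBB/CCC")))            # 2

# /AAA/BBB/CCC[@ID='1']/DDD -- the predicate narrows the selection to one
# tag (Figure 4-2-2).
print(root.find("./BBB/CCC[@ID='1']/DDD").text)  # first
```

The predicate form in the last line is exactly the kind of statement Metadata Explorer failed to interpret.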
Regarding the current structure of ISO metadata in XML, our view is that the tag system should be re-designed. The presence of multiple XML tags with the same name is not a good solution. The use of ID attributes within the tags is a source of much error and grief when making specific references in the file. XPath statements are very easy to write but, unfortunately, not easy for some software packages to understand. Software problems are another issue, yes; however, we must state that such a system should have a tag-hierarchy structure in which no duplicate tags exist. The exception to this is that duplicate tag names can be used if they are located within parent tags that have different names. The net result should be that every ISO metadata element in the XML file has a unique XPath statement referring to it. This will ensure that the standard can be implemented without multiple tag-reference issues, such as those that we experienced.
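As an illustration of this proposed re-design, the following sketch uses hypothetical tag names (themeKeywords and placeKeywords are inventions for this example, not actual ISO elements) to show how uniquely named tags give every element a plain, predicate-free XPath:

```python
# Sketch of the proposed re-design: give each class of keyword its own
# element name instead of duplicate tags distinguished by an attribute.
# Every element then has a unique XPath with no [@...] predicate needed.
import xml.etree.ElementTree as ET

doc = """
<metadata>
  <themeKeywords><keyword>hydrology</keyword></themeKeywords>
  <placeKeywords><keyword>Burnaby</keyword></placeKeywords>
</metadata>
"""
root = ET.fromstring(doc)

# Each element is reachable by a plain path, which even software with a
# limited XPath implementation can resolve unambiguously.
print(root.find("./themeKeywords/keyword").text)  # hydrology
print(root.find("./placeKeywords/keyword").text)  # Burnaby
```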
Other Problems and Issues
We encountered numerous problems throughout our project. The first one is the quality of the metadata: many of the datasets from the SIS server and Research Data Library do not have high-quality metadata, so it must be researched and created.
This affects the reliability of our datasets as well, for we are filling in the blanks for the required elements with no certainty that what we are writing is truly correct. The quality of the metadata plays an important role, since users rely on this information; if the metadata displays incorrect information about the datasets they are using, it affects the quality of their research. Even though a large number of spatial datasets are available to SFU students and researchers, information on how to obtain or access these datasets is not always readily available. This is especially true of the HTML interface, which is the most common one; the exchange of data among academic department information systems is problematic because the context and tags are inconsistent.
Our approaches to solving these problems were varied. One of our initial efforts was to establish a metadata schema conforming to the FGDC metadata standard. The FGDC standard was chosen because it is a leading worldwide standard and is strongly recommended for the future open GIS environment at SFU. Standardization matters because the people and organizations that provide data to users should share the same metadata format, so that there is no confusion between different datasets. If all providers comply with the same format of metadata, it is much easier for users to understand the data and put it to proper use. Moreover, it is a solution to interoperability issues. One of the focuses of our project is to attempt to minimize the gap between diverse datasets by setting standards, in hopes of achieving interoperability between providers and users in the campus community. This will allow users to draw on the diverse collection of data for research without having to question whether the datasets are compatible. On the other hand, we should be realistic and understand that this problem may never be fully resolved, because of the semantic and ontological problems relating to this topic. Since our group consists of only four people, we cannot think exactly as the tens of thousands of other students and faculty do; we can only assume.
This leads to problems implementing semantics and ontologies in a rigid programming environment. For example, the keywords used in our data dictionary may not match those used by others, because we may choose one of several words that share the same meaning (e.g., car and vehicle) or a single word with many meanings, such as "range." In addition, semantics is interrelated with ontological problems. Because we are working with a collection of diverse datasets, each has a different ontological schema, and we are incorporating those schemas into our project. We are therefore assuming that the users of this service share the same ontological schema, which is usually not the case. This is more evident in certain terms used by different departments within SFU. Ontologies are locally unique, and the problem may therefore never be fully solved, but we hope to shrink the gap by standardizing the metadata.