Metadata Delivery

 
 

Digital data is easy to obtain, but it is frequently meaningless because the semantic terms that describe it are understood differently in different contexts. The portability of data is not matched by its availability, simply because meaning often travels poorly. The word "urban", for instance, means something very different in New Brunswick than it does in British Columbia. For data to mean something to a new, non-local audience, it must carry context, or metadata. Traditionally, the metadata associated with spatial data has been perfunctory. It has stressed important aspects of the data such as projection, scale and lineage, but has lacked context for the use of terms.
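As a rough illustration of what context for the use of terms might look like in practice, the sketch below attaches a definition to each term per jurisdiction and flags terms whose definitions disagree. It is not the project's toolkit: the names TermDefinition, conflicting_terms and EXAMPLE_DEFINITIONS are hypothetical, and the jurisdictions and definitions are placeholders invented for the example, not real provincial definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermDefinition:
    """Context for one term as used in one data set (hypothetical structure)."""
    term: str
    jurisdiction: str
    definition: str


def conflicting_terms(definitions):
    """Return {term: {jurisdiction: definition}} for terms defined differently."""
    by_term = {}
    for d in definitions:
        by_term.setdefault(d.term, {})[d.jurisdiction] = d.definition
    # Keep only terms that have more than one distinct definition.
    return {
        term: by_jurisdiction
        for term, by_jurisdiction in by_term.items()
        if len(set(by_jurisdiction.values())) > 1
    }


# Placeholder values, invented for illustration only.
EXAMPLE_DEFINITIONS = [
    TermDefinition("urban", "Jurisdiction A", "placeholder definition 1"),
    TermDefinition("urban", "Jurisdiction B", "placeholder definition 2"),
    TermDefinition("scale", "Jurisdiction A", "1:50,000"),
    TermDefinition("scale", "Jurisdiction B", "1:50,000"),
]

if __name__ == "__main__":
    for term, by_jurisdiction in conflicting_terms(EXAMPLE_DEFINITIONS).items():
        print(term, by_jurisdiction)
```

With these placeholder records, only "urban" is flagged, because the two jurisdictions share the same value for "scale". A real system would need richer definitions than plain strings, which is where an ontology language becomes useful.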

I am developing a system for delivering and analyzing metadata for population health data. This toolkit uses OWL (Web Ontology Language) to communicate ontological differences among comparable data sets. I have been funded by the Canadian Institutes of Health Research to conduct a three-year research project entitled:

Creation of an extended metadata format for public and population health research, policy and surveillance in Canada

An abstract for the project is below:

To realize the potential of enhanced information value arising from the ability to integrate data sources, three inter-related operational challenges must be addressed. First, a database ethnography that systematically captures information about the origin, original intended use and creators of population health data is required. Second, issues of standardization and integration of both attribute and spatial data between multi-faceted data sets that span multiple jurisdictions must be confronted. Third, presentation, interface, and access of linked data sets present a distinct but related set of challenges. The purpose of the proposed work is to make significant headway in developing strategies for meeting these three operational challenges. The project will focus on three concrete goals: (i) development of a pilot project to record extended metadata ethnographies for selected major data sets used by population health researchers and public health workers; (ii) development of proof-of-concept health data standardization techniques (content tags) based on web-accessible extended metadata; and (iii) creation of a prototype for a web-based infrastructure for maintaining on-line access to extended metadata for researchers in Canada.