|
Digital data is easy to obtain, but it is frequently meaningless
because the terms that describe it are understood
differently in different contexts. The availability of data
is not matched by its portability, simply because meaning
often travels poorly. The word "urban", for instance, means
something very different in New Brunswick than it does in
British Columbia. For data to mean something to a
new, non-local audience, it must have context or metadata.
Traditionally, metadata associated with spatial data has been
perfunctory. It has covered important aspects of the
data such as projection, scale and lineage, but has lacked
context for how terms are used.
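To make the gap concrete, here is a minimal sketch of a metadata record that carries term definitions alongside the traditional fields, with a helper that flags terms two data sets share but define differently. Everything in it is an assumption made for illustration: the class name, the field names, the data set names and the definitions are placeholders, not the project's actual format and not the real definitions used in any province.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Traditional spatial metadata plus a glossary of local term definitions."""
    name: str
    projection: str
    scale: str
    lineage: str
    # Extended part (hypothetical): how this data set's creators defined each term.
    term_definitions: dict[str, str] = field(default_factory=dict)


def conflicting_terms(a: DatasetMetadata, b: DatasetMetadata) -> dict[str, tuple[str, str]]:
    """Return the terms both data sets use but define differently."""
    shared = a.term_definitions.keys() & b.term_definitions.keys()
    return {
        term: (a.term_definitions[term], b.term_definitions[term])
        for term in sorted(shared)
        if a.term_definitions[term] != b.term_definitions[term]
    }


# Placeholder records: all values below are invented for illustration only.
dataset_a = DatasetMetadata(
    name="hypothetical_dataset_a",
    projection="placeholder projection",
    scale="placeholder scale",
    lineage="placeholder lineage",
    term_definitions={
        "urban": "placeholder definition A (e.g. based on a population count)",
        "household": "placeholder shared definition",
    },
)
dataset_b = DatasetMetadata(
    name="hypothetical_dataset_b",
    projection="placeholder projection",
    scale="placeholder scale",
    lineage="placeholder lineage",
    term_definitions={
        "urban": "placeholder definition B (e.g. based on population density)",
        "household": "placeholder shared definition",
    },
)

conflicts = conflicting_terms(dataset_a, dataset_b)
for term, (def_a, def_b) in conflicts.items():
    print(f"'{term}' differs: {def_a!r} vs {def_b!r}")
```

Two records can agree on projection, scale and lineage and still disagree on what a term means. Only the glossary makes that disagreement visible.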
I am developing a system for delivering and analyzing metadata for population health data. This toolkit uses OWL (Web Ontology Language) to communicate ontological differences among comparable data sets. I have been funded by the Canadian Institutes of Health Research to conduct a three-year research project entitled:
Creation of an extended metadata format for public and population
health research, policy and surveillance in Canada
An abstract for the project is below:
To realize the potential of enhanced information value arising
from the ability to integrate data sources, three inter-related
operational challenges must be addressed. First, a database
ethnography that systematically captures information about
the origin, original intended use and creators of population
health data is required. Second, issues of standardization
and integration of both attribute and spatial data between
multi-faceted data sets that span multiple jurisdictions must
be confronted. Third, presentation, interface, and access
of linked data sets present a distinct but related set of
challenges. The purpose of the proposed work is to make significant
headway in developing strategies for meeting these three operational
challenges. The project will focus on three concrete goals:
(i) development of a pilot project to record extended metadata
ethnographies for selected major datasets used by population
health researchers and public health workers; (ii) development
of proof-of-concept health data standardization techniques
(content tags) based on web-accessible extended metadata;
and (iii) creation of a prototype for a web-based infrastructure
for maintaining on-line access to extended metadata for researchers
in Canada.
|