5.0 Future Engine
To create the semantic
functionality, there are several tasks to perform. First,
metadata elements need to be identified that are both included
in the search engine and subject to frequent contextual
errors. Whether the metadata follow FGDC or ISO, the first
choice would be the keywords, because the same
keywords may be included for several different datasets. Some
examples of keywords that would have serious contextual
problems would be roads, transportation, range, and habitat:
essentially any variety of keyword entry that can be used in
many different contexts. For each keyword entry, a set of
secondary “semantic” tags could be created, giving each word
context identifiers. For example, a GIS coverage concerning
forest service roads would likely have the keyword “roads.”
The associated secondary semantic tags would include words
like logging, dirt, unpaved, rough, back roads, etc. These
secondary tags would be cross-referenced with the appropriate
input from the user’s search parameters. Of course, this leads
to the trickiest part of developing a semantic search engine:
how to design the user interface so that it accepts enough
detail in the search parameters without confusing the user
(as per the paradox presented earlier).
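
As a minimal sketch of the idea above, a keyword and its secondary semantic tags could be held together in one small record and compared against the user's search terms. The class name `KeywordEntry` and the method `context_matches` are hypothetical names chosen for illustration; only the keyword “roads” and its example tags come from the text.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class KeywordEntry:
    """One primary keyword plus its secondary 'semantic' context tags."""
    keyword: str
    semantic_tags: Set[str] = field(default_factory=set)

    def context_matches(self, user_terms: Iterable[str]) -> Set[str]:
        """Return the secondary tags that also appear in the user's terms."""
        normalized = {term.strip().lower() for term in user_terms}
        return self.semantic_tags & normalized


# The forest service roads example from the text.
forest_roads = KeywordEntry(
    "roads", {"logging", "dirt", "unpaved", "rough", "back roads"}
)

# Terms that are not secondary tags (here "Roads") are simply ignored.
print(sorted(forest_roads.context_matches(["Roads", "Logging", "unpaved"])))
```

A non-empty result suggests that the dataset's context agrees with what the user is looking for; an empty result suggests the keyword matched only by coincidence.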
The interface of a basic web search engine is trivial: simply type a word in the
textbox, click SEARCH, and suddenly there are about 100,000
items found, of which maybe 10 or 20 are actually useful. For a
semantic search engine, the user interface cannot be this basic.
In fact, the search needs multiple input values in order to work
properly; otherwise, the system would not know how to utilize
the secondary and tertiary tags on the metadata elements. To
address this, the user interface could be designed to accept
multiple keywords as separate entities in the search, and
include contextual parameters, such as activity types. For example, a
user searching for a coverage depicting rivers may be interested
in hydrological information, such as flow rate. The user could
specify one or more of the following as keywords: water, rivers,
streams, hydrology, flow, etc. Additionally, or alternatively, the
user could choose “scientific data” from a drop-down list of
activities or study fields.
Whatever the user chooses would be searched for among the
keywords, AND cross-referenced with the secondary tags.
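
The search described above could be sketched roughly as follows. The record layout, the function name `search`, and the ranking rule (secondary-tag hits first, then keyword hits) are assumptions made for illustration; the text only specifies that the user's input is matched against the keywords and cross-referenced with the secondary tags.

```python
def search(records, keywords, activity=None):
    """Rank records that match a keyword by how well their context agrees."""
    terms = {k.strip().lower() for k in keywords}
    # Context terms are the keywords plus the optional activity choice.
    context = set(terms)
    if activity:
        context.add(activity.strip().lower())

    results = []
    for rec in records:
        keyword_hits = terms & set(rec["keywords"])
        if not keyword_hits:
            continue  # at least one primary keyword must match (assumption)
        tag_hits = context & set(rec["semantic_tags"])
        results.append((len(tag_hits), len(keyword_hits), rec["title"]))

    # Most secondary-tag hits first, then most keyword hits, then title.
    results.sort(key=lambda r: (-r[0], -r[1], r[2]))
    return [title for _, _, title in results]


# Hypothetical sample records, for illustration only.
RECORDS = [
    {"title": "River flow gauges", "keywords": ["rivers", "water"],
     "semantic_tags": ["hydrology", "flow", "scientific data"]},
    {"title": "River rafting routes", "keywords": ["rivers"],
     "semantic_tags": ["recreation", "rafting"]},
    {"title": "Forest service roads", "keywords": ["roads"],
     "semantic_tags": ["logging", "unpaved"]},
]

print(search(RECORDS, ["rivers", "flow"], activity="scientific data"))
# prints ['River flow gauges', 'River rafting routes']
```

Both river datasets match the keyword, but the one whose secondary tags agree with the hydrological context is ranked first.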
While there are many advantages to using
semantic search engines, especially with respect to spatial data
dictionaries, there is one major constraint on their
practicality: the amount of time and effort required by an
administrator to add or modify entries. Because what is being
proposed is an XML file structure with three classes of tags,
the amount of time that would be required to populate all of
those tags with values may discourage the implementation of this
system. Even with a small number of primary metadata elements,
the system could potentially need hundreds of secondary and
tertiary tags to work successfully. One solution to this
problem would be to pre-define contextual definitions for a
library of common geographical terms. A comprehensive set of
secondary and tertiary tags could be assigned to words like
road, field, lake, etc. During metadata entry, the user would
be able to type in a word, and the interface would inform the
user whether or not a set of pre-defined semantic and
ontological tags exist for that word. This would not cover
every possible word, but it would reduce the tagging burden
for the most common terms.
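
A pre-defined library of this kind could be as simple as a lookup table consulted during metadata entry. In the sketch below, `TAG_LIBRARY`, `lookup_predefined_tags`, and the tertiary tag values are hypothetical; the secondary tags for “road” reuse the earlier example, and the plural handling is deliberately naive.

```python
# Hypothetical library of pre-defined tags for common geographical terms.
TAG_LIBRARY = {
    "road": {
        "secondary": ["logging", "dirt", "unpaved", "rough", "back roads"],
        "tertiary": ["transportation"],  # illustrative value only
    },
    "lake": {
        "secondary": ["water", "hydrology"],  # illustrative values only
        "tertiary": ["recreation"],
    },
}


def lookup_predefined_tags(word):
    """Return the pre-defined tag set for a word, or None if none exists."""
    key = word.strip().lower()
    # Naive plural handling (assumption): "roads" falls back to "road".
    if key not in TAG_LIBRARY and key.endswith("s"):
        key = key[:-1]
    return TAG_LIBRARY.get(key)


for word in ["Roads", "glacier"]:
    tags = lookup_predefined_tags(word)
    if tags is None:
        print(f"{word}: no pre-defined tags, enter them manually")
    else:
        print(f"{word}: pre-defined tags found: {tags['secondary']}")
```

The interface would then only ask the person entering metadata to type tags by hand for words that are missing from the library.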