When data are aggregated to boundaries such as census tracts and block
groups, the result of any analysis may be determined to some extent by
the shape of the boundaries used. Aggregate data to census tracts and
you will get a different outcome from your analysis than if you had
aggregated the same data to enumeration areas.

The Modifiable Areal Unit Problem (MAUP) is a potential source of error
that can affect spatial studies which use aggregate data sources
(Unwin, 1996). Geographical data are often aggregated in order to
present the results of a study in a more useful context, and spatial
objects such as enumeration areas or census tract boundaries are
examples of the type of aggregating zones used to show the results for
some spatial phenomenon. These zones are often arbitrary in nature, and
different area units can be just as meaningful in displaying the same
base-level data. For example, it could be argued that enumeration areas
containing comparable numbers of houses are a better basis for
aggregation than census tracts. Large amounts of source data require a
careful choice of aggregating zones to display the spatial variation of
the data in a comprehensible manner. It is this variety of acceptable
zoning solutions that gives rise to the term 'modifiable' (Openshaw,
1984, p. 3).

The MAUP consists of both a scale problem and an aggregation problem,
and the concept of the ecological fallacy should also be considered
(Bailey and Gatrell, 1995). The scale problem is relatively well known.
It is the variation in results which can occur when data from one scale
of area units are aggregated into more or fewer area units. For
example, much of the
variation visible at the EA or DA level changes or is lost when the
same data are aggregated to the larger CT level. The aggregation
problem is less well known and
becomes apparent when faced with the variety of different possible area
units for aggregation (www.jratcliffe.net). Although geographical
studies tend towards aggregating units which share a geographical
boundary, it is also possible to aggregate units which are spatially
distinct. Restricting aggregation to neighbouring units reduces the
problem to a small degree but does not remove the large number of
possible groupings which remains.
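The scale and aggregation effects described above can be illustrated
with a small sketch. The eight base values and the three zoning schemes
below are invented for illustration and do not come from any census;
the helper name aggregate is likewise hypothetical. The sketch simply
averages the same base data under different zonings and compares how
much variation survives between zones.

```python
from statistics import mean, pvariance

# Hypothetical values for eight base units (illustrative numbers only).
base = [2, 4, 6, 8, 1, 9, 3, 7]


def aggregate(values, zones):
    """Return the mean value of each zone.

    zones is a list of zones, each a list of indices into values.
    """
    return [mean(values[i] for i in zone) for zone in zones]


# Scale effect: the same data in four small zones or two large zones.
four_zones = [[0, 1], [2, 3], [4, 5], [6, 7]]
two_zones = [[0, 1, 2, 3], [4, 5, 6, 7]]

# Aggregation effect: still four zones of two units each, but built
# from units that are not neighbours (spatially distinct units).
scattered_zones = [[0, 4], [1, 5], [2, 6], [3, 7]]

print("variance of base units:", pvariance(base))
for name, zones in [
    ("four contiguous zones", four_zones),
    ("two contiguous zones", two_zones),
    ("four scattered zones", scattered_zones),
]:
    zone_means = aggregate(base, zones)
    print(name, zone_means, "variance:", pvariance(zone_means))
```

With these made-up numbers the overall mean is the same under every
zoning, yet the variation between zones differs in each case and
disappears entirely for the two large zones. Real data will behave
differently in detail, but the dependence on the chosen zones is the
point of the sketch.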

The ecological fallacy is a situation that can occur when a researcher
or analyst makes an inference about an individual based on aggregate
data for a group. It can take many forms. The main problem, however,
arises when researchers make assumptions about an individual who lives
in an area based on aggregate data about that area. For instance, a
researcher might examine the aggregate data on family income for a
census tract of a city, and discover that the average family income for
the residents of that area is $40,000. To state that the average income
for residents of that area is $40,000 is true and accurate. The
ecological fallacy occurs when the researcher states, based on this
data, that people living in that census tract earn about $40,000. This
may not be true at all: the same average could result from a mix of
much higher and much lower incomes, with few or no families earning
close to $40,000.
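The $40,000 example can be made concrete with a minimal sketch. The
seven incomes below are invented so that their mean is exactly $40,000,
and the 25% band used to define "about $40,000" is an arbitrary
assumption chosen only for illustration.

```python
from statistics import mean

# Hypothetical family incomes for one census tract (invented numbers).
incomes = [12_000, 15_000, 18_000, 20_000, 65_000, 70_000, 80_000]

tract_mean = mean(incomes)

# Assumption: "about the average" means within 25% of the tract mean.
tolerance = 0.25 * tract_mean
near_mean = [x for x in incomes if abs(x - tract_mean) <= tolerance]

print("tract mean:", tract_mean)
print("families within 25% of the mean:", len(near_mean))
```

In this made-up tract the statement about the average is correct, yet
no family earns within 25% of it, so an inference about any individual
family drawn from the average would be wrong.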
Assumptions made about individuals based on aggregate data are
vulnerable to the ecological fallacy. This does not mean that
identifying associations between aggregate figures is necessarily
flawed, nor that inferences drawn about associations between the
characteristics of an aggregate population and the characteristics of
sub-units within the population are always wrong. What it does mean is
that the process of aggregating or disaggregating data may conceal
variations that are not visible at the larger aggregate level
(www.jratcliffe.net).
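How aggregation can conceal variation in an association may be easier
to see with a small sketch. The nine (x, y) observations, the three
zones, and the helper name pearson below are all hypothetical. The
sketch compares the Pearson correlation computed on zone means, on all
individual observations, and within a single zone.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical individual observations (x, y), grouped into three zones.
zones = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}

# Correlation between zone means (the aggregate view).
mean_x = [mean(x for x, _ in obs) for obs in zones.values()]
mean_y = [mean(y for _, y in obs) for obs in zones.values()]
r_aggregate = pearson(mean_x, mean_y)

# Correlation across all individuals, ignoring zones.
everyone = [pair for obs in zones.values() for pair in obs]
r_individual = pearson([x for x, _ in everyone], [y for _, y in everyone])

# Correlation inside a single zone.
r_within_a = pearson([x for x, _ in zones["A"]], [y for _, y in zones["A"]])

print("zone means:", r_aggregate)
print("individuals:", r_individual)
print("within zone A:", r_within_a)
```

With these invented numbers the zone means are perfectly positively
correlated, the individual-level correlation is weaker, and within each
zone the relationship is actually negative. The aggregate association
is real, but it says little about what happens inside each zone.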