Mapping Disease: The Evolution of Spatial Epidemiology

Spatial Epidemiology

The ability to incorporate spatial patterns into statistical analysis and to portray them accurately on maps has improved significantly over the past 40 years. Although early disease maps had enormous investigative value, they could also be highly misleading. By mapping cholera deaths rather than rates or age-adjusted rates, for example, Dr. John Snow ignored differences in population density and in the age structure of the population. Areas with more people, or with large numbers of elderly or other susceptible residents, were not distinguished in early maps from sparsely populated regions.

Related graphical and statistical issues also plague modern mapmakers who want to portray spatial patterns in disease rates. Dr. Monmonier, for example, uses nationwide mortality rates to illustrate the pitfalls of ignoring a population's age structure. A map of crude mortality rates shows visible clusters of high rates in the Midwest and the Northeast of the United States. When the rates are adjusted for age, however, these clusters disappear from the Midwest and appear instead in the South, a possible indication of relatively better health care and higher socioeconomic status in the Midwest than in the South (Monmonier 1991). In other words, adjusting for known risk factors can completely alter conclusions.
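
A minimal sketch of how crude and age-adjusted rates can disagree, using direct age standardization on hypothetical numbers (the regions, counts, and standard population below are invented for illustration and are not Monmonier's data). Region A has the higher crude rate only because its population is older; once each age-specific rate is weighted by a common standard population, the ranking reverses.

```python
# Hypothetical data: deaths and population by age group for two regions.
# Region A is older; region B is younger but has equal or higher age-specific rates.
AGE_GROUPS = ["0-44", "45-64", "65+"]

regions = {
    "A": {"deaths": [10, 40, 300], "population": [20_000, 10_000, 10_000]},
    "B": {"deaths": [20, 30, 60], "population": [30_000, 6_000, 2_000]},
}

# Hypothetical standard population used as weights for direct adjustment.
standard_pop = [60_000, 25_000, 15_000]


def crude_rate(deaths, population, per=100_000):
    """Total deaths over total population, ignoring age structure."""
    return per * sum(deaths) / sum(population)


def age_adjusted_rate(deaths, population, standard, per=100_000):
    """Direct standardization: weight each age-specific rate by the
    standard population's share in that age group."""
    total_std = sum(standard)
    return per * sum(
        (d / p) * (s / total_std) for d, p, s in zip(deaths, population, standard)
    )


for name, r in regions.items():
    crude = crude_rate(r["deaths"], r["population"])
    adjusted = age_adjusted_rate(r["deaths"], r["population"], standard_pop)
    print(f"{name}: crude {crude:.0f}, age-adjusted {adjusted:.0f} per 100,000")
# A: crude 875, age-adjusted 580 per 100,000
# B: crude 289, age-adjusted 615 per 100,000
```

The choice of standard population changes the adjusted values, so maps are only comparable when they share the same standard.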

Another source of potentially misleading patterns is large, sparsely populated areas with few cases, which can produce eye-catching deviations from mean rates. The problem affects modern disease atlases, which often map either an adjusted rate (specifically the standardized mortality ratio, or SMR, the ratio of the number of observed deaths to the number of expected deaths) or the statistical significance of each area's deviation from the rate for the map as a whole. Clayton and Kaldor (1987) describe the problems with both approaches:

No account is taken of varying population size over the map so that imprecisely estimated SMRs, based on only a few cases, may be the extremes of the map, and hence dominate its pattern. On the other hand, mapping significance alone totally ignores the size of the corresponding effect, so that on the map, two areas with identical SMRs may be indicated quite differently if they are of unequal population size and the most extreme areas may simply be those with the largest populations.
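
To make the quoted problem concrete, the sketch below computes SMRs by indirect standardization (expected deaths are the local population in each age stratum multiplied by a reference rate for that stratum) and, for comparison, a one-sided Poisson p-value for each area. The reference rates and both areas are hypothetical. A single death gives the tiny area the more extreme SMR, while the large area, with only a modest SMR, is by far the more "significant."

```python
import math

# Hypothetical reference (whole-map) death rates for three age strata.
reference_rates = [0.0005, 0.004, 0.03]


def expected_deaths(population, rates=reference_rates):
    """Indirect standardization: deaths the area would have if it
    experienced the reference age-specific rates."""
    return sum(p * r for p, r in zip(population, rates))


def smr(observed, expected):
    """Standardized mortality ratio: observed / expected deaths."""
    return observed / expected


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), summed in log space
    so that large expected counts do not underflow."""
    if observed <= 0:
        return 1.0
    upper = int(max(observed, expected) + 40 * math.sqrt(expected) + 50)
    return sum(
        math.exp(-expected + k * math.log(expected) - math.lgamma(k + 1))
        for k in range(observed, upper + 1)
    )


areas = {
    "small rural area": {"population": [200, 20, 5], "observed": 1},
    "large urban area": {"population": [200_000, 100_000, 30_000], "observed": 1540},
}
for name, a in areas.items():
    e = expected_deaths(a["population"])
    print(f"{name}: expected {e:.2f}, SMR {smr(a['observed'], e):.2f}, "
          f"p = {poisson_upper_tail(a['observed'], e):.2g}")
# The small area has the larger SMR (about 3.0 vs 1.1), but the large
# area has the far smaller p-value.
```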

These problems and related statistical issues have led to sophisticated innovations, such as hierarchical and Bayesian methods, that “smooth” rates and dampen variability. Empirical Bayes and fully Bayesian methods take into account the overall variability of rates across the map in addition to the observed rate in a given area. When an area has few events or a small population, Bayesian methods permit “borrowing” information from neighboring or nearby locations (Mollié 1999).
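
A minimal sketch of the smoothing idea, assuming a Poisson-gamma model: each area's relative risk gets a gamma prior whose mean and variance are estimated from all areas (here by a simple method of moments, chosen for brevity), and the smoothed SMR is the posterior mean (O_i + alpha) / (E_i + beta). This version shrinks every area toward the overall map rate rather than toward its neighbors; the spatially structured models described below borrow strength locally instead. The function name and data are hypothetical.

```python
def empirical_bayes_smr(observed, expected):
    """Poisson-gamma empirical Bayes smoothing of SMRs (global shrinkage).

    Posterior mean (O + alpha) / (E + beta) equals w * (O / E) + (1 - w) * m
    with w = E / (E + beta): small-E areas are pulled hardest toward m.
    """
    n = len(observed)
    total_e = sum(expected)
    m = sum(observed) / total_e  # overall SMR, used as the prior mean
    raw = [o / e for o, e in zip(observed, expected)]
    # Method-of-moments estimate of between-area variance, net of the
    # variance expected from Poisson noise alone.
    s2 = sum(e * (r - m) ** 2 for e, r in zip(expected, raw)) / total_e
    s2 -= m / (total_e / n)
    if s2 <= 0:
        # No detectable extra-Poisson variation: shrink fully to m.
        return [m] * n
    alpha, beta = m * m / s2, m / s2  # gamma shape and rate
    return [(o + alpha) / (e + beta) for o, e in zip(observed, expected)]


# Hypothetical observed and expected counts for six areas.
observed = [1, 0, 18, 150, 40, 8]
expected = [0.33, 0.5, 10.0, 140.0, 25.0, 12.0]
for o, e, s in zip(observed, expected, empirical_bayes_smr(observed, expected)):
    print(f"O={o:4d} E={e:7.2f}  raw SMR={o / e:5.2f}  smoothed={s:5.2f}")
# The area with E = 0.33 is pulled strongly toward the map-wide rate;
# the area with E = 140 barely moves.
```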

Under the Bayesian approach, spatial correlation and other “random effects” may be modeled using prior distributions. Disease rates may then be conditioned on the estimated parameters of those prior distributions. Lawson (2001) discusses the “unique features of spatial epidemiological problems where it is natural to model data via prior distributions”, including, for example, spatially correlated heterogeneity in the random effects of population strata (e.g., age × sex) and region-specific random effects.
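
As an illustration of this hierarchical structure, one widely used formulation is the convolution model of Besag, York, and Mollié, written below in standard notation. It is offered as a sketch of the general idea, not as the model used in any study cited here. O_i and E_i are the observed and expected deaths in area i, theta_i is its relative risk, v_i is an unstructured region-specific effect, and u_i is a spatially structured effect whose prior is centered on the average of its n_i neighbors (j ~ i).

```latex
\begin{aligned}
O_i \mid \theta_i &\sim \operatorname{Poisson}(E_i\,\theta_i) \\
\log \theta_i &= \mu + u_i + v_i \\
v_i &\sim \mathcal{N}\!\left(0,\ \sigma_v^2\right) \\
u_i \mid u_{j \neq i} &\sim \mathcal{N}\!\left(\frac{1}{n_i}\sum_{j \sim i} u_j,\ \frac{\sigma_u^2}{n_i}\right)
\end{aligned}
```

Smoothed rates are then summaries (for example, posterior means) of each theta_i given the data and the estimated variance parameters.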

These methods often rely on area-based approaches to measuring and accounting for spatial variability, in which a neighborhood structure is defined using, for example, adjacent (first-order) neighbors, or second-order, third-order, and higher-order neighbors (Wakefield, Best, and Waller 2000). Nearest-neighbor Markov random field models, in particular, are commonly used to express prior information on the spatial structure of disease rates (Mollié 1999). Assigning the neighborhood structure, however, often relies on the investigators’ intuition (e.g., assuming that disease rates are similar in neighboring counties but not in non-adjacent ones). A neighborhood approach becomes particularly intractable when the analysis involves areas of widely varying size, because adjacency then links areas whose centers may be a few miles apart in one region and tens of miles apart in another.
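
To show what first-order and higher-order neighborhoods mean in practice, the sketch below builds rook adjacency (areas sharing an edge) on a hypothetical regular grid of areas, collects neighbors up to a given order by breadth-first search, and computes the value on which an intrinsic conditional autoregressive (CAR) prior, a common nearest-neighbor Markov random field, centers each area's effect: the average of its first-order neighbors. A real analysis would build the adjacency list from county boundaries; the grid, the function names, and the equal neighbor weights are assumptions for illustration.

```python
from collections import deque


def grid_adjacency(n_rows, n_cols):
    """First-order (rook) adjacency for a hypothetical n_rows x n_cols
    lattice of areas: cells that share an edge are neighbors."""
    adj = {}
    for r in range(n_rows):
        for c in range(n_cols):
            nbrs = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    nbrs.append((rr, cc))
            adj[(r, c)] = nbrs
    return adj


def neighbors_up_to_order(adj, area, order):
    """Areas reachable in at most `order` adjacency steps, excluding `area`."""
    dist = {area: 0}
    queue = deque([area])
    while queue:
        current = queue.popleft()
        if dist[current] == order:
            continue  # do not expand beyond the requested order
        for nb in adj[current]:
            if nb not in dist:
                dist[nb] = dist[current] + 1
                queue.append(nb)
    return {a for a, d in dist.items() if 0 < d <= order}


def car_conditional_mean(values, adj, area):
    """Mean of an area's first-order neighbors: the conditional mean of
    that area's effect under an intrinsic CAR prior with equal weights."""
    nbrs = adj[area]
    return sum(values[nb] for nb in nbrs) / len(nbrs)


adj = grid_adjacency(3, 3)
print(sorted(neighbors_up_to_order(adj, (0, 0), 1)))  # [(0, 1), (1, 0)]
print(len(neighbors_up_to_order(adj, (0, 0), 2)))     # 5
```

Even on this regular grid, a corner area has two first-order neighbors and the center has four, so the conditional variance sigma_u^2 / n_i differs between them; irregular county maps exaggerate such differences.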

An alternative approach is to identify and account for the underlying spatial process using geostatistics (Diggle, Tawn, and Moyeed 1998; Diggle et al. 2002). Unlike the more traditional nearest-neighbor (lattice-based) approaches, which rely on an arbitrarily defined neighborhood (most often first-order adjacent neighbors), geostatistics can empirically model how the relationship between responses (in this case, rates) varies across the full range of distances separating them.
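
Geostatistical analyses typically summarize how dissimilarity between observations grows with the distance separating them using a semivariogram. The sketch below implements the classical (Matheron) empirical estimator, treating each area's rate as a value located at a hypothetical centroid; the coordinates, rates, and distance bins are invented for illustration. It is only an exploratory first step: the model-based geostatistics of Diggle, Tawn, and Moyeed (1998) embeds a spatial Gaussian process in a generalized linear model, and an analysis of rates would also need to account for the unequal precision of rates from areas of different population size.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, values, bin_edges):
    """Classical (Matheron) estimator: for each distance bin,
    gamma(h) = sum of (z_i - z_j)^2 / (2 * N(h)) over the N(h) pairs
    whose separation falls in the bin. Returns (gamma, N(h)) per bin,
    with gamma = None for empty bins."""
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for (p, zp), (q, zq) in combinations(zip(coords, values), 2):
        d = math.dist(p, q)  # Euclidean distance between centroids
        for b in range(n_bins):
            if bin_edges[b] <= d < bin_edges[b + 1]:
                sums[b] += (zp - zq) ** 2
                counts[b] += 1
                break
    return [(s / (2 * c) if c else None, c) for s, c in zip(sums, counts)]


# Hypothetical centroids along a west-east transect, with rates that rise
# steadily from west to east (a strong spatial trend).
coords = [(float(x), 0.0) for x in range(10)]
rates = [float(x) for x in range(10)]
print(empirical_semivariogram(coords, rates, [0.5, 1.5, 2.5, 3.5]))
# [(0.5, 9), (2.0, 8), (4.5, 7)]
```

The semivariance grows with distance, showing that nearby areas are more alike than distant ones; the shape of that curve, rather than an assumed adjacency rule, then informs how strongly each location borrows from others.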

© 2006 ZevRoss Spatial Analysis

All Rights Reserved