Open Access Research

The effect of spatial aggregation on performance when mapping a risk of disease

Caroline Jeffery1*, Al Ozonoff2 and Marcello Pagano3

Author Affiliations

1 Liverpool School of Tropical Medicine, Department of International Public Health, Monitoring and Evaluation Technical assistance and Research group, Liverpool L3 5QA, UK

2 Department of Pediatrics, Harvard Medical School, Center for Patient Safety and Quality Research, Boston Children’s Hospital, Boston, MA 02118, USA

3 Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA

International Journal of Health Geographics 2014, 13:9  doi:10.1186/1476-072X-13-9

Published: 13 March 2014



Spatial data on disease cases are available either in point form (e.g. longitude/latitude) or aggregated by administrative region (e.g. zip code or census tract). Statistical methods for spatial data can accommodate either form; however, spatial aggregation can affect their performance. Previous work has studied the effect of spatial aggregation on cluster detection methods. Here we consider geographic health data at different levels of spatial resolution and study how spatial aggregation affects the performance of disease mapping in locating subregions of increased disease risk.


We implemented a non-parametric distance-based mapping (DBM) method for disease risk to produce a smooth map from spatially aggregated childhood leukaemia data. We then simulated spatial data under controlled conditions to study the effect of spatial aggregation on the method's performance. We used an evaluation method based on ROC curves to compare the performance of DBM across different geographic scales.
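The simulate, aggregate and evaluate workflow described above can be illustrated with a small numerical experiment. The sketch below is not the DBM method itself: as a stand-in it uses a simple Gaussian-kernel case/control ratio as the risk score. The hot-spot location and radius, the sample sizes, the bandwidth, the square-grid "administrative units" and all function names are illustrative assumptions, not values taken from the study.

```python
import numpy as np

def aggregate_to_centroids(points, cells_per_side):
    """Replace each point in the unit square by the centre of its grid cell."""
    points = np.asarray(points, dtype=float)
    idx = np.floor(points * cells_per_side).astype(int)
    idx = np.minimum(idx, cells_per_side - 1)  # keep points on the upper edge inside
    return (idx + 0.5) / cells_per_side

def kernel_sum(grid, points, bandwidth):
    """Sum of Gaussian kernels centred on `points`, evaluated at each grid location."""
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * bandwidth ** 2)).sum(axis=1)

def auc(scores, labels):
    """Area under the ROC curve as the probability that a positive outranks a negative."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Assumed higher-risk area: a disk inside the unit square.
CENTRE = np.array([0.3, 0.6])
RADIUS = 0.12

def simulate(rng, n_controls=1500, n_cases=500, excess=0.4):
    """Uniform controls; a fraction `excess` of cases is drawn uniformly from the disk."""
    controls = rng.random((n_controls, 2))
    cases = rng.random((n_cases, 2))
    n_hot = int(excess * n_cases)
    r = RADIUS * np.sqrt(rng.random(n_hot))
    theta = 2.0 * np.pi * rng.random(n_hot)
    cases[:n_hot] = CENTRE + np.column_stack((r * np.cos(theta), r * np.sin(theta)))
    return cases, controls

rng = np.random.default_rng(0)
cases, controls = simulate(rng)

# Evaluation grid and the true higher-risk indicator at each grid location.
ticks = (np.arange(25) + 0.5) / 25
grid = np.array([(x, y) for x in ticks for y in ticks])
truth = ((grid - CENTRE) ** 2).sum(axis=1) <= RADIUS ** 2

for cells in (None, 20, 10, 5):  # None = point data, otherwise cells x cells units
    ca = cases if cells is None else aggregate_to_centroids(cases, cells)
    co = controls if cells is None else aggregate_to_centroids(controls, cells)
    k_cases = kernel_sum(grid, ca, bandwidth=0.05)
    k_controls = kernel_sum(grid, co, bandwidth=0.05)
    score = k_cases / (k_cases + k_controls + 1e-12)
    label = "point data" if cells is None else f"{cells} x {cells} units"
    print(f"{label:>14}: AUC = {auc(score, truth):.3f}")
```

Each row of the printed output reports the AUC of the stand-in risk score at one level of aggregation, so the rows can be compared directly. Changing `RADIUS` relative to the cell size is one way to explore how the extent of the higher-risk area interacts with the aggregation level.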


Application of DBM to the leukaemia data illustrates the method as a useful visualization tool. Spatial aggregation produced expected degradation of disease mapping performance. Characteristics of this degradation, however, varied depending on the interaction between the geographic extent of the higher risk area and the level of aggregation. For example, higher risk areas dispersed across several units did not suffer as greatly from aggregation. The choice of centroids also had an impact on the resulting mapping.
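One way to see why the choice of centroid can matter is to compare two common definitions for a single unit. The sketch below is purely illustrative: it assumes a rectangular unit, and the geometric and population-weighted centroids shown here are hypothetical choices that may not match the centroid definitions compared in the study.

```python
import numpy as np

def geometric_centroid(xmin, ymin, xmax, ymax):
    """Centre of a rectangular unit (assumed shape, for illustration only)."""
    return np.array([(xmin + xmax) / 2.0, (ymin + ymax) / 2.0])

def population_weighted_centroid(locations, weights=None):
    """Weighted mean of residence locations; equal weights when none are given."""
    locations = np.asarray(locations, dtype=float)
    return np.average(locations, axis=0, weights=weights)

# Hypothetical unit: the unit square, with residents concentrated near one corner.
rng = np.random.default_rng(1)
residences = np.clip(rng.normal(loc=0.15, scale=0.05, size=(200, 2)), 0.0, 1.0)

geo = geometric_centroid(0.0, 0.0, 1.0, 1.0)
pop = population_weighted_centroid(residences)
print("geometric centroid:          ", geo)
print("population-weighted centroid:", pop)
print("displacement between them:   ", np.linalg.norm(geo - pop))
```

When residents cluster away from the middle of a unit, the two centroids differ substantially, and every case assigned to that unit moves with whichever centroid is chosen.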


DBM can be implemented for both continuous (point) and discrete (aggregated) spatial data, but the resulting map can lose accuracy in the discrete setting. Investigation of the simulations suggests a complex relationship between performance loss, the geographic extent of spatial disturbances, and centroid locations. Aggregation of spatial data destroys information and thus impedes efforts to monitor these data for spatial disturbances. The effect of spatial aggregation on cluster detection, disease mapping, and other useful methods in spatial epidemiology is complex and deserves further study.

Keywords: Disease risk mapping; Distance-based mapping; Spatial data; Aggregation effect; Scale effect; MAUP; Simulations; Spatial epidemiology