Open Access Research

Evaluating geographic imputation approaches for zip code level data: an application to a study of pediatric diabetes

James D Hibbert1, Angela D Liese1, Andrew Lawson2, Dwayne E Porter3, Robin C Puett3,4,5, Debra Standiford6, Lenna Liu7 and Dana Dabelea8

1 Department of Epidemiology and Biostatistics and Center for Research in Nutrition and Health Disparities, Arnold School of Public Health, University of South Carolina, 921 Assembly Street, Columbia, SC, USA

2 Medical University of South Carolina College of Medicine, 135 Cannon Street, Suite 303, Charleston, SC, USA

3 Department of Environmental Health Sciences, Arnold School of Public Health, University of South Carolina, 921 Assembly Street, Columbia, SC, USA

4 Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, 800 Sumter Street, Columbia, SC, USA

5 South Carolina Cancer Prevention and Control Program, University of South Carolina, 915 Greene Street, Columbia, SC, USA

6 Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, USA

7 University of Washington Child Health Institute, Seattle, WA, USA

8 University of Colorado School of Public Health, 13001 East 17th Avenue, Denver, CO, USA

International Journal of Health Geographics 2009, 8:54 doi:10.1186/1476-072X-8-54

Published: 8 October 2009

Abstract

Background

There is increasing interest in the study of place effects on health, facilitated in part by geographic information systems. Incomplete or missing address information reduces geocoding success. Several geographic imputation methods have been suggested to overcome this limitation. The accuracy of these methods can be evaluated at the level of individuals and at higher group levels (e.g., spatial distribution).

Methods

We evaluated the accuracy of eight geo-imputation methods for address allocation from ZIP codes to census tracts at the individual and group levels. The spatial apportioning approaches underlying the imputation methods included four fixed (deterministic) and four random (stochastic) allocation methods, each using one of four weighting factors: land area, total population, population under age 20, or race/ethnicity. Data included more than 2,000 geocoded cases of diabetes mellitus among youth aged 0-19 in four U.S. regions. The imputed distribution of cases across tracts was compared to the true distribution using a chi-squared statistic.
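The contrast between fixed and random allocation can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not spell out: that fixed allocation assigns every case in a ZIP code to the single overlapping tract with the largest weight, and that random allocation draws a tract with probability proportional to its weight. The tract names, weights, and function names are hypothetical and are not taken from the study.

```python
import random

def fixed_allocation(tract_weights):
    """Deterministic rule (assumed): pick the tract with the largest weight."""
    return max(tract_weights, key=tract_weights.get)

def random_allocation(tract_weights, rng):
    """Stochastic rule (assumed): draw a tract with probability
    proportional to its weight."""
    tracts = list(tract_weights)
    weights = [tract_weights[t] for t in tracts]
    return rng.choices(tracts, weights=weights, k=1)[0]

# Hypothetical ZIP code overlapping three census tracts; the weights stand in
# for any of the weighting factors (e.g., population under age 20).
zip_tract_weights = {"tract_A": 1200, "tract_B": 300, "tract_C": 500}

rng = random.Random(0)  # seeded only so the illustration is repeatable
fixed_assignments = [fixed_allocation(zip_tract_weights) for _ in range(5)]
random_assignments = [random_allocation(zip_tract_weights, rng) for _ in range(5)]
```

Under these assumptions, every case in the ZIP code lands in the same tract under the fixed rule, while the random rule spreads cases across tracts roughly in proportion to the weights.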

Results

At the individual level, population-weighted (total or under age 20) fixed allocation showed the greatest level of accuracy, with correct census tract assignments averaging 30.01% across all regions, followed by the race/ethnicity-weighted random method (23.83%). In the true distribution of cases across census tracts, 58.2% of tracts had no cases, 26.2% had one case, 9.5% had two cases, and less than 3% had three or more. This distribution was best captured by random allocation methods, whose imputed distributions did not differ significantly from the true distribution (p-value > 0.90). In contrast, distributions based on fixed allocation methods differed significantly from the true distribution (p-value < 0.0003).
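A group-level comparison of this kind can be sketched as follows. This is an illustrative assumption about how such a comparison might be set up, not the authors' procedure: tracts are tabulated by number of cases (0, 1, 2, 3 or more) and a Pearson chi-squared statistic is computed against the true tabulation. The toy assignments, tract labels, and function names are hypothetical, and cells with an expected count of zero are simply skipped for brevity.

```python
from collections import Counter

def cases_per_tract_histogram(assignments, all_tracts, cap=3):
    """Count tracts with 0, 1, 2, ..., cap-or-more assigned cases."""
    per_tract = Counter(assignments)
    hist = [0] * (cap + 1)
    for tract in all_tracts:
        hist[min(per_tract.get(tract, 0), cap)] += 1
    return hist

def pearson_chi_squared(observed, expected):
    """Pearson statistic; cells with zero expected count are skipped
    (a simplification for this sketch)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

# Hypothetical toy data: six tracts and four cases.
all_tracts = ["A", "B", "C", "D", "E", "F"]
true_assignments = ["A", "B", "B", "C"]
fixed_imputed = ["A", "A", "A", "A"]  # all cases forced into one tract

true_hist = cases_per_tract_histogram(true_assignments, all_tracts)
fixed_hist = cases_per_tract_histogram(fixed_imputed, all_tracts)
statistic = pearson_chi_squared(fixed_hist, true_hist)
```

In this toy case the fixed rule leaves five of six tracts empty and places one tract in the "three or more" category, which is the kind of artificial clustering that inflates the statistic.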

Conclusion

Fixed imputation methods seemed to yield the greatest accuracy at the individual level, suggesting their use for studies of area-level environmental exposures. However, fixed methods result in artificial clusters of cases in single census tracts. For studies focusing on the spatial distribution of disease, random methods seemed superior, as they most closely replicated the true spatial distribution. When selecting an imputation approach, researchers should carefully consider the study aims.

