Open Access Methodology

A linear programming model for preserving privacy when disclosing patient spatial information for secondary purposes

Ho-Won Jung1* and Khaled El Emam2,3

Author Affiliations

1 Korea University Business School, 145, Anam-ro, Seongbuk-gu, Seoul 136-701, Korea

2 Children’s Hospital of Eastern Ontario Research Institute, 401 Smyth Road, Ottawa, Ontario K1J 8L1, Canada

3 Pediatrics, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada

International Journal of Health Geographics 2014, 13:16  doi:10.1186/1476-072X-13-16

Published: 29 May 2014

Abstract

Background

A linear programming (LP) model was previously proposed to create de-identified data sets that retain as much spatial detail as possible (e.g., geocodes such as ZIP or postal codes, census blocks, and locations on maps) while complying with the HIPAA Privacy Rule’s Expert Determination method, i.e., ensuring that the risk of re-identification is very small. The LP model determines the transition probability from a patient's original location to a new randomized location. However, that model is limited when areas have small populations (e.g., a median of 10 people per ZIP code).
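To make the role of the transition probabilities concrete, the following is a minimal sketch of how such probabilities could be used once they are known. It does not solve the LP and is not the authors' implementation. The location labels, the matrix values, and the function names (`validate`, `randomize_location`, `randomize_all`) are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical location labels (assumption: stand-ins for postal codes).
LOCATIONS = ["A", "B", "C"]

# Hypothetical transition probabilities (assumption: illustrative values only;
# in the LP approach these would be the solved decision variables).
# TRANSITION[origin][j] is the probability of releasing LOCATIONS[j]
# for a patient whose true location is `origin`.
TRANSITION = {
    "A": [0.8, 0.1, 0.1],
    "B": [0.2, 0.7, 0.1],
    "C": [0.1, 0.2, 0.7],
}


def validate(transition, locations):
    """Check that every row is a probability distribution over `locations`."""
    for origin, row in transition.items():
        if len(row) != len(locations):
            raise ValueError(f"row for {origin!r} has the wrong length")
        if any(p < 0 for p in row):
            raise ValueError(f"row for {origin!r} has a negative probability")
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError(f"row for {origin!r} does not sum to 1")


def randomize_location(origin, rng):
    """Draw one released location for a patient at `origin`."""
    return rng.choices(LOCATIONS, weights=TRANSITION[origin], k=1)[0]


def randomize_all(origins, seed=None):
    """Randomize a list of original locations, one independent draw each."""
    rng = random.Random(seed)
    return [randomize_location(origin, rng) for origin in origins]


if __name__ == "__main__":
    validate(TRANSITION, LOCATIONS)
    print(randomize_all(["A", "A", "B", "C"], seed=42))
```

In this sketch a larger diagonal entry means a patient's location is more likely to be released unchanged, which corresponds to less distortion. How much probability mass can stay on the diagonal while the re-identification risk remains very small is the trade-off the LP model is meant to resolve.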

Methods

We extend the previous LP model to accommodate locations with small populations, while still creating de-identified patient spatial data sets in which the risk of re-identification is very small.

Results

Our LP model was applied to a data set of 11,740 postal codes in the City of Ottawa, Canada. On this data set we demonstrated the limitations of the previous LP model, in that it produces improbable results, and showed how our extensions to deal with small areas allow the de-identification of the whole data set.

Conclusions

The LP model described in this study can be used to de-identify geospatial information for areas with small populations with minimal distortion to postal codes. Our LP model can be extended to include other information, such as age and gender.

Keywords:
Health services research; Linear programming (LP); De-identified data sets; Geographical identifiers; HIPAA Privacy Rule