<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1476-072X-9-39</ui><ji>1476-072X</ji><fm>
<dochead>Methodology</dochead>
<bibl>
<title>
<p>Density estimation and adaptive bandwidths: A primer for public health practitioners</p>
</title>
<aug>
<au id="A1"><snm>Carlos</snm><mi>A</mi><fnm>Heather</fnm><insr iid="I1"/><email>heather.a.carlos@dartmouth.edu</email></au>
<au id="A2"><snm>Shi</snm><fnm>Xun</fnm><insr iid="I2"/><email>Xun.Shi@dartmouth.edu</email></au>
<au id="A3"><snm>Sargent</snm><fnm>James</fnm><insr iid="I1"/><email>James.Sargent@dartmouth.edu</email></au>
<au id="A4"><snm>Tanski</snm><fnm>Susanne</fnm><insr iid="I1"/><email>Susanne.Tanski@dartmouth.edu</email></au>
<au ca="yes" id="A5"><snm>Berke</snm><mi>M</mi><fnm>Ethan</fnm><insr iid="I1"/><insr iid="I2"/><insr iid="I3"/><insr iid="I4"/><email>Ethan.Berke@TDI.dartmouth.edu</email></au>
</aug>
<insg>
<ins id="I1"><p>Norris Cotton Cancer Center, Dartmouth-Hitchcock Medical Center, Lebanon, NH, USA</p></ins>
<ins id="I2"><p>Department of Geography, Dartmouth College, Hanover NH, USA</p></ins>
<ins id="I3"><p>Department of Community and Family Medicine, Dartmouth Medical School, Hanover NH, USA</p></ins>
<ins id="I4"><p>Prevention Research Center at Dartmouth, The Dartmouth Institute for Health Policy and Clinical Practice, Lebanon NH, USA</p></ins>
</insg>
<source>International Journal of Health Geographics</source>
<issn>1476-072X</issn>
<pubdate>2010</pubdate>
<volume>9</volume>
<issue>1</issue>
<fpage>39</fpage>
<url>http://www.ij-healthgeographics.com/content/9/1/39</url>
<xrefbib><pubidlist><pubid idtype="pmpid">20653969</pubid><pubid idtype="doi">10.1186/1476-072X-9-39</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>20</day><month>4</month><year>2010</year></date></rec><acc><date><day>23</day><month>7</month><year>2010</year></date></acc><pub><date><day>23</day><month>7</month><year>2010</year></date></pub></history>
<cpyrt><year>2010</year><collab>Carlos et al; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>Geographic information systems have advanced the ability to both visualize and analyze point data. While point-based maps can be aggregated to differing areal units and examined at varying resolutions, two problems arise 1) the modifiable areal unit problem and 2) any corresponding data must be available both at the scale of analysis and in the same geographic units. Kernel density estimation (KDE) produces a smooth, continuous surface where each location in the study area is assigned a density value irrespective of arbitrary administrative boundaries. We review KDE, and introduce the technique of utilizing an adaptive bandwidth to address the underlying heterogeneous population distributions common in public health research.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>The density of occurrences should not be interpreted without knowledge of the underlying population distribution. When the effect of the background population is successfully accounted for, differences in point patterns in similar population areas are more discernible; it is generally these variations that are of most interest. A static bandwidth KDE does not distinguish the spatial extents of interesting areas, nor does it expose patterns above and beyond those due to geographic variations in the density of the underlying population. An adaptive bandwidth method uses background population data to calculate a kernel of varying size for each individual case. This limits the influence of a single case to a small spatial extent where the population density is high as the bandwidth is small. If the primary concern is distance, a static bandwidth is preferable because it may be better to define the "neighborhood" or exposure risk based on distance. If the primary concern is differences in exposure across the population, a bandwidth adapting to the population is preferred.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>Kernel density estimation is a useful way to consider exposure at any point within a spatial frame, irrespective of administrative boundaries. Utilization of an adaptive bandwidth may be particularly useful in comparing two similarly populated areas when studying health disparities or other issues comparing populations in public health.</p>
</sec>
</sec>
</abs>
</fm><bdy>
<sec>
<st>
<p>Introduction</p>
</st>
<p>From John Snow's Victorian era map of cholera deaths <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp> to interactive maps tracking the spread of H1N1 Influenza <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>, spatial point patterns have a long and rich history in the public health arena. Disease registries now include geolocation data, which allow detection of clustering (a global tendency) and clusters (a local phenomenon). Public health practitioners focusing on disease prevention use spatial point pattern analysis to quantify social determinants of health (for example, distance to sites of physical activity <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp> or to retail outlets <abbrgrp>
<abbr bid="B4">4</abbr>
</abbrgrp>, discrepancies in access to services by race or ethnicity, or variation in educational attainment).</p>
<p>Geographic information systems (GIS) have advanced the ability to both visualize and analyze these point data. Using GIS, point based maps can easily be aggregated to differing areal units and examined at varying resolutions. This however, creates problems in spatial analysis. In addition to introducing the modifiable areal unit problem (MAUP) <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>, where altering the area or shape of an aggregate unit may alter the value within the polygon, any corresponding demographic data must also be available both at the scale of analysis and in the same geographic units. One way to address these issues is to employ kernel density estimation (KDE) techniques rather than geographic aggregation <abbrgrp>
<abbr bid="B6">6</abbr>
<abbr bid="B7">7</abbr>
<abbr bid="B8">8</abbr>
</abbrgrp>. KDE is a non-parametric method of extrapolating point data over an area of interest without invoking MAUP or relying on fixed boundaries for aggregation. The density of points is calculated using a specified bandwidth (a circle of a given radius centered at the focal location). This produces a smooth, continuous surface where each location in the study area is assigned a density value, which can then be used as the independent or dependent variable in statistical models. KDE's strength is its ability to provide an estimate of density at any location in the spatial frame (e.g. a geocoded subject or another point of interest), irrespective of arbitrary administrative boundaries.</p>
<p>While various methods exist for calculating KDE surfaces, including some embedded in common GIS software, many public health practitioners and researchers use a static distance for bandwidth patterned after the case-control method <abbrgrp>
<abbr bid="B7">7</abbr>
<abbr bid="B9">9</abbr>
</abbrgrp>. A more in-depth discussion on the use of KDE in public health can be found in our prior work <abbrgrp>
<abbr bid="B10">10</abbr>
<abbr bid="B11">11</abbr>
</abbrgrp> and that of others <abbrgrp>
<abbr bid="B12">12</abbr>
</abbrgrp>. When geographic distance (and count of cases) is the primary interest, a static bandwidth KDE appropriately represents the density of a particular attribute, for example understanding how relapse of alcoholism may be predicted in part by proximity to bars or pubs. However, for some health outcomes, a fixed geographic distance is not the appropriate bandwidth. Consider the hypothesis that alcohol outlets (retail alcohol sale locations) are more concentrated in low-income neighborhoods within a metro area. The problem with using a static bandwidth for each outlet is that we expect the greater density of alcohol outlets in urban areas where the population density is higher than in the suburbs. To the extent that outlet density in poor urban neighborhoods is just a reflection of a higher concentration of people living there, the correlation does not necessarily point to a health disparity.</p>
<p>There are a wide range of analytical methods available to examine spatial point patterns <abbrgrp>
<abbr bid="B12">12</abbr>
</abbrgrp> and other researchers have considered the effect of inhomogeneous background populations. Here, inhomogeneous background populations refers not to the population's demographics, rather to the distribution of the source of the event points. For our alcohol outlet example, the background population is the population of the study area, whereas in an analysis of disease cases, only the population at risk is used. Notably, the spatial filtering technique <abbrgrp>
<abbr bid="B13">13</abbr>
<abbr bid="B14">14</abbr>
</abbrgrp> uses both fixed and adaptive filters/bandwidths to test or map the relationship between a count of cases and a background population while the cluster evaluation permutation procedure <abbrgrp>
<abbr bid="B15">15</abbr>
<abbr bid="B16">16</abbr>
</abbrgrp> uses an adaptive bandwidth but focuses only on the case count. Alternatively, adding population density as a covariate in a statistical model could address this issue, but a more elegant solution incorporates population density into the outcome variable using a bandwidth that represents the underlying population, rather than a fixed geographical area. An adaptive bandwidth, discussed below, may be preferable when studying issues of population and variations in exposure.</p>
<p>In this paper, we present a number of approaches to density estimation and propose using a KDE method to address uneven population distribution by using an adaptive bandwidth specified by the underlying population. This technique is useful when it is it is important to understand if a density value is just a reflection of the local population or if it may stem from other causes. This methodology was motivated by an analysis of the distribution of alcohol outlets. An illustrative application in that arena is used to compare density methods, but it is also applicable to the analysis of the density of disease, crime, healthcare clinics and other fields where the background population is heterogeneous.</p>
</sec>
<sec>
<st>
<p>Background</p>
</st>
<p>We focus on a suite of density estimation tools: point density, static bandwidth kernel density estimation and adaptive bandwidth kernel density estimation. Density calculations operate on either <it>cases </it>or <it>sites</it>. Cases are event points (e.g. addresses of alcohol outlets or disease cases) whereas sites represent all locations (pixels or each point on a grid) in a study area. Density calculations performed on sites (site-side method) evaluate the density for every location in the study area, whereas the case-side method only looks at case locations and their defined surrounding locations. In order to highlight the differences in the density estimation tools presented here, we limit our discussion to case-side methods of density estimation. Case-side methods are more computationally efficient and in many situations better represent the nature of the application problem. More information about the differences between case-side and site-side methods can be found in Shi <abbrgrp>
<abbr bid="B11">11</abbr>
</abbrgrp>.</p>
<sec>
<st>
<p>Point Density</p>
</st>
<p>Most generically, a point density function (also called intensity function) defines the number of cases (alcohol outlets, disease cases) per unit area at each location throughout an area of interest. To calculate this density surface, for each case, a "neighborhood" is delineated, usually by defining a search radius (or bandwidth); the number of cases that fall within the neighborhood are divided by the area of the neighborhood; this value is assigned to the neighborhood (Figure <figr fid="F1">1</figr>). The intensity function is expressed as <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp>
</p>
<p>
<display-formula id="M1">
<graphic file="1476-072X-9-39-i1.gif"/>
</display-formula>
</p>
<p>where &#955;(x,y) is the intensity (or point density) at location(x,y), n is the number of events and |A| is the area of the neighborhood. When neighborhoods overlap, the results are summed to indicate a higher density of cases. The units of &#955;(x,y) are cases per unit area.</p>
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Point density</p></caption><text>
   <p><b>Point density</b>. Equal values at all locations within the neighborhood (the circle) around the case (star in center).</p>
</text><graphic file="1476-072X-9-39-1" hint_layout="single"/></fig>
<p>When points are evenly distributed in space, increasing the bandwidth does not have a large impact on the intensity since, as larger neighborhoods are defined, n will likely increase, but so will |A|. However, increasing the bandwidth does provide a greater smoothing effect (or a more generalized surface), which risks removing meaningful spikes (peaks or valleys) or edges (extent of the influence of a case) from the original data distribution.</p>
<p>Although the point density function is relatively simple and straightforward, it does not convey any information about the spatial configuration of features of interest within the bandwidth. Consider two locations (sites) and one case. Computationally, a site coincident with a case returns the same &#955; as a site one bandwidth away from the case. While this approach is appropriate for studies which are interested in the number of events per unit area at a specified location (e.g. crime events, residential or population density), in other disciplines there is an expected attenuation with distance (e.g. environmental pollutants which dissipate as they travel from the source). In order to compensate for distance, a density function can incorporate a decay function to assign smaller values to locations which are still in the neighborhood, but more distant from a case. This is the approach employed by kernel density estimation.</p>
</sec>
<sec>
<st>
<p>Static Bandwidth Kernel Density Estimation</p>
</st>
<p>Kernel density estimation fits a curved surface over each case such that the surface is highest above the case and zero at a specified distance (the bandwidth) from the case (Figures <figr fid="F2">2</figr> and <figr fid="F3">3</figr>). In mathematical terms, it is expressed as <abbrgrp>
<abbr bid="B6">6</abbr>
</abbrgrp>
</p>
<p>
<display-formula id="M2">
<graphic file="1476-072X-9-39-i2.gif"/>
</display-formula>
</p>
<p>where f(x,y) is the density value at location (x,y), n is the number of cases, h is the bandwidth, d<sub>i </sub>is the geographical distance between case i and location (x, y) and K is a density function (generally a radially symmetric unimodal probability density function) which integrates to one. The units of f(x,y) are cases per unit area.</p>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Kernel Density Estimation</p></caption><text>
   <p><b>Kernel Density Estimation</b>. The decay function is illustrated with the highest values located under the case giving way to lower values.</p>
</text><graphic file="1476-072X-9-39-2" hint_layout="single"/></fig>
<fig id="F3"><title><p>Figure 3</p></title><caption><p>3 D rendering of KDE</p></caption><text>
   <p><b>3 D rendering of KDE</b>.</p>
</text><graphic file="1476-072X-9-39-3" hint_layout="single"/></fig>
<p>Static bandwidth kernel density estimation is a technique that is appropriate when geographic distance (and case count) is the primary concern. Since it applies the same geographic extent to each case, static bandwidth KDE does not distinguish the spatial extents of interesting areas <abbrgrp>
<abbr bid="B11">11</abbr>
</abbrgrp>, nor does it expose patterns above and beyond those due to geographic variations in the density of the underlying population <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp>.</p>
</sec>
<sec>
<st>
<p>Problems caused by heterogeneous backgrounds</p>
</st>
<p>Because health outcomes involve people, their spatial distribution will often reflect the spatial distribution of the underlying human population. Counts of disease are almost always higher in urban areas than rural areas simply due to the size of the potential exposed population. Likewise, counts of things people use are greater in higher population areas: there are more parks, physical activity sites, and retail outlets in places where more people live. As a result, the density of occurrences should not be interpreted without knowledge of the underlying population distribution <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>. When the effect of the background population is successfully accounted for, differences in point patterns in similar population areas are more discernible; it is generally these variations that are of most interest <abbrgrp>
<abbr bid="B18">18</abbr>
</abbrgrp>.</p>
<p>A Texas case study illustrates the problem posed by heterogeneous population backgrounds. Figure <figr fid="F4">4</figr> displays the point data for alcohol outlets while Figure <figr fid="F5">5</figr> shows the static bandwidth KDE surface for these data. As we would expect, both maps are similar to an image of Texas at night (Figure <figr fid="F6">6</figr>) since they replicate the population distribution. In contrast, Figure <figr fid="F7">7</figr> shows a density map with the underlying population addressed. An adaptive bandwidth KDE method was used to create this map and it is described below.</p>
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Map of alcohol outlets</p></caption><text>
   <p><b>Map of alcohol outlets</b>.</p>
</text><graphic file="1476-072X-9-39-4" hint_layout="single"/></fig>
<fig id="F5"><title><p>Figure 5</p></title><caption><p>Kernel Density Estimation of alcohol outlets</p></caption><text>
   <p><b>Kernel Density Estimation of alcohol outlets</b>.</p>
</text><graphic file="1476-072X-9-39-5" hint_layout="single"/></fig>
<fig id="F6"><title><p>Figure 6</p></title><caption><p>Texas at night, as seen from space <abbrgrp><abbr bid="B23">23</abbr></abbrgrp></p></caption><text>
   <p><b>Texas at night, as seen from space</b><abbrgrp><abbr bid="B23">23</abbr></abbrgrp>.</p>
</text><graphic file="1476-072X-9-39-6" hint_layout="single"/></fig>
<fig id="F7"><title><p>Figure 7</p></title><caption><p>Adaptive Bandwidth Kernel Density Estimation of alcohol outlets</p></caption><text>
   <p><b>Adaptive Bandwidth Kernel Density Estimation of alcohol outlets</b>.</p>
</text><graphic file="1476-072X-9-39-7" hint_layout="single"/></fig>
</sec>
</sec>
<sec>
<st>
<p>Methods - Adaptive Bandwidth</p>
</st>
<sec>
<st>
<p>Data Sources</p>
</st>
<p>Before delving into the adaptive bandwidth methodology, we discuss a data source for the background population. Most GIS based population data are in a polygon format with a population count (or estimate) assigned to each polygon. Depending on the study area and data source, each polygon may be as large as a country, or as small as a city block. These polygons are often irregular shapes and sizes and lack data about how people are geographically dispersed within the polygon. In addition, administrative boundaries may not be consistent with the travel patterns or service utilization of those that live in them. An alternative to polygon based population data is the LandScan&#8482; Global Population Database, which was developed using multiple techniques to disaggregate census counts within an administrative boundary. This worldwide population data product is available on a 30" &#215; 30" latitude/longitude grid (a pixel located in the central United States is approximately 0.65 km<sup>2</sup>). The advantage of the grid format is that it regularizes the areal unit for population values, unlike administrative boundaries, which vary in size. This makes counts at different locations more spatially comparable and facilitates spatial analysis operations. However, it disconnects the population counts from the related demographic data, which is included in many censuses. Additionally, in urban areas, census blocks may be smaller than the LandScan&#8482; grid and as a result, the larger grid units aggregate the original population counts. A description of the LandScan&#8482; data and the methodology used to create it are described in detail elsewhere <abbrgrp>
<abbr bid="B19">19</abbr>
<abbr bid="B20">20</abbr>
</abbrgrp>.</p>
<p>Geocoded data for the alcohol outlets were obtained from the NAICS (North American Industry Classification System) Association <url>http://www.naics.com</url>. Details on this data and related processing can be found elsewhere <abbrgrp>
<abbr bid="B4">4</abbr>
</abbrgrp>.</p>
</sec>
<sec>
<st>
<p>Adaptive Bandwidth Kernel Density Estimation</p>
</st>
<p>Whereas the static bandwidth kernel density estimation model employs a bandwidth based on a geographic distance, the adaptive bandwidth method uses background population drawn from LandScan&#8482; data to calculate a kernel of varying size for each individual case (which, using the examples above could be an alcohol outlet). This limits the influence of a single case to a small spatial extent where the population density is high as the bandwidth is small <abbrgrp>
<abbr bid="B10">10</abbr>
</abbrgrp>. Conversely, in rural areas where population is lower, the kernel is geographically larger and the influence of a single case is greater (Figure <figr fid="F8">8</figr>).</p>
<fig id="F8"><title><p>Figure 8</p></title><caption><p>Two overlapping cases and the related density surface</p></caption><text>
   <p><b>Two overlapping cases and the related density surface</b>. In A, the case in the upper right has a smaller spatial impact, as the expected population was reached earlier than the case in the lower left. B is a rotated 3 dimensional image of the surface shown in A.</p>
</text><graphic file="1476-072X-9-39-8" hint_layout="single"/></fig>
<p>The adaptive method is calculated as follows <abbrgrp>
<abbr bid="B10">10</abbr>
</abbrgrp>:</p>
<p>
<display-formula id="M3">
<graphic file="1476-072X-9-39-i3.gif"/>
</display-formula>
</p>
<p>Note that there are several differences between equations 2 and 3. Most notably, in the adaptive method, the bandwidth is represented by P(u,v) which is a function centered on the case located at (u,v) and based on the local population. Additionally, the denominator nh<sup>2 </sup>is dropped since the output value is not normalized by the geographic area of the kernel (h<sup>2</sup>). The adaptive bandwidth method bases the influence of a case on the underlying population support, not the area support. There are various choices for the function K; most will not significantly affect the outcome. This study uses a simple form:</p>
<p>
<display-formula id="M4">
<graphic file="1476-072X-9-39-i4.gif"/>
</display-formula>
</p>
<p>Often the influence of one case will overlap that of another. In this situation, the bandwidth and kernel density estimation calculations are performed separately for each case and then the results are summed (Figure <figr fid="F8">8</figr>, Equation 3).</p>
</sec>
<sec>
<st>
<p>Constraints on the Adaptive Bandwidth</p>
</st>
<p>The expected population parameter determines the extent of the adaptive bandwidth. The expected population normalizes the influence of each case to a certain number of people and thus the bandwidth stops expanding when the expected population is reached. In less populous areas however, the bandwidth could expand beyond the distance where a case may influence health; one can therefore set a limit to the maximum distance of the bandwidth. The maximum distance parameter restricts the bandwidth from expanding further, even if the expected population has not been reached. This may be critical when considering health behaviors influenced by exposures that are beyond a reasonable distances from an individual.</p>
<p>Bandwidth is determined for each individual case by summing the underlying population, starting with the pixel directly under the case and then expanding outward until the expected population is reached. Given reasonable values for population and maximum distance, expected population exerts the most control. In urban areas with high population densities, the expected population limit will often be reached before the maximum distance, thus, by adjusting the expected population, the radius of the influence of the case will diminish. The same is true in rural areas, except that in areas of very low population density the maximum distance may be called into play to limit bandwidth.</p>
</sec>
</sec>
<sec>
<st>
<p>An Application of Adaptive Bandwidth</p>
</st>
<sec>
<st>
<p>Static and Adaptive Bandwidth KDE for Alcohol Outlets</p>
</st>
<p>The difference between static and adaptive bandwidth KDE methods is best illustrated through visualization. Figure <figr fid="F9">9</figr> portrays the results from each method applied to alcohol outlets (Figure <figr fid="F9">9D</figr>) in the area surrounding San Antonio, Texas. As mentioned above, the static bandwidth KDE surfaces (Figures <figr fid="F9">9A</figr> and <figr fid="F9">9E</figr>) excel at identifying areas where there are many point sources, but they do not provide a basis for discerning where the point sources are higher or lower than would be expected given the underlying population (Figure <figr fid="F9">9C</figr>). The adaptive bandwidth KDE (Figure <figr fid="F9">9B</figr> and <figr fid="F9">9F</figr>) addresses these issues through utilization of a population-based bandwidth, allowing for improved detection of neighborhood-level differences in exposure, even in areas that have similarly high raw counts of alcohol outlets. This is illustrated in the close-up of San Antonio (Figures <figr fid="F9">9E</figr> and <figr fid="F9">9F</figr>) where the adaptive bandwidth KDE (Figure <figr fid="F9">9F</figr>) shows fine-grained variability in the urban center whereas the static bandwidth KDE (Figure <figr fid="F9">9E</figr>) shows little differentiation in alcohol outlet density. This level of analysis is important in associating density of exposure with markers of health disparities, such as poverty, as seen in Figure <figr fid="F9">9G</figr>.</p>
<fig id="F9"><title><p>Figure 9</p></title><caption><p>Cartographic comparison of density estimation of alcohol outlets near San Antonio, Texas</p></caption><text>
   <p><b>Cartographic comparison of density estimation of alcohol outlets near San Antonio, Texas</b>. A-D show San Antonio in the center and Austin in the upper right with more rural areas to the south and west, E-G are zoomed in to show just San Antonio. A and E illustrate KDE using a static bandwidth of ~10 km. B and F illustrate KDE using an adaptive bandwidth with an expected population of 1,000 people and a maximum distance of ~25 km. C shows a LandScan&#8482; dataset where each pixel represents a population count. D shows point data representing alcohol outlets. G is a map of census tracts showing the percentage of families below the poverty level.</p>
</text><graphic file="1476-072X-9-39-9" hint_layout="double"/></fig>
<p>In deciding which method to choose, one needs to consider the research hypothesis. Using alcohol outlets as an example, if the primary concern is distance, a static bandwidth is preferable because it may be better to define the "neighborhood" or exposure risk of each store based on distance. If the primary concern is the "share" of the "service" per person, or differences in exposure across the population, a bandwidth adapting to the population is preferred.</p>
<p>More theoretically, the static bandwidth has a fixed spatial certainty, but varying statistical stability across the area, and thus is more suitable for an application primarily concerning distance. The adaptive bandwidth has a fixed statistical stability but varying spatial certainty. As a result, in a high population density area, it has both high statistical stability (specified by the user) and high spatial certainty (e.g., can better reveal the size of a hot spot), but in a low population density area, its high statistical stability comes with a cost of low spatial certainty.</p>
</sec>
<sec>
<st>
<p>Limitations</p>
</st>
<p>There are limitations to all methods of spatial analysis, including density estimation, which induces interpolation autocorrelation which may result in over smoothing <abbrgrp>
<abbr bid="B21">21</abbr>
</abbrgrp> (this can be controlled by using a global Monte Carlo simulation). Perhaps the greatest limitation is the relatively arbitrary selection of bandwidth limits with both static and adaptive methods. Too large or small a bandwidth poses the risk of over or undersmoothing the original data, respectively. As subsequent analyses are based on this estimated density information, as opposed to original points, a change in bandwidth may have a significant impact on statistical relationships between dependent and independent variables. Methods to estimate appropriate bandwidths are described in more detail elsewhere <abbrgrp>
<abbr bid="B6">6</abbr>
<abbr bid="B17">17</abbr>
<abbr bid="B22">22</abbr>
</abbrgrp>. We recommend, even when applying mathematical models to estimate bandwidth, that one test multiple parameters for bandwidth in a sensitivity analysis. When employing a population-based adaptive bandwidth, applying a distance limit similar to the example above may be useful when considering the influence of exposure on behavior. In the example of alcohol outlets used above, we placed a 25 km limit on the density calculation if the expected population threshold of 1000 people was not reached. We determined this maximum distance by testing a number of distances as well as choosing a limit based on behavior theory regarding alcohol acquisition and consumption. For a process other than alcohol exposure, a different distance limit may make more sense.</p>
<p>Even though techniques such as static or adaptive bandwidth KDE do not rely on aggregated data or administrative boundaries, issues of data visualization remain. One should use caution when viewing density data at a small scale. This is particularly true when viewing U.S. national maps where variations in density in high population but geographically smaller, northeastern areas are difficult to visualize, leading to visual bias towards the lower population areas of the west. Indeed, display of a density map may not be appropriate at all unless the proper scale is chosen. Quantitative data from the KDE may be better reported in a tabular presentation. Finally, the analyses in this study were conducted using a program created by one of the authors, as opposed to commercially available software. With recognition of multiple methods of KDE and its use on the rise, we expect that these approaches will appear in common GIS and spatial statistics software in the near future.</p>
</sec>
</sec>
<sec>
<st>
<p>Conclusion</p>
</st>
<p>Researchers in the health sciences should be aware that multiple approaches to density estimation exist. Kernel density estimation is a useful way to consider exposure at any location within a spatial frame, irrespective of administrative boundaries. The ability for the researcher to analyze data easily at multiple scales reduces the risk of misinterpretation of results due to the MAUP. Utilization of an adaptive bandwidth may be particularly useful in comparing two similarly populated areas when studying health disparities.</p>
</sec>
<sec>
<st>
<p>Competing interests</p>
</st>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>HC conceived of the study, performed the analysis, and drafted the manuscript. XS designed and implemented the population based adaptive bandwidth method and assisted with manuscript preparation. JS assisted with data interpretation and manuscript preparation, ST assisted with data interpretation and manuscript preparation. EB conceived of the study, supervised analyses, and drafted the manuscript. All authors read and approved the final manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>This work was supported by the National Institutes of Health (AA015591 and CA 077026). Ethan Berke is supported by 1K23AG036934.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><aug><au><snm>Snow</snm><fnm>J</fnm></au></aug><source>On the mode of communication of cholera</source><publisher>London,: J. Churchill</publisher><edition>2</edition><pubdate>1855</pubdate></bibl><bibl id="B2"><title><p>Tracking Swine Flu Cases Worldwide</p></title><source>The New York Times</source><pubdate>2009</pubdate><url>http://www.nytimes.com/interactive/2009/04/27/us/20090427-flu-update-graphic.html</url></bibl><bibl id="B3"><title><p>The complexities of measuring access to parks and physical activity sites in New York City: a quantitative and qualitative approach</p></title><aug><au><snm>Maroko</snm><fnm>AR</fnm></au><au><snm>Maantay</snm><fnm>JA</fnm></au><au><snm>Sohler</snm><fnm>NL</fnm></au><au><snm>Grady</snm><fnm>KL</fnm></au><au><snm>Arno</snm><fnm>PS</fnm></au></aug><source>Int J Health Geogr</source><pubdate>2009</pubdate><volume>8</volume><fpage>34</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1476-072X-8-34</pubid><pubid idtype="pmcid">2708147</pubid><pubid idtype="pmpid">19545430</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Retail Alcohol Density and Poverty in Urban U.S. Census Tracts: A Geographic Analysis</p></title><aug><au><snm>Berke</snm><fnm>EM</fnm></au><au><snm>Tanski</snm><fnm>S</fnm></au><au><snm>Alford-Teaster</snm><fnm>J</fnm></au><au><snm>Shi</snm><fnm>X</fnm></au><au><snm>Sargent</snm><fnm>J</fnm></au></aug><source>American Journal of Public Health</source><pubdate>2010</pubdate><inpress/><xrefbib><pubid idtype="pmpid" link="fulltext">20724696</pubid></xrefbib></bibl><bibl id="B5"><title><p>Ecological fallacies and the analysis of areal census data</p></title><aug><au><snm>Openshaw</snm><fnm>S</fnm></au></aug><source>Environ Plan A</source><pubdate>1984</pubdate><volume>16</volume><fpage>17</fpage><lpage>31</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1068/a160017</pubid><pubid idtype="pmpid">12265900</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><aug><au><snm>Silverman</snm><fnm>BW</fnm></au></aug><source>Density estimation for statistics and data analysis</source><publisher>London; New York: Chapman and Hall</publisher><pubdate>1986</pubdate></bibl><bibl id="B7"><title><p>An application of density estimation to geographical epidemiology</p></title><aug><au><snm>Bithell</snm><fnm>JF</fnm></au></aug><source>Statistics in Medicine</source><pubdate>1990</pubdate><volume>9</volume><fpage>691</fpage><lpage>701</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/sim.4780090616</pubid><pubid idtype="pmpid">2218172</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Spatial accessibility of primary care: concepts, methods and challenges</p></title><aug><au><snm>Guagliardo</snm><fnm>MF</fnm></au></aug><source>Int J Health Geogr</source><pubdate>2004</pubdate><volume>3</volume><fpage>3</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1476-072X-3-3</pubid><pubid idtype="pmcid">394340</pubid><pubid idtype="pmpid">14987337</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>A classification of disease mapping methods</p></title><aug><au><snm>Bithell</snm><fnm>JF</fnm></au></aug><source>Statistics in Medicine</source><pubdate>2000</pubdate><volume>19</volume><fpage>2203</fpage><lpage>2215</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/1097-0258(20000915/30)19:17/18&lt;2203::AID-SIM564&gt;3.0.CO;2-U</pubid><pubid idtype="pmpid" link="fulltext">10960848</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>A Geocomputational Process for Characterizing the Spatial Pattern of Lung Cancer Incidence in New Hampshire</p></title><aug><au><snm>Shi</snm><fnm>X</fnm></au></aug><source>Annals of the Association of American Geographers</source><pubdate>2009</pubdate><volume>99</volume><fpage>521</fpage><lpage>533</lpage><xrefbib><pubid idtype="doi">10.1080/00045600902931801</pubid></xrefbib></bibl><bibl id="B11"><title><p>Selection of bandwidth type and adjustment side in kernel density estimation over inhomogeneous backgrounds</p></title><aug><au><snm>Shi</snm><fnm>X</fnm></au></aug><source>International Journal of Geographical Information Science</source><pubdate>2010</pubdate><volume>24</volume><fpage>643</fpage><lpage>660</lpage><xrefbib><pubid idtype="doi">10.1080/13658810902950625</pubid></xrefbib></bibl><bibl id="B12"><title><p>Spatial Point Pattern Analysis and Its Application in Geographical Epidemiology</p></title><aug><au><snm>Gatrell</snm><fnm>AC</fnm></au><au><snm>Bailey</snm><fnm>TC</fnm></au><au><snm>Diggle</snm><fnm>PJ</fnm></au><au><snm>Rowlingson</snm><fnm>BS</fnm></au></aug><source>Transactions of the Institute of British Geographers</source><pubdate>1996</pubdate><volume>21</volume><fpage>256</fpage><lpage>274</lpage><xrefbib><pubid idtype="doi">10.2307/622936</pubid></xrefbib></bibl><bibl id="B13"><title><p>Exploratory spatial analysis of birth defect rates in an urban population</p></title><aug><au><snm>Rushton</snm><fnm>G</fnm></au><au><snm>Lolonis</snm><fnm>P</fnm></au></aug><source>Stat Med</source><pubdate>1996</pubdate><volume>15</volume><fpage>717</fpage><lpage>726</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/(SICI)1097-0258(19960415)15:7/9&lt;717::AID-SIM243&gt;3.0.CO;2-0</pubid><pubid idtype="pmpid">9132899</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Using Spatially Adaptive Filters to map Late Stage Colorectal cancer Incidence in Iowa</p></title><aug><au><snm>Tiwari</snm><fnm>C</fnm></au></aug><source>Developments in spatial data handling: 11th International Symposium on Spatial Data Handling</source><publisher>Berlin; New York: Springer</publisher><editor>Fisher PF</editor><pubdate>2005</pubdate><fpage>676</fpage><note>xix</note></bibl><bibl id="B15"><title><p>Evaluation of spatial filters to create smoothed maps of health data</p></title><aug><au><snm>Talbot</snm><fnm>TO</fnm></au><au><snm>Kulldorff</snm><fnm>M</fnm></au><au><snm>Forand</snm><fnm>SP</fnm></au><au><snm>Haley</snm><fnm>VB</fnm></au></aug><source>Stat Med</source><pubdate>2000</pubdate><volume>19</volume><fpage>2399</fpage><lpage>2408</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/1097-0258(20000915/30)19:17/18&lt;2399::AID-SIM577&gt;3.0.CO;2-R</pubid><pubid idtype="pmpid" link="fulltext">10960861</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>Monitoring for clusters of disease: application to leukemia incidence in upstate New York</p></title><aug><au><snm>Turnbull</snm><fnm>BW</fnm></au><au><snm>Iwano</snm><fnm>EJ</fnm></au><au><snm>Burnett</snm><fnm>WS</fnm></au><au><snm>Howe</snm><fnm>HL</fnm></au><au><snm>Clark</snm><fnm>LC</fnm></au></aug><source>Am J Epidemiol</source><pubdate>1990</pubdate><volume>132</volume><fpage>S136</fpage><lpage>143</lpage><xrefbib><pubid idtype="pmpid">2356825</pubid></xrefbib></bibl><bibl id="B17"><aug><au><snm>Waller</snm><fnm>LA</fnm></au><au><snm>Gotway</snm><fnm>CA</fnm></au></aug><source>Applied spatial statistics for public health data</source><publisher>Hoboken, N.J.: John Wiley &amp; Sons</publisher><pubdate>2004</pubdate><xrefbib><pubid idtype="doi">full_text</pubid></xrefbib></bibl><bibl id="B18"><title><p>Estimating probability surfaces for geographical point data: An adaptive kernel algorithm</p></title><aug><au><snm>Brunsdon</snm><fnm>C</fnm></au></aug><source>Computers &amp; Geosciences</source><pubdate>1995</pubdate><volume>21</volume><fpage>877</fpage><lpage>894</lpage></bibl><bibl id="B19"><title><p>LandScan: A global population database for estimating populations at risk</p></title><aug><au><snm>Dobson</snm><fnm>JE</fnm></au><au><snm>Bright</snm><fnm>EA</fnm></au><au><snm>Coleman</snm><fnm>PR</fnm></au><au><snm>Durfee</snm><fnm>RC</fnm></au><au><snm>Worley</snm><fnm>BA</fnm></au></aug><source>Photogrammetric Engineering and Remote Sensing</source><pubdate>2000</pubdate><volume>66</volume><fpage>849</fpage><lpage>857</lpage></bibl><bibl id="B20"><title><p>LandScan&#8482;Global Population Database</p></title><publisher>Oak Ridge, TN; Oak Ridge National Laboratory</publisher><url>http://www.ornl.gov/landscan/</url></bibl><bibl id="B21"><title><p>Evaluating the uncertainty caused by Post Office Box addresses in environmental health studies: A restricted Monte Carlo approach</p></title><aug><au><snm>Shi</snm><fnm>X</fnm></au></aug><source>International Journal of Geographical Information Science</source><pubdate>2007</pubdate><volume>21</volume><fpage>325</fpage><lpage>340</lpage><xrefbib><pubid idtype="doi">10.1080/13658810600924211</pubid></xrefbib></bibl><bibl id="B22"><aug><au><snm>Scott</snm><fnm>DW</fnm></au></aug><source>Multivariate density estimation: theory, practice, and visualization</source><publisher>New York: Wiley</publisher><pubdate>1992</pubdate></bibl><bibl id="B23"><title><p>Data Processing by Noaa's National Geophusical Data Center, Dmsp Data Collected by the US air Force Weather Agency</p></title><aug><au><cnm>World Stable Lites</cnm></au></aug><url>http://www.ngdc.noaa.gov/dmsp/download_Night_time_lights_94-95.html</url></bibl></refgrp>
</bm></art>