Geographic boundaries in breast, lung and colorectal cancers in relation to exposure to air toxics in Long Island, New York

Jacquez, Geoffrey M; Greiling, Dunrie A

doi:10.1186/1476-072X-2-4

Research
Open access
Published: 17 February 2003


International Journal of Health Geographics volume 2, Article number: 4 (2003)


Abstract

Background

This two-part study employs several statistical techniques to evaluate the geographic distribution of breast cancer in females and colorectal and lung cancers in males and females in Nassau, Queens, and Suffolk counties, New York, USA. In this second paper, we compare patterns in standardized morbidity ratios (SMR values), calculated from New York State Department of Health (NYSDOH) data, to geographic patterns in overall predicted risk (OPR) from air toxics using exposures estimated in the USEPA National Air Toxics Assessment database.

Results

We identified significant geographic boundaries in SMR and OPR. We found little or no association between the SMR of colorectal and breast cancers and the OPR for each cancer from exposure to the air toxics. We did find boundaries in male and female lung cancer SMR and boundaries in lung cancer OPR to be closer to one another than expected.

Conclusion

While consistent with a causal relationship between air toxics and lung cancer incidence, the boundary analysis does not demonstrate the existence of a causal relationship. However, now that the areas of overlap between boundaries in lung cancer incidence and potential airborne exposures have been identified, we can begin to evaluate local- as well as large-scale determinants of lung cancer.

Background

This study is the second in a two-paper series on cancer patterns on Long Island. In the first paper [1], we evaluated the spatial pattern of incidence of diagnoses of colorectal, breast, and lung cancers, identifying spatial clusters of high and low standardized morbidity ratio (SMR). In this paper, we compare cancer patterns to patterns in airborne carcinogens modeled in the National Air Toxics Assessment (NATA) database. While we acknowledge that environmental pollutant databases are imperfect and incomplete estimates of individual exposure, air toxics are one possible source of environmental exposure to carcinogens. If patterns in air toxics are significantly associated with cancer patterns, additional effort is warranted to determine whether or not there is a causative relationship. A more detailed understanding of spatial associations between patterns in health and environmental variables ultimately may lead to improved air quality and public health.

Health-environment relationships

Knowledge about possible relationships between human health and the environment is garnered in several ways. Laboratory studies explore how and whether toxic compounds cause disease at the organismic level. Epidemiological studies seek to identify whether risk factors, such as diet, socio-economic status and occupation, are associated with specific outcomes, such as breast cancer, in human populations. With the advent of more detailed environmental data from remote sensing, toxic release inventories, and monitoring networks, it is now possible to undertake studies that relate geographic patterns in health outcomes to geographic patterns in environmental factors. Such studies seek to identify clusters of disease, and to relate the locations of those clusters to geographic patterns in environmental factors that might have caused the disease. Like other epidemiological studies, pattern analysis cannot establish causality. It can determine whether and where there is a statistical excess (or deficit) of disease, and whether locations of elevated disease are geographically associated with areas where plausible risk factors also are high. Susser and Susser [2] call for an integration of several levels of research, from the molecular and the individual to the societal, because factors on each of these levels often interact to cause chronic disease.

Our purpose in undertaking this analysis is to illustrate how geographic pattern analysis can increase our understanding of breast, lung and colorectal cancers in Long Island, New York. Critical to this understanding is an appreciation of the assumptions, caveats, and limitations of the geographical approach and of this study in particular. While presented last, these considerations should be kept firmly in mind when making any kind of inference or decision from this study's results.

Geographic Pattern Analysis

Pattern recognition plays an important role in data summarization and description by identifying salient features and structure in the data. By asking a carefully crafted series of pattern analysis questions it is possible to evaluate specific hypotheses regarding the geographic patterns of disease in human populations. These hypotheses correspond to questions regarding value, change and association.

• Value questions have to do with the values of the variables surveyed, and how they are arranged in geographic space. Disease clustering is principally concerned with value questions such as "Is there an excess of disease?" and "Where are disease rates significantly high?"

• Change questions have to do with how values vary through geographic space and through time. Change questions include "Where do disease rates change rapidly?" and "Where do air toxics change rapidly in geographic space?"

• Association questions relate spatial pattern in one variable or set of variables to the pattern in another set of variables.

Example association questions include "Is spatial pattern in health outcomes associated with:

• The environment? (Environmental, occupational, and food-borne exposures),

• Population? (Demography, marriage, birth, ethnicity) and

• Individual? (Genetics, behavior, individual risk factors)?"

In this set of two studies we address three questions about breast, lung and colorectal cancers in Long Island.

1. Where are the statistically significant excesses and deficits of cancer? This value question is answered using disease clustering techniques in the first paper [1].

2. Where are the zones of rapid change (boundaries) in cancer incidence? This change question is answered using geographic boundary analysis.

3. Is geographic pattern in cancer incidence related to geographic patterns in carcinogen concentrations as modeled by the National Air Toxics Assessment program? This association question is evaluated using boundary overlap analysis.

Methods

Data

Cancer Incidence

The New York State Department of Health (NYSDOH) published the cancer incidence data online as part of their Cancer Surveillance Improvement Initiative, http://www.health.state.ny.us/nysdoh/cancer/csii/nyscsii.htm. These data represent newly diagnosed cancer cases in the period 1993–7 assigned to the patient's residence at diagnosis, and they are calculated as the number of cancers for each 100,000 people in the population. When we began this study (August 2001), the NYSDOH had released data on three cancers: breast (female only), colorectal (female and male), and lung (female and male) cancers.

To protect patient privacy, the NYSDOH data provided case counts referenced to ZIP codes rather than individual residences. While ZIP codes are somewhat arbitrary spatial units of analysis with respect to potential health and environmental factors, they provide a convenient way to group the population and preserve confidentiality. We combined this dataset with ZIP code boundary files reflecting the geography in November 1999, which we purchased from Claritas Corporation (http://www.claritas.com). While the NYSDOH provides information on the entire state, we focus on the 214 ZIP codes within Nassau, Queens and Suffolk Counties on Long Island.

People move between ZIP codes and cancer latency (the time between causative exposures and cancer onset) is long, so the ZIP code where the patient was diagnosed may not be the location where the cancer developed nor where causative exposures occurred. We do not include any adjustments for migration or changes in any demographic patterns within the study area.

While the observed cancer diagnosis data did adjust for different populations-at-risk in the different ZIP codes, we also used New York State's adjustment for different age patterns. Because cancer incidence is related to age, NYSDOH calculated the expected cancer incidence for each ZIP code using the ZIP code's age structure and the average incidence by age class for New York State. We calculated a standardized morbidity ratio (SMR) by dividing the observed value by the age-adjusted expected incidence. An SMR value of 1.0 indicates that the observed incidence is the same as expected, a value lower than 1.0 indicates that fewer cases of cancer occurred than expected, and a value greater than 1.0 indicates that more cases occurred than expected.
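The SMR calculation described above can be illustrated with a minimal Python sketch. The ZIP code labels and incidence values below are hypothetical placeholders, not values from the NYSDOH data, and the function name is ours.

```python
def smr(observed, expected):
    """Standardized morbidity ratio: observed incidence divided by the
    age-adjusted expected incidence for the same spatial unit."""
    if expected <= 0:
        raise ValueError("expected incidence must be positive")
    return observed / expected


# Hypothetical (observed, expected) incidence pairs, for illustration only.
zip_data = {
    "ZIP_A": (120.0, 100.0),  # more cases than expected
    "ZIP_B": (80.0, 100.0),   # fewer cases than expected
    "ZIP_C": (95.0, 95.0),    # exactly as expected
}

smr_by_zip = {z: smr(obs, exp) for z, (obs, exp) in zip_data.items()}

for z, value in sorted(smr_by_zip.items()):
    print(z, round(value, 2))
```

An SMR above 1.0 (ZIP_A) marks an excess relative to the age-adjusted expectation, and an SMR below 1.0 (ZIP_B) marks a deficit.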

National Air Toxics Assessment

The USEPA National Air Toxics Assessment (NATA, http://www.epa.gov/ttn/atw/nata/) combines information on point and nonpoint emissions of air toxics and weather information into an Assessment System for Population Exposure Nationwide (ASPEN). We obtained ASPEN model 1996 base-year data (Feb 2001 run). The exposure data are approximately concurrent with the cancer study period, thereby precluding any cause-and-effect interpretation, as cancers developed in 1993 could not have been caused by air toxics in 1996. Because of the latency in the development of cancer, it would not even be plausible to say that the 1996 data could explain the 1997 diagnoses alone. Yet, the 1996 data may be representative of the air toxics prior to 1996, and 1996 is the first year such a comprehensive geographic exposure model was available from the USEPA. As this is an opportunistic analysis, we took the data available. We thereby assume the 1996 data are reasonable representations of air pollution in the preceding decade during which causative exposures might have occurred. This assumption seems reasonable for air pollution sources that have been in operation since the 1980s, and whose dispersal is mediated by transport mechanisms (e.g. prevailing winds) that haven't changed a great deal in the last 10–20 years. The ASPEN model estimates the average annual concentration of a series of known air toxics for all census tracts in the nation. We used concentrations of only those air toxics thought to be potential carcinogens for the three study cancers (Table 7). This list by no means constitutes an exhaustive list of potential carcinogens on Long Island. For the purposes of this study, the compounds in this list were deemed the most plausible carcinogens, and exposure to these compounds was combined into a single risk measure, the overall predicted risk (OPR), defined below (Equation 2).

As exposure to each compound has a different risk, we standardized the exposure by multiplying the estimated average annual concentration of each compound by its Unit Risk Estimate (URE) as shown in Equation 1. The URE is the lifetime risk of excess cancer cases predicted to come from continuous exposure to a compound at a concentration of 1 μg/m³ in the air (for more information see the definition on the NATA website, http://www.epa.gov/ttn/atw/nata/gloss1.html). UREs may under- or over-estimate the actual risk of exposure to these compounds, as the predictions are extrapolations from tests in animals and/or the effects of low doses. All UREs are from the Draft USEPA NATA report [5], except that for diesel particulate matter. The USEPA has not yet defined a URE for diesel and so we used the midpoint of the URE range from the California EPA [6]. To calculate the cancer risk for each compound we used the following formula:

Exposure × URE = CancerRisk (Equation 1)

We obtained the annual estimated exposure from the NATA dataset, and used the URE values from Table 7 to obtain estimates of excess cancer cases due to that exposure. As the URE is a risk estimate for all cancer, rather than a cancer-specific figure, the OPR for each cancer is likely an overestimate of risk for an individual cancer.

The use of a national-scale assessment to predict cancer risk based on air toxics is subject to caveats that have been identified by the EPA:

"The UREs used in the national-scale assessment are subject to four major areas of variability and uncertainty. First, many of the pollutants were classified as probable carcinogens because data were not sufficient to prove causality in humans. It is possible that some of these pollutants do not cause cancer at environmentally relevant doses, and that true risk associated with these air toxics is zero. Second, all UREs in this study were based on linear extrapolation from high to low doses. It is possible that the true dose response relationships for some pollutants may be less than linear, resulting in an overestimate of risk. Third, most UREs in this study were developed from animal data using conservative methods to extrapolate between species. Human responses may differ from the predicted ones. The first three elements are comprised entirely of uncertainty. Fourth, most UREs in this study were based on statistical upper confidence limits, though some were based on statistical best fits. (While this does not affect overall uncertainty, UREs based on best fits should be unbiased, while those based on upper confidence limits should be biased high.) This fourth element represents a combination of variability (i.e., based on variation responses of different people or animals) and uncertainty (i.e., potential errors in the measurement of exposure and response). Because of the aggregate treatment all four sources of variability and uncertainty described above, EPA considers all its UREs to be upper-bound estimates."

pages 112–3, http://www.epa.gov/ttn/atw/sab/natareport.pdf

Regarding the use of UREs, one should note that the methods employed are sensitive to the relative, rather than absolute value, of the risk estimates. By focusing on boundaries, we are able to identify spatial structure and geographic associations so long as the relative values of the OPRs are correct. Hence the methods employed will yield the same results for a biased risk estimator – provided the bias on average is the same for all observed values. We also note the UREs in Table 7 are for all cancers combined, and not for site-specific cancers. Our analysis identified compounds thought to be carcinogens for each of the 5 site-specific cancers we considered, and then calculated a site-specific URE based on the values in Table 7. Ideally, one should use site-specific UREs, but these are not yet available from EPA.

We calculated an overall predicted risk from air toxics (OPR) for each cancer by summing up the excess cancer cases for each of the relevant compounds as shown in Equation 2.

Σ CancerRisk = OPR (Equation 2)

Summing up the excess cancer cases for all of the relevant compounds assumes an additive relationship – that particular compounds do not interact in a synergistic or threshold-related manner to influence dose-response relationships. The EPA is currently using an additive model for assessing dose and response to multiple compounds, but further research needs to be done to confirm the additive model or else replace it with a more appropriate model. Again, pattern recognition approaches are useful under this kind of uncertainty since they are relatively robust provided the rank order of the estimates is "about right."
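Equations 1 and 2 can be sketched in Python as follows. The compound names, concentrations, and URE values are hypothetical placeholders (they are not taken from Table 7 or the NATA dataset), and the function names are ours. The last step illustrates the point made above: a bias that scales all UREs by the same factor leaves the rank order of OPR across census tracts unchanged.

```python
def cancer_risk(concentration_ug_m3, ure):
    """Equation 1: Exposure x URE = CancerRisk for one compound."""
    return concentration_ug_m3 * ure


def overall_predicted_risk(concentrations, ures):
    """Equation 2: sum CancerRisk over the relevant compounds.

    Assumes the additive model described in the text; `ures` lists the
    compounds deemed relevant for the cancer under study.
    """
    return sum(cancer_risk(concentrations[name], ure)
               for name, ure in ures.items())


def rank_order(opr_by_tract):
    """Tract labels sorted from lowest to highest OPR."""
    return sorted(opr_by_tract, key=opr_by_tract.get)


# Hypothetical UREs (risk per ug/m3) and tract concentrations (ug/m3).
ures = {"compound_a": 2e-6, "compound_b": 5e-5}
tracts = {
    "tract_1": {"compound_a": 1.0, "compound_b": 0.1},
    "tract_2": {"compound_a": 0.5, "compound_b": 0.4},
    "tract_3": {"compound_a": 2.0, "compound_b": 0.0},
}

opr = {t: overall_predicted_risk(c, ures) for t, c in tracts.items()}

# A uniform multiplicative bias in the UREs rescales every OPR equally,
# so the ordering of tracts is preserved.
biased_ures = {name: 3.0 * ure for name, ure in ures.items()}
biased_opr = {t: overall_predicted_risk(c, biased_ures)
              for t, c in tracts.items()}

print(rank_order(opr) == rank_order(biased_opr))
```

A bias that differs between compounds could change the ordering, which is why the text requires the bias to be the same, on average, for all observed values.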

Local Boundary and Subboundary Analysis

Borders where SMRs change a great deal may indicate areas where causative exposures change through geographic space, where SMRs are unstable, and/or where local populations differing in cancer incidence abut. The identification of such borders may provide insight into the causes, correlates and uncertainties in cancer incidence. To detect local boundaries we used the Womble [5] approach. Wombling identifies those locations with the highest local rates of change (measured by squared Euclidean distance between SMR values in adjacent ZIP codes). We used a gradient value threshold of 20%, so the top 20% of all local rates of change in the dataset were called boundaries. Wombling has been applied to raster data [6–8] and point data [9, 10]. It was extended to polygon data by Susan Maruca and Geoffrey Jacquez in the BoundarySeer software http://www.terraseer.com/boundaryseer.html. To our knowledge, this publication is the first application of this new wombling approach.
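The thresholding step described above can be sketched in Python under simplifying assumptions. This is not the BoundarySeer implementation: the adjacency list, the unit labels, and the SMR values are hypothetical, and ties at the threshold are broken arbitrarily by sort order. For a single variable, the squared Euclidean distance between adjacent units reduces to the squared difference of their values.

```python
def womble_boundaries(values, adjacency, top_fraction=0.20):
    """Flag the edges between adjacent units with the largest local change.

    values:       dict mapping unit label -> value (e.g. SMR)
    adjacency:    iterable of (unit, unit) pairs that share an edge
    top_fraction: share of edges to call boundaries (20% in the text)
    Returns (set of boundary edges, dict of edge -> rate of change).
    """
    rates = {}
    for a, b in adjacency:
        edge = tuple(sorted((a, b)))          # undirected edge key
        rates[edge] = (values[a] - values[b]) ** 2
    ranked = sorted(rates, key=rates.get, reverse=True)
    n_keep = max(1, round(top_fraction * len(ranked)))
    return set(ranked[:n_keep]), rates


# Hypothetical SMR values and adjacency, for illustration only.
smr_values = {"A": 1.0, "B": 1.1, "C": 1.9, "D": 1.0, "E": 0.95}
neighbours = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("A", "E")]

boundaries, rates = womble_boundaries(smr_values, neighbours)
print(boundaries)
```

With five edges and a 20% threshold, only the single edge with the largest squared difference (between C and D) is retained as a boundary.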

Because choosing a boundary threshold value is subjective, we evaluated the boundaries detected statistically through subboundary analysis [11]. For each defined set of boundaries we calculated the number of singletons Ns, mean boundary length Lmean, mean maximum boundary length Lmax, mean boundary diameter, and mean maximum boundary diameter. We will report Ns, Lmean, and Lmax. We then evaluated the probability of the observed value of each subboundary statistic against the null hypothesis of no spatial structure in the underlying variable (either SMR or OPR) through Monte Carlo randomizations. In these randomizations, the observed SMR values were randomized across the ZIP codes of Long Island. With equations 3 & 4, these statistics can be evaluated as excessively high (significant upper tail p-value or P↑, when the observed value is significantly higher than those in the reference distribution) or as excessively low (significant lower tail p-value or P↓, when the observed value is significantly lower than those in the reference distribution). Thus the statistics can be interpreted to identify statistical evidence of boundary cohesiveness (longer boundaries than expected by chance: high Lmean and Lmax with significant P↑, and low Ns with significant P↓) or fragmentation (shorter boundaries than expected by chance: low Lmean and Lmax with significant P↓, and high Ns with significant P↑).
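A simplified Python sketch of the three reported subboundary statistics follows. It rests on assumptions that are ours rather than the source's: a boundary element is an edge between two adjacent units, two elements belong to the same subboundary when they share a unit, and length is counted in boundary elements instead of map distance. BoundarySeer's geometric definitions may differ.

```python
from collections import defaultdict


def subboundary_stats(boundary_edges):
    """Return Ns, Lmean and Lmax for a set of boundary elements.

    Subboundaries are the connected components of the boundary elements,
    where two elements are connected if they share a unit (an assumption).
    """
    edges = [tuple(e) for e in boundary_edges]
    if not edges:
        raise ValueError("no boundary elements supplied")

    parent = list(range(len(edges)))          # union-find over elements

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]     # path compression
            i = parent[i]
        return i

    by_unit = defaultdict(list)               # unit -> elements touching it
    for idx, (a, b) in enumerate(edges):
        by_unit[a].append(idx)
        by_unit[b].append(idx)
    for idxs in by_unit.values():
        for other in idxs[1:]:
            root_a, root_b = find(idxs[0]), find(other)
            if root_a != root_b:
                parent[root_b] = root_a

    sizes = defaultdict(int)
    for i in range(len(edges)):
        sizes[find(i)] += 1
    lengths = list(sizes.values())

    return {
        "Ns": sum(1 for n in lengths if n == 1),   # singleton subboundaries
        "Lmean": sum(lengths) / len(lengths),      # mean length
        "Lmax": max(lengths),                      # longest subboundary
    }


# Hypothetical boundary elements: chains of length 2, 1 and 3.
example = [("A", "B"), ("B", "C"), ("D", "E"),
           ("F", "G"), ("G", "H"), ("H", "I")]
stats = subboundary_stats(example)
print(stats)
```

Under the null hypothesis, the same function would be applied to the boundaries detected on each randomized dataset to build the reference distribution for Ns, Lmean, and Lmax.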

Boundary Overlap Analysis

To assess the association between two sets of boundaries (e.g. cancer incidence (SMR) boundaries and cancer risk (OPR) boundaries) we used boundary overlap statistics [12]. We evaluated four statistics of boundary overlap: Os, Og, Oh, and Ogh. Os is the count of the number of boundary locations that are included in both sets of boundaries. The other three are based on the average minimum distance from boundaries in one variable (e.g. SMR) to the nearest boundary in the other variable (e.g. OPR). Og is the mean distance from the boundaries of one variable (g) to the nearest boundary location in another boundary set (h). Oh is the mean distance from h to the nearest point in g. Ogh is the mean distance from locations in either boundary to the nearest location in the other.

We obtained a p-value through equations 3 & 4 for the observed overlap by comparing the observed values of all four statistics to those generated by Monte Carlo randomizations. BoundarySeer randomized the variables considered (SMRs and/or OPR), recalculated the boundaries, and then recalculated the overlap statistics. The null hypothesis for this randomization approach is that boundaries in cancer are independent of (not associated with) boundaries in cancer risk. Like the subboundary statistics, these overlap statistics can be evaluated as significantly closer (high Os, low Og, Oh, or Ogh) or significantly farther than expected by chance (low Os, high Og, Oh, or Ogh).

Because the SMR and OPR data were assigned to different geographic units (ZIP codes and census tracts, respectively) we would almost never expect overlap of SMR boundaries and OPR boundaries to result in significantly high Os, as Os depends on the exact location of the boundaries. The other statistics are minimum distances, and so are more reasonable measures of coincidence of two sets of boundaries detected on different geographic units (e.g. census tracts vs. ZIP codes).
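The four overlap statistics can be sketched in Python, assuming each boundary set is represented by a list of (x, y) point locations and that distance is planar Euclidean distance. The coordinates are hypothetical, and this is an illustration of the definitions given above, not the BoundarySeer code.

```python
import math


def _mean_min_distance(source, target):
    """Mean distance from each source location to its nearest target."""
    return sum(min(math.dist(p, q) for q in target)
               for p in source) / len(source)


def overlap_stats(g, h):
    """Compute Os, Og, Oh and Ogh for two sets of boundary locations."""
    g, h = list(g), list(h)
    if not g or not h:
        raise ValueError("both boundary sets must be non-empty")
    og = _mean_min_distance(g, h)
    oh = _mean_min_distance(h, g)
    return {
        "Os": len(set(g) & set(h)),   # locations present in both sets
        "Og": og,                     # g -> nearest h
        "Oh": oh,                     # h -> nearest g
        # pooled mean over the locations of both sets
        "Ogh": (og * len(g) + oh * len(h)) / (len(g) + len(h)),
    }


# Hypothetical boundary locations, for illustration only.
g_locations = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
h_locations = [(0.0, 0.0), (2.0, 3.0)]

overlap = overlap_stats(g_locations, h_locations)
print(overlap)
```

Because Os counts exact coincidences only, it drops to zero as soon as the two boundary sets are digitized on different geographic units, whereas the three distance-based statistics remain informative.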

Calculating p-values

Upper and lower p-values provide a sense of how extreme the observed values of the subboundary and overlap statistics are compared to the reference distribution of values obtained by randomization. The formulae for calculating these p-values are:

P↑ = (NGE + 1) / (Nruns + 1) (Equation 3)

P↓ = (NLE + 1) / (Nruns + 1) (Equation 4)

where Nruns is the total number of Monte Carlo simulations, NGE is the number of simulations greater than or equal to the observed value of the statistic, and NLE is the number of simulations less than or equal to the observed value of the statistic.
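A minimal Python sketch of the upper- and lower-tail p-values, using the quantities Nruns, NGE, and NLE defined above. It assumes the common Monte Carlo convention of adding one to both numerator and denominator, so that the observed value counts as one member of the reference distribution. The randomization helper, its default of 249 runs, and the seed are illustrative choices of ours; the source does not state the number of runs here.

```python
import random


def monte_carlo_p_values(observed, simulated):
    """Return (P_upper, P_lower) for an observed statistic.

    P_upper = (NGE + 1) / (Nruns + 1)
    P_lower = (NLE + 1) / (Nruns + 1)
    """
    n_runs = len(simulated)
    n_ge = sum(1 for s in simulated if s >= observed)
    n_le = sum(1 for s in simulated if s <= observed)
    return (n_ge + 1) / (n_runs + 1), (n_le + 1) / (n_runs + 1)


def randomization_test(values, statistic, n_runs=249, seed=0):
    """Shuffle values across units, recompute the statistic each time,
    and compare the observed statistic to the shuffled reference set."""
    rng = random.Random(seed)
    observed = statistic(values)
    shuffled = list(values)
    simulated = []
    for _ in range(n_runs):
        rng.shuffle(shuffled)
        simulated.append(statistic(shuffled))
    return monte_carlo_p_values(observed, simulated)


# An observed value above all 249 simulated values.
p_upper, p_lower = monte_carlo_p_values(1000, list(range(249)))
print(p_upper, p_lower)
```

With 249 runs the smallest attainable p-value is 1/250, so the resolution of the test is set by the number of randomizations.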

Results

Colorectal Cancer

Females

Boundaries in female colorectal cancer are shown in Figure 1. Our analysis identified those ZIP code edges where cancer incidence changes the most. In general, the boundaries in female colorectal cancer circumscribe or partially surround only one ZIP code. This pattern is consistent with the smaller-scale clustering found for females relative to males under the local Moran test [1]. For example, there were 3 clusters of female colorectal cancer, each comprising 3 to 4 ZIP codes [1, Table 1], vs. 5 clusters of male colorectal cancer, each comprising 3 to 7 ZIP codes [1, Table 2]. These smaller clusters indicate that geographic clustering of female colorectal cancer occurs at a smaller spatial scale than for males. This finding is further substantiated by subboundary analysis (Table 1), which found the boundaries in female colorectal cancer to be significantly fragmented. There are significantly more singleton boundaries (Ns larger than expected, P↑ = 0.032), and the boundaries are shorter (significantly low Lmean, P↓ = 0.032), than is expected under the null hypothesis of no spatial structure.

Table 1 Colorectal cancer, subboundary statistics.

Table 2 Colorectal cancer, overlap statistics.

Table 3 Breast cancer, subboundary statistics.

Males

Boundaries in male colorectal cancer are shown in Figure 2, and identify margins of ZIP codes that differ substantially in cancer incidence from their neighbors. These indicate not only the zones of high variation in cancer incidence that are expected at the margins of the significant clusters occurring under the local Moran analysis [1], but also highly local boundaries indicative of spatial variation in incidence at small spatial scales. Several boundaries appear to be long, and connect several ZIP codes; others are quite short and comprise only part of the margin of a ZIP code. Under subboundary analysis (Table 1), the boundaries, as a whole, were found to be neither significantly long nor significantly fragmented. We believe this result is consistent with an overall pattern of boundary fragmentation in western Long Island, and cohesive boundaries around the large-scale clusters occurring on mid- to eastern Long Island. This finding suggests that the determinants of colorectal cancer operate on increasingly larger spatial scales as one moves from west to east.

Males and females

Colorectal cancer has both dietary and genetic determinants, in addition to other risk factors such as age and smoking. Because diet is strongly influenced by the family environment, one might expect the incidence of male and female colorectal cancer to covary. To explore this expectation we generated bivariate plots of male vs. female cancer incidence (Figure 3), and also conducted a boundary overlap analysis. The scatter plot suggests little, if any, association between male and female colorectal cancer incidence.

Now consider the map of male and female colorectal boundaries (Figure 4). In several areas the female colorectal boundaries (yellow) overlap the male colorectal boundaries (orange) exactly, displayed as a yellow line with orange margins. These boundaries have significant exact overlap (Os P↑ = 0.012, Table 2). Further, the average minimum distance between the male and female colorectal boundaries was significantly smaller than expected (Ogh P↓ = 0.016). While the locations of male boundaries in colorectal cancer tended, on average, to occur near boundaries in female colorectal cancer (Og P↓ = 0.024), boundaries in female colorectal cancer were not necessarily near boundaries in male colorectal cancer (Oh P↓ = 0.152). These results indicate that overlap statistics can detect common spatial variation patterns even when the observed correlation between two variables is weak. Despite the substantial 'noise' in the plot of male vs. female colorectal cancer incidence (Figure 3), overlap analysis revealed the geographic association expected given common exposures among males and females or attributable to similar diet and genetics within family units.

Colorectal Cancer – Analysis of NATA Data

The overall predicted risk for colorectal cancers was calculated from the NATA dataset and mapped (Figure 5). There is an outlier of high risk in census tract 11200, in the vicinity of Jamaica, Ozone Park, ZIP code 11416. Whether this is attributable to small population size, reporting differences, or other causes was not further explored in this study.

Subboundary analysis

Figure 6 maps the boundaries in colorectal cancer risk (OPR), overlaid with the boundaries in the incidence of male and female colorectal cancer. The boundaries in OPR are significant and cohesive, being longer and having fewer singletons than expected under the null hypothesis. The number of subboundaries was significantly smaller than expected (Ns P↓ = 0.004, Table 1). The boundaries were significantly long (Lmean P↑ = 0.004; Lmax P↑ = 0.004). In total, these results indicate boundaries that are significantly longer and more cohesive than is expected by chance, and suggest that spatial variation in OPR for colorectal cancer occurs on relatively large spatial scales. This outcome is consistent with the model used by EPA that incorporates both point- and area-sources into the air quality model.

Overlap analysis

Boundary overlap analysis determined whether zones of rapid change in OPR are significantly associated with boundaries in colorectal cancer SMR. If the air toxics modeled in the NATA database are indeed strongly associated with colorectal cancer risk, then we would expect boundaries in OPR to be significantly close to boundaries in colorectal cancer SMR. Accordingly, an overlap analysis was undertaken for both male and female colorectal cancers. For female colorectal cancer SMR we found overlap avoidance: boundaries in female colorectal cancer incidence are farther from boundaries in colorectal OPR than is expected by chance (Table 2). Taking Long Island as a whole, the average minimum distance from a boundary in female colorectal cancer SMR to the nearest boundary in colorectal OPR is significantly larger (Og P↑ = 0.004) than its expected value. The same result obtains for males, where the average minimum distance from a boundary in male colorectal cancer SMR to the nearest boundary in colorectal OPR is significantly larger (Og P↑ = 0.004) than its expected value.

When considering these statistical results with the maps in Figures 1 and 2, the source of overlap avoidance is apparent. The majority of boundaries in OPR are found in Western Long Island, in urban areas, suggesting that a greater number of point source emissions are causing greater spatial variation in OPR in more urban areas. In fact, the eastern-most boundary in colorectal OPR occurs in the vicinity of Greenlawn (ZIP 11740), while boundaries in both male and female colorectal cancer SMR are found as far east as Wainscott and Fishers Island. Thus while overlap avoidance occurs on the scale of Long Island as a whole, further investigation is needed to evaluate whether overlap occurs specifically in urban areas where there is a great deal of local scale variation in colorectal OPR.

Breast Cancer

We conducted a local boundary analysis (Figure 7) to identify the edges of ZIP codes where female breast cancer incidence changes rapidly. As a group, the breast cancer SMR subboundaries are not statistically remarkable. The number of subboundaries is near its expectation (Ns P↓ = 0.800). These subboundaries have lengths near what would be expected by chance (Lmean P↑ = 0.800).

We found spatial pattern in breast cancer SMR at two distinct spatial scales. The first scale is at the level of the individual ZIP code, resulting in the spatial outliers described earlier [1]. The second scale occurs across adjacent ZIP codes, resulting in clusters of three or more ZIP codes found under the local Moran test [1]. The spatial scale of the pattern gives us some insights into the likely spatial scale of the generating process. For example, it seems unlikely that spatial outliers in SMR would result from a carcinogen, such as an airborne toxic, dispersed over a large geographic area. At the same time, we wouldn't expect a cluster of several ZIP codes to result from a highly localized exposure. Thus, if we are to use spatial pattern in breast cancer SMR as a clue to underlying causative exposures, we will need to consider exposure mechanisms that operate at both small (sub ZIP code) and local (bridging 3–7 ZIP codes) spatial scales.

Breast Cancer – Analysis of NATA Data

The overall predicted risk (OPR) for breast cancer was calculated from the NATA data set and mapped (Figure 8). We see a broad area of moderate to low overall predicted risk extending from west central to far eastern Long Island. Areas of moderate to high overall predicted risk are found in the western urban areas, especially in the vicinity of East Elmhurst, Maspeth, Long Island City and Little Neck in Flushing.

Subboundary Analysis

The boundaries in breast cancer OPR are significant and cohesive, being longer and having fewer singleton boundaries than expected by chance (Table 3). The number of subboundaries is significantly smaller than expected (Ns P↓ = 0.004). Both the mean and maximum boundary length were longer, on average, than expected by chance (Lmean P↑ = 0.004; Lmax P↑ = 0.004). These results indicate boundaries that are significantly longer and more cohesive than is expected by chance.

Overlap Analysis

Comparing Figure 9 to Figure 3 in [1], we located several areas of high OPR for breast cancer near the cluster of low breast cancer incidence identified in the local Moran analysis [1]. We also note the cluster of high breast cancer SMR found on southeastern Long Island is in an area of low OPR [1]. Based on map inspection, overall there appears to be a negative relationship between OPR and breast cancer incidence so that clusters of high breast cancer incidence occur where OPR is low, and clusters of low incidence occur where OPR is high. This result based on visual inspection is supported by the statistical analysis of boundary overlap (below).

Boundary overlap analysis was used to determine whether boundaries in breast cancer OPR were closer to boundaries in breast cancer SMR than one would expect were these variables independent. If breast cancer SMR indeed is increased in those locations where the airborne toxics underlying the OPR for breast cancer are elevated, then we should expect OPR and breast cancer SMR to have similar spatial patterns, and their boundaries should be significantly close to one another. We undertook a boundary overlap analysis to evaluate this hypothesis. Taking Long Island as a whole, the boundaries in breast cancer SMR are farther away from boundaries in OPR than one would expect by chance (Figure 9, Table 4). The observed average minimum distance from a boundary in breast cancer SMR to a boundary in OPR was significantly higher than expected (Og P↑ = 0.004). Thus boundaries in breast cancer avoided boundaries in OPR such that one tends to find boundaries in breast cancer in locations where there aren't boundaries in OPR. This finding is consistent with the observed locations for the larger, multiple ZIP code breast cancer SMR clusters found under the local Moran test [1], and the singleton clusters found by the local boundary analysis. There thus appears to be boundary avoidance, so that zones of rapid change in breast cancer incidence aren't found near zones of rapid change in OPR; and an inverse relationship between OPR and SMR, so that high values of breast cancer incidence tend not to be found where OPR is high.

Table 4 Breast cancer, overlap statistics.
