Background: Studies investigating the population-mixing hypothesis in childhood leukemia principally use two analytical approaches: (1) nonrandom selection of areas according to specific characteristics, followed by comparisons of their incidence of childhood leukemia with that expected based on the national average; and (2) regression analyses of region-wide data to identify characteristics associated with the incidence of childhood leukemia. These approaches have generated contradictory results. We compare these approaches using observed and simulated data.
Methods: We generated 10,000 simulated regions using the correlation structure and distributions from a United Kingdom dataset. We simulated cases using a Poisson distribution with the incidence rate set to the national average, assuming the null hypothesis that only population size drives the number of cases. Selection of areas within each simulated region was based on characteristics considered responsible for elevated infection rates (population density and inward migration) and/or elevated leukemia rates. We calculated effect estimates across the 10,000 simulations and compared the results with the corresponding analyses of the observed data.
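The contrast between the two strategies can be illustrated with a minimal sketch. This is not the authors' simulation: the area sizes (500 to 5,000 children, drawn uniformly), the number of areas per region, the number of regions, and the rule of keeping the top 10% of areas by observed rate are all hypothetical choices made for illustration, and the United Kingdom correlation structure, population density, and inward migration are not modeled. Only the null rate of two cases per 10,000 children over 5 years and the use of Poisson counts driven by population size come from the abstract.

```python
import math
import random
import statistics

NULL_RATE = 2 / 10_000  # 5-year incidence per child under the null (from the abstract)


def poisson(rng, lam):
    """Draw one Poisson variate (Knuth's multiplication method; suited to small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_region(rng, n_areas=200):
    """Return (children, cases) per area; cases depend on population size only."""
    areas = []
    for _ in range(n_areas):
        children = rng.randint(500, 5000)  # hypothetical area sizes
        areas.append((children, poisson(rng, NULL_RATE * children)))
    return areas


def rate_per_10k(areas):
    """Pooled incidence per 10,000 children across the given areas."""
    children = sum(c for c, _ in areas)
    cases = sum(k for _, k in areas)
    return 10_000 * cases / children


def run(n_regions=1000, top_fraction=0.1, seed=0):
    """Compare region-wide analysis with selection on apparent clusters."""
    rng = random.Random(seed)
    region_wide, cluster_selected = [], []
    for _ in range(n_regions):
        areas = simulate_region(rng)
        region_wide.append(rate_per_10k(areas))
        # Nonrandom selection (hypothetical rule): keep the areas with the
        # highest observed rates, i.e. the apparent clusters.
        ranked = sorted(areas, key=lambda a: a[1] / a[0], reverse=True)
        n_keep = int(len(areas) * top_fraction)
        cluster_selected.append(rate_per_10k(ranked[:n_keep]))
    return statistics.fmean(region_wide), statistics.fmean(cluster_selected)
```

Calling `run()` returns the mean region-wide rate and the mean rate among the cluster-selected areas, both per 10,000 children. Under these assumptions the first stays close to the null rate of two, while the second is inflated by the selection step alone, because every area was generated with the same underlying rate.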
Results: Selecting areas for analysis on the basis of apparent clusters of childhood leukemia produces biased assessments: the estimated 5-year incidence of childhood leukemia ranged from zero to eight per 10,000 children, in contrast to the simulated two cases per 10,000 children, similar to the observed data. Performing analyses on region-wide data avoids these biases.
Conclusions: Studies using nonrandom selection to investigate the association between childhood leukemia and population mixing are likely to have generated biased findings. Future studies can avoid such bias by using a region-wide analytical strategy. See video abstract at http://links.lww.com/EDE/B431.