Background: Reliable risk prediction tools for estimating individual probability of lung cancer have important public health implications. We constructed and validated a comprehensive clinical tool for lung cancer risk prediction by smoking status.
Methods: Epidemiologic data from 1851 lung cancer patients and 2001 matched control subjects were randomly divided into separate training (75% of the data) and validation (25% of the data) sets for never, former, and current smokers, and multivariable models were constructed from the training sets. The discriminatory ability of the models was assessed in the validation sets using the areas under the receiver operating characteristic curves and concordance statistics. Absolute 1-year risks of lung cancer were computed using national incidence and mortality data. An ordinal risk index was constructed for each smoking status category by summing the odds ratios from the multivariable regression analyses for each risk factor.
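As an illustration of the two quantities named in the Methods, the following minimal Python sketch computes a summed-odds-ratio risk index and a pairwise concordance statistic. It is not the study's implementation: the odds ratios in `ODDS_RATIOS` are hypothetical placeholders, the function names are invented for this example, and the sketch assumes that only risk factors a person actually has contribute to the index.

```python
from itertools import product

# Hypothetical odds ratios for illustration only; NOT the study's estimates.
ODDS_RATIOS = {
    "family_history": 1.5,
    "dust_exposure": 1.5,
    "prior_respiratory_disease": 2.0,
}


def risk_index(factors_present):
    """Sum the odds ratios of the risk factors a person has.

    Assumption: absent factors contribute nothing to the index.
    """
    return sum(ODDS_RATIOS[factor] for factor in factors_present)


def concordance(case_scores, control_scores):
    """Fraction of case-control pairs in which the case has the higher score.

    Tied pairs count as one half, which makes this equal to the area
    under the receiver operating characteristic curve.
    """
    pairs = list(product(case_scores, control_scores))
    wins = sum(
        1.0 if case > control else 0.5 if case == control else 0.0
        for case, control in pairs
    )
    return wins / len(pairs)


# Toy usage with made-up subjects (two cases, two controls).
cases = [
    risk_index(["family_history", "prior_respiratory_disease"]),
    risk_index(["dust_exposure"]),
]
controls = [risk_index(["family_history"]), risk_index([])]
c_statistic = concordance(cases, controls)
```

A concordance statistic of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from controls, which is the scale on which the validation results are reported.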
Results: All variables that had a statistically significant association with lung cancer (environmental tobacco smoke, family history of cancer, dust exposure, prior respiratory disease, and smoking history variables) have strong biologically plausible etiologic roles in the disease. The concordance statistics in the validation sets for the never, former, and current smoker models were 0.57, 0.63, and 0.58, respectively. The computed 1-year absolute risk of lung cancer for a hypothetical male current smoker with an estimated relative risk close to 9 was 8.68%. The ordinal risk index performed well in that true-positive rates in the designated high-risk categories were 69% and 70% for current and former smokers, respectively.
Conclusions: If confirmed in other studies, this risk assessment procedure could use easily obtained clinical information to identify individuals who may benefit from increased screening surveillance for lung cancer. Although the concordance statistics were modest, they are consistent with those from other risk prediction models.