EpiSmokEr: a robust classifier to determine smoking status from DNA methylation data

Epigenomics. 2019 Oct;11(13):1469-1486. doi: 10.2217/epi-2019-0206. Epub 2019 Aug 30.

Abstract

Aim: Smoking strongly influences DNA methylation, with current and never smokers exhibiting different methylation profiles.

Methods: To advance the practical applicability of these smoking-associated methylation signals, we used machine learning to train a classifier for smoking status prediction.

Results: We show the prediction performance of our classifier on three independent whole-blood datasets, demonstrating its robustness and global applicability. Furthermore, we examine the reasons for biologically meaningful misclassifications through comprehensive phenotypic evaluation.

Conclusion: The major contribution of our classifier is its global applicability: users do not need to determine a dataset-specific threshold value to predict smoking status. We provide an R package, EpiSmokEr (Epigenetic Smoking status Estimator), which facilitates the use of our classifier to predict smoking status in future studies.

Keywords: DNA methylation; epigenetic smoking status; multinomial LASSO; smoking status classifier; tobacco smoking.
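
The keywords name multinomial LASSO, and the abstract stresses that no dataset-specific threshold is needed. A minimal sketch of why a fitted multinomial model can work that way is given below: each class gets a linear score from the methylation values, the scores are turned into probabilities with a softmax, and the most probable class is reported. This is an illustration only, not the EpiSmokEr implementation. The three class labels, the CpG names (`cpg_a`, `cpg_b`), the intercepts and the weights are all hypothetical placeholders, not the published coefficients.

```python
import math

# ASSUMPTION: class labels, CpG names, intercepts and weights are hypothetical
# placeholders chosen for illustration; they are NOT the EpiSmokEr coefficients.
CLASSES = ("never", "former", "current")
INTERCEPTS = {"never": -2.0, "former": 0.0, "current": 2.0}
WEIGHTS = {
    "never": {"cpg_a": 4.0},     # higher methylation at cpg_a favours "never"
    "former": {"cpg_b": 1.0},
    "current": {"cpg_a": -4.0},  # lower methylation at cpg_a favours "current"
}


def class_probabilities(betas):
    """Return softmax probabilities per class for one sample.

    `betas` maps CpG name -> methylation beta value in [0, 1].
    A LASSO fit leaves most weights at exactly zero, so each class
    only needs the few CpGs listed in WEIGHTS.
    """
    scores = {
        c: INTERCEPTS[c] + sum(w * betas[cpg] for cpg, w in WEIGHTS[c].items())
        for c in CLASSES
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def predict_status(betas):
    """Pick the class with the highest probability; no cutoff is tuned."""
    probs = class_probabilities(betas)
    return max(probs, key=probs.get)
```

Because the prediction is the most probable class, the user never chooses a cutoff on a methylation score; the trade-off is that the result depends entirely on how well the fitted coefficients transfer to a new dataset.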

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Computational Biology / methods
  • CpG Islands
  • DNA Methylation*
  • Epigenesis, Genetic
  • Epigenomics / methods*
  • Female
  • Humans
  • Machine Learning
  • Male
  • Middle Aged
  • Software
  • Tobacco Smoking / genetics*