Machine Learning Approach for Predicting Past Environmental Exposures From Molecular Profiling of Post-Exposure Human Serum Samples

J Occup Environ Med. 2019 Dec;61 Suppl 12(Suppl 12):S55-S64. doi: 10.1097/JOM.0000000000001692.

Abstract

Objective: To develop an approach for a retrospective analysis of post-exposure serum samples using diverse molecular profiles.

Methods: The 236 molecular profiles from 800 de-identified human serum samples from the Department of Defense Serum Repository were labeled as belonging to smokers or non-smokers based on direct measurement of serum cotinine levels. A machine-learning pipeline was then used to classify smokers and non-smokers from their molecular profiles.
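
The cotinine-based labeling step described in the Methods can be illustrated with a minimal sketch. The abstract does not state the cotinine cutoff that was used, so the function name, the units, and the cutoff value below are hypothetical placeholders chosen only for illustration, not the study's actual criteria.

```python
from typing import List


def label_smoking_status(cotinine_levels: List[float], cutoff: float) -> List[str]:
    """Label each sample as 'smoker' or 'non-smoker' from its serum cotinine level.

    The cutoff is supplied by the caller; the abstract does not report one.
    """
    return ["smoker" if level >= cutoff else "non-smoker" for level in cotinine_levels]


# Hypothetical cutoff and made-up cotinine values (assumed ng/mL), for illustration only.
HYPOTHETICAL_CUTOFF = 10.0
example_levels = [0.2, 3.5, 10.0, 150.0]
example_labels = label_smoking_status(example_levels, HYPOTHETICAL_CUTOFF)
print(example_labels)
```

Labels produced this way would serve as the ground truth against which a classifier trained on the molecular profiles is evaluated.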

Results: The refined supervised support vector machines with recursive feature elimination predicted smokers and non-smokers with 78% accuracy on the independent held-out set. Several of the identified classifiers of smoking status have been reported previously, and four additional miRNAs were validated with experimental tobacco smoke exposure in mice, supporting the computational approach.
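
As a rough illustration of the general technique named in the Results (a support vector machine combined with recursive feature elimination, scored on a held-out set), the following sketch uses scikit-learn on synthetic data. It is not the authors' pipeline: the data set size, the number of features kept, the linear kernel, and the 80/20 split are all assumptions made for the example, and the accuracy it prints has no relation to the reported 78%.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for molecular profiles (arbitrary size, not the study data).
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=8, n_redundant=4, random_state=0
)

# Hold out an independent test set that plays no part in feature selection or fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

N_FEATURES_TO_KEEP = 10  # assumed value; the abstract does not report one

pipeline = make_pipeline(
    StandardScaler(),
    # RFE repeatedly fits a linear SVM and drops the lowest-weight features
    # (10% of the original feature count per round) until the target remains.
    RFE(SVC(kernel="linear", C=1.0), n_features_to_select=N_FEATURES_TO_KEEP, step=0.1),
    # Final classifier trained on the selected features only.
    SVC(kernel="linear", C=1.0),
)
pipeline.fit(X_train, y_train)

held_out_accuracy = pipeline.score(X_test, y_test)
selected_mask = pipeline.named_steps["rfe"].support_
print(f"Held-out accuracy: {held_out_accuracy:.2f}")
print(f"Features kept: {int(selected_mask.sum())} of {X.shape[1]}")
```

Placing scaling and feature elimination inside the pipeline means both are fitted on the training split only, so the held-out samples do not influence which features are selected.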

Conclusions: We developed and validated a pipeline showing that retrospective analysis of post-exposure serum samples can identify environmental exposures.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Age Factors
  • Animals
  • Biomarkers / blood
  • Cotinine / blood*
  • Disease Models, Animal
  • Environmental Exposure / statistics & numerical data*
  • Female
  • Humans
  • Machine Learning*
  • Male
  • Mice, Inbred C57BL
  • Sex Factors
  • Smoking / adverse effects
  • Smoking / epidemiology
  • Support Vector Machine
  • Tobacco Smoke Pollution / adverse effects
  • Tobacco Smoke Pollution / statistics & numerical data
  • Young Adult

Substances

  • Biomarkers
  • Tobacco Smoke Pollution
  • Cotinine