Computational chromatography: A machine learning strategy for demixing individual chemical components in complex mixtures

Proc Natl Acad Sci U S A. 2022 Dec 27;119(52):e2211406119. doi: 10.1073/pnas.2211406119. Epub 2022 Dec 19.

Abstract

Surface-enhanced Raman spectroscopy (SERS) holds exceptional promise as a streamlined chemical detection strategy for biological and environmental contaminants compared with current laboratory methods. Priority pollutants such as polycyclic aromatic hydrocarbons (PAHs), detectable in water and soil worldwide and known to induce multiple adverse health effects upon human exposure, are typically found in multicomponent mixtures. By combining the molecular fingerprinting capabilities of SERS with the signal separation and detection capabilities of machine learning (ML), we examine whether individual PAHs can be identified through an analysis of the SERS spectra of multicomponent PAH mixtures. We have developed an unsupervised ML method we call Characteristic Peak Extraction, a dimensionality reduction algorithm that extracts characteristic SERS peaks based on counts of the peaks detected in the mixture spectra. By analyzing the SERS spectra of two-component and four-component PAH mixtures in which the concentration ratios of the components vary, this algorithm is able to extract the spectrum of each unknown component in the mixture, which is then identified against a SERS spectral library of PAHs. This effort is a step toward the computational demixing of unknown chemical components occurring in complex multicomponent mixtures.
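
The peak-counting idea described above can be illustrated with a small toy sketch. This is not the authors' Characteristic Peak Extraction implementation, whose details are not given in the abstract. It only assumes that (1) peaks are detected in each mixture spectrum and counted per wavenumber bin, (2) peaks whose intensities rise and fall together as the concentration ratio varies belong to the same component, and (3) each recovered peak set is compared with a library by peak-position overlap. The synthetic Gaussian bands, the library entries (`component_A`, `component_B`, `component_C`), the thresholds, and all function names are hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks


def band_spectrum(axis, centers, width=4.0):
    """Toy spectrum: a sum of unit-height Gaussian bands (synthetic data only)."""
    axis = np.asarray(axis, dtype=float)
    return sum(np.exp(-0.5 * ((axis - c) / width) ** 2) for c in centers)


def characteristic_peaks(spectra, min_fraction=0.5, prominence=0.05):
    """Count detected peaks per bin across all spectra; keep frequently detected bins."""
    counts = np.zeros(spectra.shape[1], dtype=int)
    for s in spectra:
        idx, _ = find_peaks(s / s.max(), prominence=prominence)
        counts[idx] += 1
    return np.flatnonzero(counts >= min_fraction * len(spectra))


def group_by_covariation(spectra, peak_idx, min_corr=0.9):
    """Greedily group peaks whose intensities correlate across the mixtures."""
    corr = np.atleast_2d(np.corrcoef(spectra[:, peak_idx].T))
    unassigned = list(range(len(peak_idx)))
    groups = []
    while unassigned:
        seed = unassigned[0]
        members = [j for j in unassigned if corr[seed, j] >= min_corr]
        groups.append(peak_idx[members])
        unassigned = [j for j in unassigned if j not in members]
    return groups


def best_library_match(peaks, library, tol=3.0):
    """Return the library entry whose peak list overlaps most (Jaccard score)."""
    def jaccard(found, ref):
        hits = sum(any(abs(f - r) <= tol for r in ref) for f in found)
        return hits / (len(found) + len(ref) - hits)
    return max(library, key=lambda name: jaccard(peaks, library[name]))


# Hypothetical library of peak positions (arbitrary values, not real PAH bands).
axis = np.arange(400, 1000)
library = {
    "component_A": [450, 620, 880],
    "component_B": [520, 700, 950],
    "component_C": [480, 760, 910],  # decoy that is absent from the mixtures
}

# Twenty two-component mixtures with a varying concentration ratio, plus weak noise.
rng = np.random.default_rng(0)
frac_a = np.linspace(0.3, 1.0, 20)
frac_b = 1.3 - frac_a
mixtures = (np.outer(frac_a, band_spectrum(axis, library["component_A"]))
            + np.outer(frac_b, band_spectrum(axis, library["component_B"])))
mixtures += rng.normal(0.0, 0.001, mixtures.shape)

peak_idx = characteristic_peaks(mixtures)
groups = group_by_covariation(mixtures, peak_idx)
identified = [best_library_match(axis[g], library) for g in groups]
print(identified)
```

The sketch relies on the same condition the abstract highlights: the component ratios must vary across the measured mixtures, otherwise peaks from different components cannot be separated by their covariation. Real SERS spectra would also need baseline correction and a more careful matching score than the simple peak-overlap measure used here.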

Keywords: machine learning; nanoparticles; nonnegative matrix factorization; polycyclic aromatic hydrocarbons; surface-enhanced Raman scattering.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Complex Mixtures
  • Environmental Pollutants* / analysis
  • Humans
  • Machine Learning
  • Polycyclic Aromatic Hydrocarbons* / analysis
  • Spectrum Analysis, Raman / methods
  • Water

Substances

  • Polycyclic Aromatic Hydrocarbons
  • Water
  • Environmental Pollutants
  • Complex Mixtures