Since its discovery, surface-enhanced Raman spectroscopy (SERS) has shown outstanding promise for identifying trace amounts of unknown molecules in rapid, portable formats. However, the many types of nanoparticle and nanostructured metallic SERS substrates created over the past few decades show substantial variability in the SERS spectra they produce. These inconsistencies have even raised speculation that substrate-specific SERS spectral libraries must be compiled before this type of spectroscopy can be put to practical use. Here, we report a machine learning (ML) algorithm that can identify chemicals by matching their SERS spectra to those of a standard Raman spectral library. We use an approach analogous to facial recognition, in which feature extraction enables spectral recognition in the presence of multiple nuisance variables. The key element is a metric we call "Characteristic Peak Similarity" (CaPSim), which focuses on the characteristic peaks in the SERS spectra. It has the flexibility to accommodate substrate-specific variability when quantifying the degree of similarity between a SERS spectrum and a Raman spectrum. Analysis shows that CaPSim substantially outperforms existing spectral matching algorithms in terms of accuracy. This ML-based approach could greatly facilitate the spectroscopic identification of molecules in fieldable SERS applications.
Keywords: characteristic peak similarity; machine learning; nanoparticles; polycyclic aromatic hydrocarbons; surface-enhanced Raman scattering.