Dissolved organic matter (DOM) is a complex mixture of molecules that constitutes one of the largest reservoirs of organic matter on Earth. Although stable carbon isotope values (δ13C) provide valuable insights into DOM transformations from land to ocean, it remains unclear how individual molecules respond to changes in bulk DOM properties such as δ13C. To address this, we used Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) to characterize the molecular composition of DOM in 510 samples from China's coastal environments, 320 of which had δ13C measurements. Using a machine learning model based on 5199 molecular formulas, we predicted δ13C values with a mean absolute error (MAE) of 0.30‰ on the training data set, outperforming traditional linear regression (MAE of 0.85‰). Our findings suggest that degradation processes, microbial activities, and primary production regulate DOM along the river-to-ocean continuum. The machine learning model also accurately predicted δ13C values for samples without δ13C measurements and for other published data sets, reflecting the δ13C trend along the land-to-ocean continuum. This study demonstrates the potential of machine learning to capture the complex relationships between DOM composition and bulk parameters, particularly as training data sets grow and molecular-level research expands.
Keywords: DOM; FT-ICR MS; machine learning; stable carbon isotope; China coastal environments