Data Fusion-based Discovery (DAFdiscovery) pipeline to aid compound annotation and bioactive compound discovery across diverse spectral data

Phytochem Anal. 2023 Jan;34(1):48-55. doi: 10.1002/pca.3178. Epub 2022 Oct 3.

Abstract

Introduction: Data Fusion-based Discovery (DAFdiscovery) is a pipeline that helps users combine mass spectrometry (MS), nuclear magnetic resonance (NMR), and bioactivity data in a notebook-based application to accelerate the annotation and discovery of bioactive compounds. It applies Statistical Total Correlation Spectroscopy (STOCSY) and Statistical HeteroSpectroscopy (SHY) calculations to these data in an easy-to-follow Jupyter Notebook.
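As an illustration of the kind of calculation STOCSY refers to, the following minimal sketch computes the Pearson correlation and covariance between one chosen "driver" variable and every other variable across samples. It is not taken from the DAFdiscovery notebook: the function name `stocsy`, the argument names, and the samples-by-variables matrix layout are assumptions made for this example only.

```python
import numpy as np

def stocsy(X, driver_idx):
    """Correlate every column of X with a driver column (hypothetical helper).

    X          : array of shape (n_samples, n_variables), e.g. binned NMR intensities
    driver_idx : column index of the driver peak
    Returns (corr, cov), each of length n_variables.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)              # mean-center each variable
    d = Xc[:, driver_idx]                # centered driver variable
    n = X.shape[0]
    cov = Xc.T @ d / (n - 1)             # sample covariance with the driver
    std = Xc.std(axis=0, ddof=1)         # sample standard deviation per variable
    denom = std * std[driver_idx]
    # Constant variables have zero variance; report a correlation of 0 for them.
    corr = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    return corr, cov

# Toy data: 4 samples x 4 variables (illustrative values, not real spectra).
X = np.array([
    [1.0, 2.0, 4.0, 5.0],
    [2.0, 4.0, 3.0, 5.0],
    [3.0, 6.0, 2.0, 5.0],
    [4.0, 8.0, 1.0, 5.0],
])
corr, cov = stocsy(X, driver_idx=0)
```

Variables that rise and fall with the driver across samples receive correlations near 1 and are candidates for belonging to the same compound; in SHY the same statistic is computed between variables from two different techniques, such as an MS feature and an NMR peak.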

Method: Several case studies are presented for benchmarking, and the resulting outputs are shown to aid natural product identification and discovery. The goal is to encourage users to acquire MS and NMR data from their samples (with replicates and fractions when available) and to explore the variance in these data to highlight MS features, NMR peaks, and bioactivity that may be correlated, either to accelerate bioactive compound discovery or to support annotation and identification studies.
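The idea of exploring variance across fused data can be sketched as follows: MS features and NMR peaks measured on the same samples are placed side by side, and each variable is correlated with the bioactivity values. This is a simplified sketch under stated assumptions, not the pipeline's actual implementation. The function names, the variable labels, and the 0.8 threshold are hypothetical, and real data would normally need preprocessing (alignment, normalization, scaling) that is omitted here.

```python
import numpy as np

def correlate_with_driver(block, driver):
    """Pearson correlation of each column of `block` with the vector `driver`."""
    block = np.asarray(block, dtype=float)
    driver = np.asarray(driver, dtype=float)
    bc = block - block.mean(axis=0)      # center each variable
    dc = driver - driver.mean()          # center the driver
    num = bc.T @ dc
    denom = np.sqrt((bc ** 2).sum(axis=0) * (dc ** 2).sum())
    # Constant columns give denom == 0; report 0 instead of dividing by zero.
    return np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)

def rank_fused_variables(ms, nmr, bioactivity, ms_names, nmr_names, threshold=0.8):
    """Fuse MS and NMR blocks column-wise and list variables that track bioactivity.

    The threshold is an arbitrary illustrative cutoff, not a recommended value.
    """
    fused = np.hstack([ms, nmr])         # rows are samples, columns are variables
    names = [f"MS:{n}" for n in ms_names] + [f"NMR:{n}" for n in nmr_names]
    r = correlate_with_driver(fused, bioactivity)
    hits = [(name, float(v)) for name, v in zip(names, r) if v >= threshold]
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Toy example: 4 fractions, 2 MS features, 2 NMR peaks (made-up numbers).
ms = [[10, 4], [20, 3], [30, 2], [40, 1]]
nmr = [[1, 2], [2, 2], [3, 2], [5, 2]]
bioactivity = [1, 2, 3, 4]
hits = rank_fused_variables(ms, nmr, bioactivity, ["f1", "f2"], ["p1", "p2"])
```

In this toy case the MS feature `f1` and the NMR peak `p1` both increase with bioactivity and are returned as hits, while `f2` (anticorrelated) and `p2` (constant) are not. Such a correlation only flags candidates, which still require confirmation by isolation or further spectral evidence.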

Results: Several applications were demonstrated using data from different research groups, and DAFdiscovery reproduced their findings using a more straightforward method.

Conclusion: DAFdiscovery proved to be a simple-to-use method for situations in which data from different sources must be analyzed together.

MeSH terms

  • Biological Products*
  • Magnetic Resonance Spectroscopy / methods
  • Mass Spectrometry / methods

Substances

  • Biological Products