One of the technical challenges in metabolomics research is determining the chemical structures of unidentified peaks. We have developed a metabolomics-based chemoinformatics approach for ranking the candidate structures of unidentified peaks. Our approach uses information about the known metabolites detected in the samples that contain the unidentified peaks, and it involves three discrete steps. In the first step, "precursor/product metabolites" are identified as potential reactants or products derived from the unidentified peaks. In the second step, candidate structures for the unidentified peak are retrieved from the PubChem database by molecular formula and then ranked by the structural similarity between the candidate structures and the precursor/product metabolites. In the third step, the migration time is predicted to refine the candidate structures. Two simulation studies were conducted to demonstrate the efficacy of our approach: one using the 20 proteinogenic amino acids as pseudo-unidentified peaks, and one using leave-one-out experiments for all of the annotated metabolites, with and without filtering against the Human Metabolome Database. We also applied our approach to two unidentified peaks in a urine sample, which were identified as glycocyamidine and N-acetylglycine. These results suggest that our approach could be used to identify unidentified peaks during metabolomics analysis.
Keywords: Chemoinformatics; Metabolomics; Migration time prediction; Tanimoto coefficient.
© 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
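
The similarity-based ranking in the second step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each structure is represented by a hypothetical fingerprint (a set of integer feature identifiers), that similarity is measured with the Tanimoto coefficient named in the keywords, and that each candidate is scored by its highest similarity to any precursor/product metabolite. The scoring rule, the function names `tanimoto` and `rank_candidates`, and the toy fingerprints are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical fingerprint type: a set of integer feature identifiers.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two feature sets."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_candidates(
    candidates: Dict[str, Fingerprint],
    references: List[Fingerprint],
) -> List[Tuple[str, float]]:
    """Rank candidates by their best similarity to any reference metabolite.

    Assumption: the score is the maximum Tanimoto coefficient over the
    precursor/product metabolites; other aggregations are equally plausible.
    """
    scored = [
        (name, max((tanimoto(fp, ref) for ref in references), default=0.0))
        for name, fp in candidates.items()
    ]
    # Highest score first; ties are broken alphabetically by name.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy data only: these fingerprints do not describe real molecules.
toy_candidates = {
    "cand_A": frozenset({1, 2, 3, 4}),
    "cand_B": frozenset({7, 8}),
}
toy_references = [frozenset({1, 2, 3}), frozenset({9})]
ranking = rank_candidates(toy_candidates, toy_references)
```

In practice the fingerprints would come from a chemoinformatics toolkit and the candidate list from a formula search. Both are outside the scope of this sketch.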