Workflow for Integrated Processing of Multicohort Untargeted ¹H NMR Metabolomics Data in Large-Scale Metabolic Epidemiology

Ibrahim Karaman; Diana L S Ferreira; Claire L Boulangé; Manuja R Kaluarachchi; David Herrington; Anthony C Dona; Raphaële Castagné; Alireza Moayyeri; Benjamin Lehne; Marie Loh; Paul S de Vries; Abbas Dehghan; Oscar H Franco; Albert Hofman; Evangelos Evangelou; Ioanna Tzoulaki; Paul Elliott; John C Lindon; Timothy M D Ebbels

doi:10.1021/acs.jproteome.6b00125

J Proteome Res. 2016 Dec 2;15(12):4188-4194. doi: 10.1021/acs.jproteome.6b00125. Epub 2016 Oct 6.

Authors

Ibrahim Karaman¹, Diana L S Ferreira¹, Claire L Boulangé², Manuja R Kaluarachchi², David Herrington³, Anthony C Dona²,⁴, Raphaële Castagné¹, Alireza Moayyeri¹, Benjamin Lehne¹, Marie Loh¹, Paul S de Vries⁵, Abbas Dehghan⁵, Oscar H Franco⁵, Albert Hofman⁵, Evangelos Evangelou¹,⁶, Ioanna Tzoulaki¹,⁶, Paul Elliott¹, John C Lindon²,⁴, Timothy M D Ebbels²,⁴

Affiliations

¹ Medical Research Council - Public Health England (MRC-PHE) Centre for Environment and Health, Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, St. Mary's Campus, Norfolk Place, W2 1PG, London, United Kingdom.
² Metabometrix, Ltd., Bioincubator Unit, Bessemer Building, Prince Consort Road, SW7 2BP South Kensington, London, United Kingdom.
³ Department of Internal Medicine, Wake Forest School of Medicine, Medical Center Boulevard, Winston-Salem, North Carolina 27157, United States.
⁴ Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, Sir Alexander Fleming Building, South Kensington, SW7 2AZ London, United Kingdom.
⁵ Department of Epidemiology, Erasmus University, Erasmus Medical Center, Dr Molewaterplein 50, 3015 GE Rotterdam, Netherlands.
⁶ Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, University Campus, P.O. Box 1186, 45110 Ioannina, Greece.

PMID: 27628670
DOI: 10.1021/acs.jproteome.6b00125

Abstract

Large-scale metabolomics studies involving thousands of samples present multiple challenges in data analysis, particularly when an untargeted platform is used. Studies with multiple cohorts and analysis platforms exacerbate existing problems such as peak alignment and normalization. Therefore, there is a need for robust processing pipelines that can ensure reliable data for statistical analysis. The COMBI-BIO project incorporates serum from ∼8000 individuals, in three cohorts, profiled by six assays in two phases using both ¹H NMR and UPLC-MS. Here we present the COMBI-BIO NMR analysis pipeline and demonstrate its fitness for purpose using representative quality control (QC) samples. NMR spectra were first aligned and normalized. After eliminating interfering signals, outliers identified using Hotelling's T² were removed and a cohort/phase adjustment was applied, resulting in two NMR data sets (CPMG and NOESY). Alignment of the NMR data was shown to increase the correlation-based alignment quality measure from 0.319 to 0.391 for CPMG and from 0.536 to 0.586 for NOESY, showing that the improvement was present across both large and small peaks. End-to-end quality assessment of the pipeline was achieved using Hotelling's T² distributions. For CPMG spectra, the interquartile range decreased from 1.425 in raw QC data to 0.679 in processed spectra, while the corresponding change for NOESY spectra was from 0.795 to 0.636, indicating an improvement in precision following processing. PCA indicated that gross phase and cohort differences were no longer present. These results illustrate that the pipeline produces robust and reproducible data, successfully addressing the methodological challenges of this large multifaceted study.
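
The abstract uses Hotelling's T² both to flag outlying spectra and, through its interquartile range in QC samples, to summarize precision. The following is a minimal sketch of that idea, not the COMBI-BIO implementation: it assumes T² is computed on the leading PCA scores of a samples-by-variables matrix, and the number of components, the synthetic data, and the percentile cutoff are all illustrative assumptions that the abstract does not specify.

```python
import numpy as np


def hotelling_t2(X, n_components=2):
    """Hotelling's T² of each row of X, computed on its leading PCA scores.

    Assumption: PCA on mean-centered data; n_components is an illustrative choice.
    """
    Xc = X - X.mean(axis=0)                       # mean-center each variable
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    score_var = s[:n_components] ** 2 / (X.shape[0] - 1)
    # Sum of squared scores, each scaled by the variance of its component
    return np.sum(scores ** 2 / score_var, axis=1)


def interquartile_range(values):
    """Difference between the 75th and 25th percentiles."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 - q1


# Synthetic stand-in for a spectral matrix (200 samples x 50 variables)
rng = np.random.default_rng(0)
spectra = rng.normal(size=(200, 50))
spectra[0] += 8.0                                 # one grossly deviating sample

t2 = hotelling_t2(spectra, n_components=2)
cutoff = np.percentile(t2, 99)                    # hypothetical cutoff, not from the paper
outliers = np.flatnonzero(t2 > cutoff)
spread = interquartile_range(t2)
```

Under these assumptions, a narrower interquartile range of T² among replicate QC samples would correspond to the tighter clustering that the abstract reports after processing.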

Keywords: NMR; alignment; epidemiology; large scale; metabolomics; multicohort; normalization; preprocessing; quality control.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Data Interpretation, Statistical*
Humans
Metabolomics / instrumentation
Metabolomics / methods*
Metabolomics / statistics & numerical data
Molecular Epidemiology
Proton Magnetic Resonance Spectroscopy / methods*
Proton Magnetic Resonance Spectroscopy / standards
Proton Magnetic Resonance Spectroscopy / statistics & numerical data
Quality Control
Reproducibility of Results
Workflow
