An integrative machine learning approach to discovering multi-level molecular mechanisms of obesity using data from monozygotic twin pairs

R Soc Open Sci. 2020 Oct 21;7(10):200872. doi: 10.1098/rsos.200872. eCollection 2020 Oct.

Abstract

We combined clinical, cytokine, genomic, methylation and dietary data from 43 young adult monozygotic twin pairs (aged 22–36 years, 53% female), of which 25 pairs were substantially weight discordant (within-pair difference in body mass index > 3 kg m⁻²). These measurements were originally taken as part of the TwinFat study, a substudy of The Finnish Twin Cohort study. The five large multivariate datasets (comprising 42, 71, 1587, 1605 and 63 variables, respectively) were jointly analysed using an integrative machine learning method called group factor analysis (GFA) to generate new hypotheses about the multi-molecular-level interactions associated with the development of obesity. New potential links between cytokines and weight gain are identified, as well as associations between dietary, inflammatory and epigenetic factors. This encouraging case study aims to enthuse the research community to boldly attempt new machine learning approaches which have the potential to yield novel and unintuitive hypotheses. The source code of the GFA method is publicly available as the R package GFA.

Keywords: big data; machine learning; monozygotic twins; obesity.

Associated data

  • figshare/10.6084/m9.figshare.c.5177386