Disease prediction with multi-omics and biomarkers empowers case-control genetic discoveries in the UK Biobank

Nat Genet. 2024 Sep;56(9):1821-1831. doi: 10.1038/s41588-024-01898-1. Epub 2024 Sep 11.

Abstract

The emergence of biobank-level datasets offers new opportunities to discover novel biomarkers and develop predictive algorithms for human disease. Here, we present an ensemble machine-learning framework (machine learning with phenotype associations, MILTON) utilizing a range of biomarkers to predict 3,213 diseases in the UK Biobank. Leveraging the UK Biobank's longitudinal health record data, MILTON predicts incident disease cases undiagnosed at time of recruitment, largely outperforming available polygenic risk scores. We further demonstrate the utility of MILTON in augmenting genetic association analyses in a phenome-wide association study of 484,230 genome-sequenced samples, along with 46,327 samples with matched plasma proteomics data. This resulted in improved signals for 88 known (P < 1 × 10⁻⁸) gene-disease relationships alongside 182 gene-disease relationships that did not achieve genome-wide significance in the nonaugmented baseline cohorts. We validated these discoveries in the FinnGen biobank alongside two orthogonal machine-learning methods built for gene-disease prioritization. All extracted gene-disease associations and incident disease predictive biomarkers are publicly available (http://milton.public.cgr.astrazeneca.com).

MeSH terms

  • Algorithms
  • Biological Specimen Banks*
  • Biomarkers*
  • Case-Control Studies
  • Genetic Predisposition to Disease*
  • Genome-Wide Association Study* / methods
  • Humans
  • Machine Learning*
  • Multifactorial Inheritance / genetics
  • Multiomics
  • Phenotype
  • Polymorphism, Single Nucleotide
  • Proteomics / methods
  • UK Biobank
  • United Kingdom

Substances

  • Biomarkers