Tuning parameters for polygenic risk score methods using GWAS summary statistics from training data

Nat Commun. 2024 Jan 2;15(1):24. doi: 10.1038/s41467-023-44009-0.

Abstract

Various polygenic risk score (PRS) methods have been proposed to combine the estimated effects of single nucleotide polymorphisms (SNPs) to predict genetic risk for common diseases, using data collected from genome-wide association studies (GWAS). Some methods require an external individual-level GWAS dataset for parameter tuning, which raises privacy and security concerns. Holding out part of the training data for parameter tuning can also reduce prediction accuracy. In this article, we propose PRStuning, a method that tunes parameters for different PRS methods using only GWAS summary statistics from the training data. PRStuning predicts the PRS performance under each candidate parameter setting and then selects the best-performing setting. Because directly using effects estimated from the training data tends to overestimate performance in the testing data, we adopt an empirical Bayes approach that shrinks the predicted performance in accordance with the genetic architecture of the disease. Extensive simulations and real data applications demonstrate PRStuning's accuracy across PRS methods and parameters.
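The overestimation and shrinkage described in the abstract can be illustrated with a toy simulation. The sketch below is not the PRStuning implementation. It rests on several simplifying assumptions that do not come from the paper: SNPs are independent and standardized, the phenotype has unit variance, the PRS is built by simple z-score thresholding, and effects follow a point-normal prior whose hyperparameters are supplied directly rather than estimated from the summary statistics. All function and variable names are hypothetical.

```python
import numpy as np

def posterior_mean(beta_hat, se2, prop_causal, s2):
    """E[beta | beta_hat] under the assumed point-normal prior:
    beta ~ prop_causal * N(0, s2) + (1 - prop_causal) * (point mass at 0),
    with beta_hat | beta ~ N(beta, se2)."""
    def normal_pdf(x, var):
        return np.exp(-0.5 * x**2 / var) / np.sqrt(2 * np.pi * var)
    slab = prop_causal * normal_pdf(beta_hat, s2 + se2)
    spike = (1 - prop_causal) * normal_pdf(beta_hat, se2)
    p_causal = slab / (slab + spike)            # posterior causal probability
    return p_causal * (s2 / (s2 + se2)) * beta_hat

def predicted_r(weights, effects):
    """Correlation between PRS and phenotype implied by `effects`, assuming
    independent standardized SNPs and a unit-variance phenotype."""
    denom = np.sqrt(np.sum(weights**2))
    return float(np.sum(weights * effects) / denom) if denom > 0 else 0.0

# Simulated training summary statistics (all settings are illustrative).
rng = np.random.default_rng(0)
m, n, prop_causal, h2 = 5000, 5000, 0.02, 0.3
s2 = h2 / (m * prop_causal)     # per-SNP effect variance among causal SNPs
se2 = 1.0 / n                   # sampling variance of each marginal estimate
is_causal = rng.random(m) < prop_causal
beta = np.where(is_causal, rng.normal(0.0, np.sqrt(s2), m), 0.0)
beta_hat = beta + rng.normal(0.0, np.sqrt(se2), m)
z = beta_hat / np.sqrt(se2)

# Shrunken effects stand in for the unknown true effects.
shrunk = posterior_mean(beta_hat, se2, prop_causal, s2)

results = []
for z_cut in [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]:    # candidate tuning parameters
    w = np.where(np.abs(z) >= z_cut, beta_hat, 0.0)
    naive = predicted_r(w, beta_hat)   # reuses the training estimates
    eb = predicted_r(w, shrunk)        # empirical-Bayes-style prediction
    truth = predicted_r(w, beta)       # available only because data are simulated
    results.append((z_cut, naive, eb, truth))
    print(f"|z| >= {z_cut:.0f}: naive={naive:.3f}  shrunk={eb:.3f}  true={truth:.3f}")

best_naive = max(results, key=lambda r: r[1])[0]
best_eb = max(results, key=lambda r: r[2])[0]
print("cutoff chosen by naive prediction:", best_naive)
print("cutoff chosen by shrunken prediction:", best_eb)
```

In this setup the naive prediction equals the square root of the sum of squared selected weights, so it can only decrease as the cutoff tightens and always favors the loosest cutoff. The shrunken prediction replaces each unknown effect with its posterior mean, which removes most of that optimism under the assumed prior.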

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Bayes Theorem
  • Genetic Predisposition to Disease
  • Genetic Risk Score*
  • Genome-Wide Association Study* / methods
  • Humans
  • Multifactorial Inheritance / genetics
  • Polymorphism, Single Nucleotide
  • Risk Factors