Background: Fetal hemoglobin (HbF) is the major modifier of the clinical course of sickle cell anemia. Its levels are highly heritable, and its interpersonal variability is modulated in part by 3 quantitative trait loci that affect HbF gene expression. Genome-wide association studies have identified single-nucleotide polymorphisms (SNPs) in these quantitative trait loci that are strongly associated with HbF but explain only 10% to 12% of its variance. Combining SNPs into a genetic risk score can explain a larger share of the variability in HbF levels, but the challenge of this approach lies in selecting the optimal number of SNPs to include in the score.
Methods and results: We developed a collection of 14 models, each with a genetic risk score composed of a different number of SNPs, and used the ensemble of these models to predict HbF in patients with sickle cell anemia. The models were trained in 841 patients with sickle cell anemia and were tested in 3 independent cohorts. The ensemble of 14 models explained 23.4% of the variability in HbF in the discovery cohort, whereas the correlation between predicted and observed HbF in the 3 independent cohorts ranged between 0.28 and 0.44. The models included SNPs in BCL11A, the HBS1L-MYB intergenic region, and the site of the HBB gene cluster, quantitative trait loci previously associated with HbF.
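The general idea of an ensemble of genetic risk score models can be illustrated with a minimal sketch. The abstract does not specify how SNPs are weighted or how the 14 models are combined, so everything below is an assumption made for illustration: each score is a weighted sum of allele counts over the first k SNPs of a pre-ranked list, each model is a simple linear regression of the trait on its score, the ensemble prediction is the unweighted mean of the individual predictions, and the data are simulated. The function names are hypothetical.

```python
import random


def risk_score(genotype, weights, k):
    """Weighted sum of allele counts (0, 1, 2) over the first k SNPs."""
    return sum(w * g for w, g in zip(weights[:k], genotype[:k]))


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def train_ensemble(genotypes, phenotypes, weights, sizes):
    """Fit one regression per score size k; returns a list of (k, a, b)."""
    models = []
    for k in sizes:
        scores = [risk_score(g, weights, k) for g in genotypes]
        a, b = fit_line(scores, phenotypes)
        models.append((k, a, b))
    return models


def predict(models, genotype, weights):
    """Assumed combination rule: unweighted mean of the model predictions."""
    preds = [a + b * risk_score(genotype, weights, k) for k, a, b in models]
    return sum(preds) / len(preds)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


# Simulated data only: 14 SNPs with decreasing, made-up effect sizes.
rng = random.Random(0)
n_snps = 14
weights = [1.0 / (i + 1) for i in range(n_snps)]


def simulate(n):
    genos = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n)]
    phenos = [risk_score(g, weights, n_snps) + rng.gauss(0, 1) for g in genos]
    return genos, phenos


train_g, train_y = simulate(200)   # stand-in for a discovery cohort
test_g, test_y = simulate(100)     # stand-in for an independent cohort

models = train_ensemble(train_g, train_y, weights, sizes=range(1, n_snps + 1))
predicted = [predict(models, g, weights) for g in test_g]
r = pearson(predicted, test_y)
print(f"Correlation between predicted and observed trait: {r:.2f}")
```

Averaging over models with different numbers of SNPs avoids committing to a single score size, which is the selection problem described in the Background. The correlation printed here reflects only the simulated data and has no bearing on the reported results.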
Conclusions: An ensemble of 14 genetic risk models can predict HbF levels in independent cohorts, with correlations between predicted and observed values of 0.28 to 0.44, and the approach may also prove useful in other applications.
Keywords: anemia, sickle cell; genetic association studies; genetics; hemoglobins; risk factors.