Machine learning methods applied to genotyping data capture interactions between single nucleotide variants in late onset Alzheimer's disease

Alzheimers Dement (Amst). 2022 Apr 5;14(1):e12300. doi: 10.1002/dad2.12300. eCollection 2022.

Abstract

Introduction: Genome-wide association studies (GWAS) in late onset Alzheimer's disease (LOAD) provide lists of individual genetic determinants. However, GWAS do not capture synergistic effects among multiple genetic variants, and their specificity is limited.

Methods: We applied tree-based machine learning algorithms (MLs) to discriminate individuals with LOAD (>700) from age-matched unaffected subjects in UK Biobank, using single nucleotide variants (SNVs) drawn from Alzheimer's disease (AD) studies, and obtained specific genomic profiles from the prioritized SNVs.
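As an illustration of how a tree-based method can prioritize SNVs, the following minimal sketch ranks variants by the largest Gini impurity decrease achievable with a single split on genotype dosage (0, 1, or 2 copies of the minor allele). The abstract does not state which tree-based algorithms or settings were used, so this is a simplified stand-in rather than the authors' pipeline; the variant names (`snv_causal`, `snv_noise_1`, `snv_noise_2`), the risk model, and the simulated cohort are all hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = control, 1 = case)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split_gain(genotypes, labels):
    """Largest impurity decrease from one dosage split (0 vs 1-2, or 0-1 vs 2)."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    for threshold in (0, 1):
        left = [y for g, y in zip(genotypes, labels) if g <= threshold]
        right = [y for g, y in zip(genotypes, labels) if g > threshold]
        if not left or not right:
            continue  # split does not separate any samples
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_snvs(data, labels):
    """Sort SNVs by their best single-split impurity decrease, highest first."""
    gains = {name: best_split_gain(g, labels) for name, g in data.items()}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


def simulate(n=1000, seed=0):
    """Hypothetical toy cohort: one SNV raises case probability, two do not."""
    rng = random.Random(seed)
    data = {"snv_causal": [], "snv_noise_1": [], "snv_noise_2": []}
    labels = []
    for _ in range(n):
        dosage = {name: rng.choice([0, 1, 2]) for name in data}
        risk = 0.1 + 0.35 * dosage["snv_causal"]  # assumed risk model
        labels.append(1 if rng.random() < risk else 0)
        for name in data:
            data[name].append(dosage[name])
    return data, labels


data, labels = simulate()
ranking = rank_snvs(data, labels)
for name, gain in ranking:
    print(f"{name}: impurity decrease = {gain:.4f}")
```

Ensemble methods such as random forests or gradient-boosted trees aggregate this kind of split gain over many trees and many splits, which is how successive splits on different SNVs can pick up combinations of variants rather than single-variant effects alone.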

Results: MLs prioritized a set of SNVs located in the genes PVRL2, TOMM40, APOE, and APOC1; these SNVs also influence gene expression and splicing. The genomic profiles in this region showed interaction patterns involving rs405509 and rs1160985, which were also present in the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. rs405509, located in the APOE promoter, interacts with rs429358, among others, seemingly neutralizing their predisposing effect.
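One simple way to look for the kind of interaction described above, in which one variant appears to neutralize the predisposing effect of another, is to compare the case fraction across carrier strata of two variants. The sketch below is only an illustration under stated assumptions: the counts are invented toy values, not results from UK Biobank or ADNI, and the "risk" and "modifier" variants are hypothetical placeholders rather than rs429358 and rs405509.

```python
from collections import defaultdict


def case_fraction_by_stratum(carrier_risk, carrier_modifier, is_case):
    """Fraction of cases in each (risk carrier, modifier carrier) stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [cases, total]
    for r, m, y in zip(carrier_risk, carrier_modifier, is_case):
        counts[(r, m)][0] += y
        counts[(r, m)][1] += 1
    return {k: cases / total for k, (cases, total) in counts.items()}


def interaction_contrast(frac):
    """Risk-variant effect among modifier carriers minus that among non-carriers.

    A clearly negative value means the risk variant's effect is attenuated
    when the modifier variant is present.
    """
    effect_without = frac[(1, 0)] - frac[(0, 0)]
    effect_with = frac[(1, 1)] - frac[(0, 1)]
    return effect_with - effect_without


def expand(strata):
    """Turn {(risk, modifier): (cases, total)} into per-individual lists."""
    risk, modifier, case = [], [], []
    for (r, m), (cases, total) in strata.items():
        for i in range(total):
            risk.append(r)
            modifier.append(m)
            case.append(1 if i < cases else 0)
    return risk, modifier, case


# Invented toy counts: (risk carrier, modifier carrier) -> (cases, total)
toy_strata = {
    (0, 0): (2, 10),
    (1, 0): (8, 10),
    (0, 1): (2, 10),
    (1, 1): (3, 10),
}

risk, modifier, case = expand(toy_strata)
fractions = case_fraction_by_stratum(risk, modifier, case)
contrast = interaction_contrast(fractions)
print(fractions)
print(f"interaction contrast = {contrast:.2f}")
```

A contrast on the risk-difference scale is only one of several possible interaction measures; it is used here because it can be read directly from the stratified case fractions.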

Discussion: Our approach efficiently discriminates LOAD from controls, capturing genomic profiles defined by interactions among SNVs in a hot-spot region.

Keywords: Apolipoprotein E; genetic determinants; genomic interactions; genomic profiles; late onset Alzheimer's disease; machine learning; single nucleotide variants; variant prioritization.