Background: The systemic information contained in microarray data encodes relevant clues for untangling the poorly understood combination of genetic and environmental factors in Parkinson's disease (PD), which represents the major obstacle to understanding its pathogenesis and to developing disease-modifying therapeutics. While several gene prioritization approaches have been proposed, none dominates the rest. Instead, hybrid approaches seem to outperform individual approaches.
Methods: A consensus strategy is proposed for prioritizing PD-related genes from mRNA microarray data, based on the combination of three independent prioritization approaches: Limma, machine learning, and weighted gene co-expression networks.
Results: The consensus strategy outperformed the individual approaches in terms of statistical significance, overall enrichment, and early recognition ability. In addition to significant biological relevance, the set of 50 prioritized genes exhibited an excellent early recognition ability (6 of the top 10 genes are directly associated with PD). Forty percent of the prioritized genes had previously been associated with PD, including well-known PD-related genes such as SLC18A2, TH, and DRD2. Eight genes (CCNH, DLK1, PCDH8, SLIT1, DLD, PBX1, INSM1, and BMI1) were found to be significantly associated with biological processes affected in PD, representing potentially novel PD biomarkers or therapeutic targets. Additionally, several metrics in standard use in chemoinformatics are proposed to evaluate the early recognition ability of gene prioritization tools.
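To make the notion of early recognition concrete, the sketch below computes two simple measures on a ranked list: precision at k and the enrichment factor, the latter being widely used in chemoinformatics virtual screening. The abstract does not name the metrics that are proposed, so the choice of these two, the function names, and the toy labels are assumptions for illustration only. For reference, "6 of the top 10" corresponds to a precision at 10 of 0.6.

```python
def precision_at_k(labels, k):
    """Fraction of known disease genes among the top k ranked genes.

    labels: list of 0/1 flags in ranked order (1 = known association).
    """
    if not 0 < k <= len(labels):
        raise ValueError("k must be between 1 and len(labels)")
    return sum(labels[:k]) / k


def enrichment_factor(labels, fraction):
    """Hit rate in the top fraction of the list divided by the overall hit rate.

    Values above 1 mean known genes are concentrated early in the ranking.
    """
    n_total = len(labels)
    n_top = max(1, round(n_total * fraction))
    hits_total = sum(labels)
    if hits_total == 0:
        raise ValueError("at least one known gene is required")
    hits_top = sum(labels[:n_top])
    # Equivalent to (hits_top / n_top) / (hits_total / n_total).
    return (hits_top * n_total) / (n_top * hits_total)


# Toy ranked list of 20 genes with 4 known associations (not real data).
toy_labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
p_at_5 = precision_at_k(toy_labels, 5)
ef_25 = enrichment_factor(toy_labels, 0.25)
print(p_at_5, ef_25)
```

In this toy list, three of the four known genes fall in the top quarter, so the hit rate there is three times the overall hit rate.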
Conclusions: The proposed consensus strategy represents an efficient and biologically relevant approach to gene prioritization, providing a valuable decision-making tool for the study of PD pathogenesis and the development of disease-modifying PD therapeutics.