Benchmarking network propagation methods for disease gene identification

PLoS Comput Biol. 2019 Sep 3;15(9):e1007276. doi: 10.1371/journal.pcbi.1007276. eCollection 2019 Sep.

Abstract

In-silico identification of potential target genes for disease is an essential aspect of drug target discovery. Recent studies suggest that successful targets can be found by leveraging genetic, genomic and protein interaction information. Here, we systematically tested the ability of 12 varied algorithms, based on network propagation, to identify genes that have been targeted by any drug, on gene-disease data from 22 common non-cancerous diseases in OpenTargets. We considered two biological networks and six performance metrics, and compared two types of input gene-disease association scores. The impact of the design factors on performance was quantified through additive explanatory models. Standard cross-validation led to over-optimistic performance estimates due to the presence of protein complexes. To obtain realistic estimates, we introduced two novel protein complex-aware cross-validation schemes. When seeding biological networks with known drug targets, machine learning and diffusion-based methods found around 2-4 true targets within the top 20 suggestions. Seeding the networks with genes associated with disease by genetics decreased performance to below 1 true hit on average. The use of a larger network, although noisier, improved overall performance. We conclude that diffusion-based prioritisers and machine learning applied to diffusion-based features are suited for drug discovery in practice and improve over simpler neighbour-voting methods. We also demonstrate the large impact of choosing an adequate validation strategy and of the definition of seed disease genes.
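To make the idea of seeding a network and counting true targets among the top suggestions concrete, the following is a minimal sketch of one common diffusion scheme, random walk with restart, followed by a top-k hit count. It is an illustration only and is not taken from the study: the six-node toy network, the restart probability of 0.5, the choice of seed and held-out target, and the function names `random_walk_with_restart` and `hits_in_top_k` are all assumptions made for this example.

```python
import numpy as np


def random_walk_with_restart(adjacency, seed_scores, restart=0.5,
                             tol=1e-10, max_iter=1000):
    """Propagate seed scores over a network (hypothetical helper).

    adjacency   : symmetric (n, n) array of non-negative edge weights
    seed_scores : length-n array, non-zero for seed genes
    restart     : probability of jumping back to the seeds (assumed value)
    """
    adjacency = np.asarray(adjacency, dtype=float)
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0           # avoid division by zero for isolated nodes
    transition = adjacency / col_sums       # column-normalised transition matrix

    p0 = np.asarray(seed_scores, dtype=float)
    p0 = p0 / p0.sum()                      # seed distribution sums to 1
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (transition @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # stop once the L1 change is negligible
            return p_next
        p = p_next
    return p


def hits_in_top_k(scores, is_target, candidates, k):
    """Count true targets among the k best-scoring candidate genes."""
    ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
    return sum(int(is_target[i]) for i in ranked[:k])


# Toy undirected network on six genes (indices 0..5); edges are assumptions.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

seeds = np.array([1, 0, 0, 0, 0, 0])        # gene 0 is the only seed
held_out_targets = [0, 1, 0, 0, 0, 0]       # gene 1 is a hidden true target
candidates = [1, 2, 3, 4, 5]                # seeds are excluded from the ranking

scores = random_walk_with_restart(A, seeds)
print(hits_in_top_k(scores, held_out_targets, candidates, k=2))
```

In a real benchmark the held-out targets would come from a cross-validation split, and, following the abstract's warning about protein complexes, genes belonging to the same complex would need to be kept in the same fold so that a hidden target is not trivially recovered from a seeded complex partner.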

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Benchmarking
  • Computational Biology / methods*
  • Computer Simulation*
  • Databases, Genetic
  • Disease / genetics
  • Drug Discovery / methods*
  • Humans
  • Machine Learning

Grants and funding

AP-L (grant numbers TEC2014-60337-R and DPI2017-89827-R) received funding from the Ministerio de Economía y Competitividad (http://www.mineco.gob.es/), Spain. AP-L and SP-A thank the Spanish Networking Biomedical Research Centre in the subject area of Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), an initiative of Instituto de Investigacion Carlos III (ISCIII), for funding. SP-A thanks the AGAUR FI-scholarship programme. The funders did not play any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.