Protein function prediction using guilty by association from interaction networks

Amino Acids. 2015 Dec;47(12):2583-92. doi: 10.1007/s00726-015-2049-3. Epub 2015 Jul 28.

Abstract

Protein function prediction from sequence using the Gene Ontology (GO) classification is useful in many biological problems. It has recently attracted increasing interest, thanks in part to the Critical Assessment of Function Annotation (CAFA) challenge. In this paper, we introduce Guilty by Association on STRING (GAS), a tool that predicts protein function from protein-protein interaction (PPI) networks without relying on sequence similarity. The underlying assumption is that interacting proteins take part in the same biological process and are located in the same cellular compartment. GAS retrieves the interaction partners of a query protein from the STRING database and measures the enrichment of their functional annotations to generate a sorted list of putative functions. A performance evaluation based on CAFA metrics and a fair comparison with optimized BLAST similarity searches are provided. The PPI approach is shown to outperform similarity searches for biological process and cellular compartment GO predictions, and the consensus of GAS and BLAST is shown to improve overall performance. An analysis of best practices for exploiting protein-protein interaction networks is also provided.
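The enrichment step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which statistic GAS uses, so the one-sided hypergeometric test below is an assumption, as are the function names (`hypergeom_tail`, `rank_functions`), the toy protein identifiers, and the placeholder term labels (`GO:A`, `GO:B`, `GO:C`). Retrieval of partners from STRING is not shown; the partner list is taken as given.

```python
from math import comb


def hypergeom_tail(k, M, n, N):
    """P(X >= k) when N proteins are drawn from M, n of which carry the term."""
    total = comb(M, N)
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def rank_functions(partners, annotations):
    """Rank terms by enrichment among the interaction partners.

    partners:    iterable of protein ids (assumed already retrieved from a PPI source)
    annotations: dict mapping every background protein id to a set of term labels
    Returns a list of (term, p_value) sorted from most to least enriched.
    """
    # Keep only partners that belong to the annotated background.
    partners = [p for p in set(partners) if p in annotations]
    M = len(annotations)   # background size
    N = len(partners)      # number of partners considered

    # How many background proteins carry each term.
    background = {}
    for terms in annotations.values():
        for t in terms:
            background[t] = background.get(t, 0) + 1

    # How many partners carry each term.
    hits = {}
    for p in partners:
        for t in annotations[p]:
            hits[t] = hits.get(t, 0) + 1

    scored = [(t, hypergeom_tail(k, M, background[t], N)) for t, k in hits.items()]
    # Smaller p-value means stronger enrichment; ties are broken by term label.
    return sorted(scored, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Toy background of eight proteins with made-up annotations (not real data).
    toy_annotations = {
        "P1": {"GO:A", "GO:B"},
        "P2": {"GO:A"},
        "P3": {"GO:A", "GO:C"},
        "P4": {"GO:C"},
        "P5": {"GO:B"},
        "P6": {"GO:C"},
        "P7": set(),
        "P8": {"GO:C"},
    }
    for term, p_value in rank_functions(["P1", "P2", "P3"], toy_annotations):
        print(f"{term}\t{p_value:.4f}")
```

In this toy case the term shared by all three partners ranks first, since it is rare in the background and ubiquitous among the partners. A real pipeline would likely also need multiple-testing correction and propagation of annotations along the GO hierarchy, neither of which is covered here.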

Keywords: CAFA; Gene ontology; Protein function; Protein interaction network; Protein sequence.

MeSH terms

  • Algorithms
  • Automation
  • Computational Biology
  • Data Mining
  • Databases, Protein
  • Genome, Fungal
  • Pattern Recognition, Automated
  • Protein Interaction Mapping*
  • Protein Interaction Maps*
  • Proteins / chemistry*
  • Reproducibility of Results
  • Software

Substances

  • Proteins