Incorporating organelle correlations into semi-supervised learning for protein subcellular localization prediction

Bioinformatics. 2016 Jul 15;32(14):2184-92. doi: 10.1093/bioinformatics/btw219. Epub 2016 Apr 23.

Abstract

Motivation: Bioimages of subcellular protein distribution, as a new data source, have attracted much attention in the field of automated prediction of protein subcellular localization. The performance of existing systems is significantly limited by the small number of high-quality images with explicit annotations, resulting in a small-sample-size learning problem. This limitation is more serious for multi-location proteins, which co-exist in two or more organelles, because it is difficult to annotate those proteins accurately by biological experiments or automated systems.

Results: In this study, we designed a new protein subcellular localization prediction pipeline that addresses the small-sample-size learning and multi-location protein annotation problems. Five semi-supervised algorithms that can make use of lower-quality data were integrated, and a new multi-label classification approach that incorporates the correlations among different organelles in cells was proposed. The organelle correlations were modeled by a Bayesian network, and the topology of the correlation graph was used to guide the order in which the binary classifiers are trained in the multi-label classification, so that the training order reflects the label dependence relationships. The proposed protocol was applied to both immunohistochemistry and immunofluorescence images, and our experimental results demonstrated its efficiency.
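The idea of letting a correlation graph fix the training order of binary classifiers can be illustrated with a minimal sketch. This is not the authors' implementation: the organelle graph below is written by hand rather than learned by a Bayesian network, the nearest-centroid classifier is a toy stand-in for a real binary classifier, and all names (chain_order, train_chain, predict_chain, the two image features) are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def chain_order(parents):
    """Return labels ordered so each label comes after its parents in the DAG."""
    return list(TopologicalSorter(parents).static_order())


def fit_centroid(X, y):
    """Toy binary classifier: store the mean feature vector of each class.
    Assumes both classes (0 and 1) occur in y."""
    centroids = {}
    for cls in (0, 1):
        rows = [x for x, t in zip(X, y) if t == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Predict the class whose centroid is nearest in squared distance."""
    dist = {c: sum((a - b) ** 2 for a, b in zip(x, m))
            for c, m in centroids.items()}
    return min(dist, key=dist.get)


def train_chain(X, Y, parents):
    """Fit one binary classifier per label, following the graph order.
    Each classifier sees the image features plus the true parent labels."""
    models = {}
    for label in chain_order(parents):
        pa = sorted(parents[label])
        X_aug = [x + [row[p] for p in pa] for x, row in zip(X, Y)]
        models[label] = fit_centroid(X_aug, [row[label] for row in Y])
    return models


def predict_chain(models, parents, x):
    """Predict labels in graph order, feeding predicted parent labels forward."""
    pred = {}
    for label in chain_order(parents):
        pa = sorted(parents[label])
        pred[label] = predict_centroid(models[label], x + [pred[p] for p in pa])
    return pred


# Hypothetical correlation graph: label -> set of parent labels.
parents = {"nucleus": set(), "nucleoli": {"nucleus"}, "cytoplasm": set()}

# Made-up features per image: [nuclear signal, punctate signal].
X = [[0.9, 0.1], [0.8, 0.9], [0.1, 0.2], [0.2, 0.8]]
Y = [
    {"nucleus": 1, "nucleoli": 0, "cytoplasm": 0},
    {"nucleus": 1, "nucleoli": 1, "cytoplasm": 0},
    {"nucleus": 0, "nucleoli": 0, "cytoplasm": 1},
    {"nucleus": 0, "nucleoli": 0, "cytoplasm": 1},
]

order = chain_order(parents)
models = train_chain(X, Y, parents)
multi_location = predict_chain(models, parents, [0.85, 0.85])
single_location = predict_chain(models, parents, [0.15, 0.85])
```

In this sketch the classifier for a child label (nucleoli) is trained only after its parent (nucleus), and at prediction time it receives the parent's predicted value as an extra feature, which is one common way to encode label dependence in a chain of binary classifiers.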

Availability and implementation: The datasets and code are available at www.csbio.sjtu.edu.cn/bioinf/CorrASemiB

Contact: hbshen@sjtu.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Computational Biology / methods*
  • Fluorescent Antibody Technique
  • Humans
  • Immunohistochemistry
  • Organelles*
  • Protein Transport*
  • Supervised Machine Learning*