BCov: a method for predicting β-sheet topology using sparse inverse covariance estimation and integer programming

Bioinformatics. 2013 Dec 15;29(24):3151-7. doi: 10.1093/bioinformatics/btt555. Epub 2013 Sep 23.

Abstract

Motivation: Predicting protein residue contacts, even at a coarse-grained level, can help solve the protein structure prediction problem. Unlike α-helices, which are stabilized locally, β-sheets result from pairwise hydrogen bonding between two or more disjoint regions of the protein backbone. Several supervised computational approaches have addressed the problem of predicting contacts among β-strands in proteins. Recently, the prediction of residue contacts based on correlated mutations has improved greatly, to the point of allowing the prediction of protein 3D structures.

Results: In this article, we describe BCov, the first unsupervised method to predict β-sheet topology starting from the protein sequence and its secondary structure. BCov uses sparse inverse covariance estimation to define β-strand partner scores. An optimization based on integer programming is then carried out to predict the β-sheet connectivity. When tested on the prediction of β-strand pairing, BCov achieves an average Matthews Correlation Coefficient (MCC) of 0.56 and an average F1 of 0.61 on a non-redundant dataset of 916 protein chains whose structures are known at atomic resolution. Our approach compares well with the state-of-the-art methods trained so far for this specific task.
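For readers unfamiliar with the two reported metrics, the following minimal sketch shows how MCC and F1 could be computed for the strand pairing of a single chain. It is an illustration only: the abstract does not state how pairs are counted, so the sketch assumes that every unordered pair of strands in the chain is a candidate and that the reported averages are taken over chains. The function name `pairing_metrics` and its arguments are hypothetical and are not part of the BCov distribution.

```python
import math


def pairing_metrics(predicted, observed, n_strands):
    """Return (MCC, F1) for predicted vs. observed strand pairs of one chain.

    Assumption (not from the source): all unordered strand pairs of the
    chain are candidates, so true negatives are pairs absent from both sets.
    """
    # Treat pairs as unordered: (1, 3) and (3, 1) denote the same pairing.
    pred = {frozenset(p) for p in predicted}
    obs = {frozenset(p) for p in observed}

    total = n_strands * (n_strands - 1) // 2  # number of candidate pairs
    tp = len(pred & obs)   # pairs predicted and observed
    fp = len(pred - obs)   # pairs predicted but not observed
    fn = len(obs - pred)   # pairs observed but not predicted
    tn = total - tp - fp - fn

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    return mcc, f1


# Toy chain with five strands arranged 0-1-2-3-4; the prediction recovers
# two of the four observed pairings and adds one spurious pair.
mcc, f1 = pairing_metrics(
    predicted=[(0, 1), (1, 2), (2, 4)],
    observed=[(0, 1), (1, 2), (2, 3), (3, 4)],
    n_strands=5,
)
print(f"MCC = {mcc:.3f}, F1 = {f1:.3f}")
```

When the MCC denominator is zero (for example, when no pair is predicted), the sketch returns 0.0 by convention; other conventions exist and the source does not say which one was used.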

Availability and implementation: The method is freely available under the General Public License at http://biocomp.unibo.it/savojard/bcov/bcov-1.0.tar.gz. The new dataset BetaSheet1452 can be downloaded at http://biocomp.unibo.it/savojard/bcov/BetaSheet1452.dat.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • DNA Mutational Analysis / methods*
  • Models, Statistical*
  • Mutation
  • Protein Structure, Secondary*
  • Proteins / chemistry*
  • Proteins / genetics
  • Sequence Analysis, Protein / methods*

Substances

  • Proteins