Protein contact prediction using patterns of correlation

Proteins. 2004 Sep 1;56(4):679-84. doi: 10.1002/prot.20160.

Abstract

We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two "windows" of size 5 centered on the residues of interest. While the individual pair-wise correlations are a relatively weak predictor of contact, training the network on windows of correlation significantly improves the accuracy of prediction. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. Taking the best L/2 predictions for each protein, where L is the sequence length, gives an average predictive accuracy of 21.7%; taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths but to vary widely according to secondary structure. Predictive accuracy also improves when the correlations are calculated from multiple sequence alignments containing many sequences.
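
The two quantitative ideas in the abstract, the 25 windowed correlation inputs and the accuracy of the best L/2 or L/10 predictions, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a precomputed L x L matrix of correlated-mutation scores (`corr`) and a dictionary of predicted contact scores keyed by residue pair. The function names, the zero padding for windows that run past the chain termini, and the data layout are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[int, int]


def window_features(
    corr: Sequence[Sequence[float]],
    i: int,
    j: int,
    half_width: int = 2,
    pad: float = 0.0,
) -> List[float]:
    """Collect correlations between all residue pairs in two windows.

    With half_width=2 each window has size 5, so the result has
    5 * 5 = 25 values. Positions that fall outside the sequence are
    filled with `pad` (an assumption; the abstract does not say how
    chain ends are handled).
    """
    n = len(corr)
    feats: List[float] = []
    for a in range(i - half_width, i + half_width + 1):
        for b in range(j - half_width, j + half_width + 1):
            if 0 <= a < n and 0 <= b < n:
                feats.append(corr[a][b])
            else:
                feats.append(pad)
    return feats


def top_fraction_accuracy(
    scores: Dict[Pair, float],
    contacts: Set[Pair],
    length: int,
    divisor: int = 2,
) -> float:
    """Fraction of the best L/divisor predictions that are true contacts."""
    k = max(1, length // divisor)
    # Rank candidate pairs by predicted score, highest first.
    ranked = sorted(scores, key=lambda pair: scores[pair], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in contacts)
    return hits / len(ranked)
```

In this sketch, `window_features(corr, i, j)` would supply the 25 correlation inputs for one candidate pair, and `top_fraction_accuracy(scores, contacts, L, divisor=10)` would correspond to the L/10 figure quoted above. Any further network inputs and the contact definition itself are not covered here.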

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids
  • Artificial Intelligence
  • Caspases / chemistry*
  • Cysteine Endopeptidases / chemistry*
  • Neural Networks, Computer
  • Predictive Value of Tests
  • Protein Interaction Mapping / methods*
  • Protein Structure, Secondary
  • Sequence Alignment / methods

Substances

  • Amino Acids
  • CASP5 protein, human
  • Caspases
  • Cysteine Endopeptidases