Statistical potential-based amino acid similarity matrices for aligning distantly related protein sequences

Yen Hock Tan; He Huang; Daisuke Kihara

doi:10.1002/prot.21020

Proteins. 2006 Aug 15;64(3):587-600. doi: 10.1002/prot.21020.

Authors

Yen Hock Tan¹, He Huang, Daisuke Kihara

Affiliation

¹ Department of Computer Sciences, College of Science, Purdue University, West Lafayette, Indiana 47907, USA. dkihara@purdue.edu

PMID: 16799934

Abstract

Aligning distantly related protein sequences is a long-standing problem in bioinformatics and a key to successful protein structure prediction. Its importance has grown recently in the context of structural genomics projects, because more and more experimentally solved structures are available as templates for protein structure modeling. To this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve the profile itself, thereby yielding more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the propensity of a residue to contact the other amino acids is expected to be one of the most conserved features of each position in a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three levels: family, superfamily, and fold. Compared with BLOSUM45 and other existing matrices, the contact potential-based matrices perform comparably in family-level alignments but clearly outperform them in fold-level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices with each other revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin in the amino acid similarity matrix space.
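
The abstract does not spell out how a contact potential is turned into a similarity matrix. As a minimal sketch of one plausible construction, the Python below scores two residue types as similar when their rows of the contact potential (their contact propensities toward every residue type) are strongly correlated. The Pearson correlation, the integer scaling factor, the four-residue alphabet, and every number in `POTENTIAL` are assumptions made for illustration; none of them are taken from the paper.

```python
from math import sqrt

# Hypothetical toy contact potential over four residue types.
# These numbers are illustrative only and are NOT values from the paper.
# Each row lists the contact energies of one residue type with L, I, K, D
# (in that order); negative means a favorable contact. The matrix is symmetric.
POTENTIAL = {
    "L": [-1.2, -1.1, 0.3, 0.5],
    "I": [-1.1, -1.0, 0.2, 0.4],
    "K": [0.3, 0.2, 0.6, -0.8],
    "D": [0.5, 0.4, -0.8, 0.7],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def similarity_matrix(potential, scale=10):
    """Score each residue pair by the correlation of their potential rows.

    The correlation is multiplied by `scale` (an assumed factor) and rounded,
    giving integer scores in the style of common substitution matrices.
    """
    residues = list(potential)
    return {
        (a, b): round(scale * pearson(potential[a], potential[b]))
        for a in residues
        for b in residues
    }


SIM = similarity_matrix(POTENTIAL)
```

In this toy input, leucine and isoleucine have nearly identical contact rows, so `SIM[("L", "I")]` comes out high, while leucine and lysine have anticorrelated rows and receive a negative score. A real matrix would cover all 20 amino acids and would start from an actual knowledge-based contact potential.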

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Amino Acids / chemistry*
Amino Acids / genetics
Computational Biology / methods
Databases, Protein / statistics & numerical data
Protein Folding
Proteins / chemistry*
Proteins / genetics
Reproducibility of Results
Sequence Alignment / methods*
Sequence Alignment / statistics & numerical data
Software

Substances

Amino Acids
Proteins

Grants and funding

R01 GM-075004/GM/NIGMS NIH HHS/United States