A structure-based method for protein sequence alignment

Bioinformatics. 2005 Apr 15;21(8):1451-6. doi: 10.1093/bioinformatics/bti233. Epub 2004 Dec 21.

Abstract

Motivation: With the continuing rapid growth of protein sequence data, protein sequence comparison methods have become the most widely used tools of bioinformatics. Among these methods are those that use position-specific scoring matrices (PSSMs) to describe protein families. PSSMs can capture information about conserved patterns within families, which can be used to increase the sensitivity of searches for related sequences. Certain types of structural information, however, are not generally captured by PSSM search methods. Here we introduce a program, Structure-based ALignment TOol (SALTO), that aligns protein query sequences to PSSMs using rules for placing and scoring gaps that are consistent with the conserved regions of domain alignments from NCBI's Conserved Domain Database.
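The idea of restricting gaps to the regions between conserved blocks can be illustrated with a small dynamic program. This is a minimal sketch under stated assumptions, not SALTO's algorithm: the abstract does not give SALTO's gap rules or scoring, so the sketch assumes each conserved region is an ungapped block of PSSM columns, that all blocks must be placed in order, and that the loops between blocks may have any length at no cost. The names `score_block`, `align_blocks`, the `unknown` penalty, and the toy PSSM are hypothetical.

```python
from typing import Dict, List, Tuple

Column = Dict[str, int]   # residue -> score for one PSSM column
Block = List[Column]      # one conserved region: consecutive, ungapped columns
NEG_INF = float("-inf")


def score_block(block: Block, seq: str, start: int, unknown: int = -4) -> int:
    """Score a block placed without gaps at seq[start : start + len(block)]."""
    return sum(col.get(seq[start + k], unknown) for k, col in enumerate(block))


def align_blocks(blocks: List[Block], seq: str) -> Tuple[float, List[int]]:
    """Place every (non-empty) block in order; gaps occur only between blocks.

    Returns the best total score and the start position of each block.
    Assumption: loop lengths between blocks are unpenalized.
    """
    n = len(seq)
    # dp[b][j]: best score for placing the first b blocks inside seq[:j]
    dp = [[NEG_INF] * (n + 1) for _ in range(len(blocks) + 1)]
    dp[0] = [0] * (n + 1)
    # placed[b][j]: True if the optimum for dp[b][j] ends block b exactly at j
    placed = [[False] * (n + 1) for _ in range(len(blocks) + 1)]

    for b, block in enumerate(blocks, start=1):
        width = len(block)
        for j in range(1, n + 1):
            best = dp[b][j - 1]  # leave residue j-1 in a loop after block b
            if j >= width and dp[b - 1][j - width] > NEG_INF:
                cand = dp[b - 1][j - width] + score_block(block, seq, j - width)
                if cand > best:
                    best = cand
                    placed[b][j] = True
            dp[b][j] = best

    if dp[len(blocks)][n] == NEG_INF:
        raise ValueError("query is too short to hold all blocks")

    # Trace back from the end of the query to recover block start positions.
    starts: List[int] = []
    j = n
    for b in range(len(blocks), 0, -1):
        while not placed[b][j]:
            j -= 1
        j -= len(blocks[b - 1])
        starts.append(j)
    starts.reverse()
    return dp[len(blocks)][n], starts


# Toy two-block PSSM and query (illustrative values only).
toy_blocks = [[{"A": 5}, {"C": 5}], [{"G": 5}, {"H": 5}]]
toy_score, toy_starts = align_blocks(toy_blocks, "TTACTTTGHT")
print(toy_score, toy_starts)
```

Because gaps can never fall inside a block, every reported alignment keeps the conserved columns contiguous, which is the property the abstract attributes to a family-specific gapping model.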

Results: In most cases, the alignment scores obtained using the local alignment version follow an extreme value distribution. SALTO's performance in finding related sequences and producing accurate alignments is similar to or better than that of IMPALA; one advantage of SALTO is that it imposes an explicit gapping model on each protein family.
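When local alignment scores follow an extreme value (Gumbel) distribution, a score can be converted to a P-value as P(S >= x) = 1 - exp(-exp(-(x - mu) / beta)). The sketch below shows one way to do this, assuming the location mu and scale beta are estimated by the method of moments from scores against unrelated sequences. The abstract does not say how SALTO fits or uses the distribution, so the fitting method, the function names, and the background scores are all illustrative assumptions.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel(scores):
    """Method-of-moments estimate of Gumbel location (mu) and scale (beta)."""
    beta = stdev(scores) * math.sqrt(6) / math.pi
    mu = mean(scores) - EULER_GAMMA * beta
    return mu, beta


def p_value(score, mu, beta):
    """P(S >= score) under a Gumbel distribution with parameters mu, beta."""
    z = (score - mu) / beta
    # -expm1(-t) equals 1 - exp(-t) and stays accurate when t is tiny
    return -math.expm1(-math.exp(-z))


# Hypothetical scores from alignments to unrelated sequences.
background = [21, 24, 19, 30, 26, 23, 22, 28, 25, 20, 27, 33]
mu, beta = fit_gumbel(background)
print(p_value(45, mu, beta))
```

A maximum-likelihood fit would normally be preferred for real score distributions; the moment estimate is used here only to keep the example short.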

Availability: A stand-alone version of the program that can generate global or local alignments is available by ftp distribution (ftp://ftp.ncbi.nih.gov/pub/SALTO/), and has been incorporated into the Cn3D structure/alignment viewer.

Contact: bryant@ncbi.nlm.nih.gov.

Publication types

  • Evaluation Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Databases, Protein
  • Models, Chemical*
  • Models, Molecular*
  • Molecular Sequence Data
  • Protein Conformation
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Sequence Homology, Amino Acid
  • Software
  • Structure-Activity Relationship

Substances

  • Proteins