ConFind: a robust tool for conserved sequence identification

Bioinformatics. 2005 Dec 15;21(24):4420-2. doi: 10.1093/bioinformatics/bti719. Epub 2005 Oct 20.

Abstract

Summary: ConFind (conserved region finder) identifies regions of conservation in multiple sequence alignments that can serve as diagnostic targets. Designed to work with large numbers of closely related, highly variable sequences, ConFind robustly handles alignments that contain partial sequences and ambiguous characters. A conserved region is defined by four parameters: the minimum region length, the maximum informational entropy (variability) per position, the number of exceptions allowed to the maximum entropy criterion, and the minimum number of sequences that must contain a non-ambiguous character at a position for that position to be considered for inclusion in a conserved region. For an alignment of 95 influenza A hemagglutinin sequences with random deletions, the average error in the calculated entropy was 98% lower with ConFind than with the 'Find Conserved Regions' option in BioEdit.
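
The four criteria above can be illustrated with a minimal Python sketch. It is not ConFind's code: the function names, the base-2 Shannon entropy, the restriction to the characters A, C, G and T, and the greedy left-to-right scan are all assumptions made for illustration. Gaps and ambiguity codes are left out of the entropy calculation, and a position with too few non-ambiguous characters is assumed to end a region.

```python
import math
from collections import Counter

UNAMBIGUOUS = set("ACGT")  # assumption: nucleotide alignment, IUPAC codes and gaps ignored


def column_entropy(column, min_seqs):
    """Shannon entropy (bits) of one column over unambiguous characters only.

    Returns None when fewer than min_seqs sequences have an unambiguous
    character at this position, so the column cannot be evaluated.
    """
    counts = Counter(c for c in column.upper() if c in UNAMBIGUOUS)
    n = sum(counts.values())
    if n == 0 or n < min_seqs:
        return None
    return sum(-(k / n) * math.log2(k / n) for k in counts.values())


def find_conserved_regions(alignment, min_length, max_entropy,
                           max_exceptions, min_seqs):
    """Greedy scan (hypothetical) returning half-open (start, end) column ranges.

    alignment is a list of equal-length aligned sequences. A region starts
    and ends on a column whose entropy is <= max_entropy, may contain up to
    max_exceptions columns above that threshold, and never spans a column
    that lacks min_seqs unambiguous characters.
    """
    ncols = len(alignment[0])
    entropies = [
        column_entropy("".join(seq[i] for seq in alignment), min_seqs)
        for i in range(ncols)
    ]
    regions = []
    start = 0
    while start < ncols:
        h = entropies[start]
        if h is None or h > max_entropy:
            start += 1
            continue
        end = start        # last column known to satisfy the entropy limit
        exceptions = 0
        i = start + 1
        while i < ncols and entropies[i] is not None:
            if entropies[i] > max_entropy:
                if exceptions == max_exceptions:
                    break
                exceptions += 1
            else:
                end = i
            i += 1
        if end - start + 1 >= min_length:
            regions.append((start, end + 1))
            start = end + 1
        else:
            start += 1
    return regions


# Toy alignment: column 4 is variable, columns 3 and 8 each hold one
# gap or ambiguous character.
toy = [
    "ACGTACGTAC",
    "ACGTTCGTAC",
    "ACG-ACGTNC",
    "ACGTGCGTAC",
]
strict = find_conserved_regions(toy, min_length=4, max_entropy=0.5,
                                max_exceptions=0, min_seqs=3)
relaxed = find_conserved_regions(toy, min_length=4, max_entropy=0.5,
                                 max_exceptions=1, min_seqs=3)
```

In this toy case, allowing one exception merges the two regions found under the strict setting into a single region, and raising `min_seqs` to 4 excludes the columns that contain a gap or an `N`.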

Requirements: ConFind requires Python 2.3; Python 2.4, or an upgrade of the optparse module to Optik 1.5, is recommended. The program is known to run under Linux and DOS.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Base Sequence
  • Computational Biology
  • Conserved Sequence*
  • DNA, Viral / genetics
  • Hemagglutinin Glycoproteins, Influenza Virus / genetics
  • Influenza A virus / genetics
  • Neuraminidase / genetics
  • Sequence Alignment / statistics & numerical data*
  • Sequence Homology, Nucleic Acid
  • Software*

Substances

  • DNA, Viral
  • Hemagglutinin Glycoproteins, Influenza Virus
  • Neuraminidase