Generating benchmarks for multiple sequence alignments and phylogenetic reconstructions

Proc Int Conf Intell Syst Mol Biol. 1997:5:303-6.

Abstract

We present a new probabilistic model of the evolution of RNA-, DNA-, or protein-like sequences, and a tool, rose, that implements this model. By insertion, deletion, and substitution of characters, a family of sequences is created from a common ancestor. During this artificial evolutionary process, the "true" history is logged and the "correct" multiple sequence alignment is created simultaneously. We also allow for varying rates of mutation within the sequences, making it possible to establish so-called sequence motifs. The results are suitable for evaluating methods for computing multiple sequence alignments and for predicting phylogenetic relationships.
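The process summarized in the abstract can be illustrated with a minimal Python sketch. It is not the rose implementation and does not reproduce its probabilistic model. The function names (`evolve`, `simulate`, `true_alignment`), the fixed per-site event probabilities, the four-letter alphabet, the balanced binary tree, and the per-column rate table used to freeze a motif are all assumptions made for illustration. Each residue carries a column identifier, so the "correct" alignment falls out of the simulation instead of being inferred afterwards.

```python
import random

ALPHABET = "ACGT"  # assumed DNA-like alphabet


def evolve(seq, columns, rates, rng, p_sub=0.10, p_ins=0.03, p_del=0.03):
    """Mutate one lineage. seq is a list of (column_id, character) pairs."""
    child = []
    for col, ch in seq:
        scale = rates.get(col, 1.0)  # per-site rate multiplier; 0.0 freezes a site
        r = rng.random()
        if r < scale * p_del:
            continue  # deletion: the residue is lost, its column remains
        if r < scale * (p_del + p_sub):
            ch = rng.choice([a for a in ALPHABET if a != ch])  # substitution
        child.append((col, ch))
        if rng.random() < scale * p_ins:
            new_col = len(columns)  # ids are never removed, so this is unique
            columns.insert(columns.index(col) + 1, new_col)
            child.append((new_col, rng.choice(ALPHABET)))  # insertion
    return child


def simulate(root_len=40, depth=3, motif=range(10, 16), seed=1):
    """Evolve a root sequence down a balanced binary tree of the given depth."""
    rng = random.Random(seed)
    columns = list(range(root_len))  # alignment columns, in order
    rates = {c: 0.0 for c in motif}  # conserved motif sites never mutate
    root = [(c, rng.choice(ALPHABET)) for c in columns]
    leaves, history = {}, []

    def grow(seq, name, level):
        if level == depth:
            leaves[name] = seq
            return
        for side in "LR":
            child = evolve(seq, columns, rates, rng)
            history.append((name, name + side))  # log the true tree edge
            grow(child, name + side, level + 1)

    grow(root, "root", 0)
    return root, leaves, history, columns


def true_alignment(leaves, columns):
    """Place every leaf residue in its column; drop columns empty in all leaves."""
    rows = {}
    for name, seq in leaves.items():
        by_col = dict(seq)
        rows[name] = "".join(by_col.get(c, "-") for c in columns)
    keep = [i for i in range(len(columns))
            if any(row[i] != "-" for row in rows.values())]
    return {name: "".join(row[i] for i in keep) for name, row in rows.items()}


if __name__ == "__main__":
    root, leaves, history, columns = simulate()
    for name, row in true_alignment(leaves, columns).items():
        print(f"{name:8s} {row}")
```

The leaf sequences with gaps removed would be the input to an alignment or tree-building method under evaluation, and the output of `true_alignment` together with `history` would serve as the reference. In this sketch insertions occur only after a surviving residue, which keeps the column bookkeeping short at the cost of never inserting at the very start of a sequence.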

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Computer Simulation
  • Evaluation Studies as Topic
  • Evolution, Molecular
  • Models, Genetic*
  • Models, Statistical
  • Molecular Sequence Data
  • Mutation
  • Phylogeny*
  • Proteins / genetics
  • Sequence Alignment / methods*

Substances

  • Proteins