Detection of eukaryotic promoters using Markov transition matrices

Comput Chem. 1997;21(4):223-7. doi: 10.1016/s0097-8485(96)00040-x.

Abstract

Eukaryotic promoters are among the most important functional domains yet to be characterized in a satisfactory manner in genomic sequences. Most current detection methods rely on the recognition of individual transcription elements using position-weight matrices (PWMs) or consensus sequences. Here, we study a simple promoter detection algorithm based on Markov transition matrices built from sequences upstream of proven transcription initiation sites. Performance was evaluated on the training set and on a test set of promoter-containing sequences. The results on the training set are surprisingly good, given that the algorithm does not incorporate any specific knowledge about promoters. Yet the program exhibits the pathological behaviour typical of all training-set-based methods: a significant decline in performance when confronted with previously unseen sequences. Thus, the Markov algorithm, like the others presently available, does not truly capture the essence of eukaryotic promoters. A detection program based on a Markov model is likely to be blind to categories of promoters without close representatives in the training set.
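The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general idea of scoring a sequence window with Markov transition matrices. Everything specific in it is an assumption made for illustration: the first-order chain, the add-one pseudocount, the log-odds comparison against a background matrix, the function names (`transition_matrix`, `log_odds`, `scan`), and the toy sequences, which are invented and are not data from the paper.

```python
import math

ALPHABET = "ACGT"


def transition_matrix(sequences, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | current).

    Assumption: a first-order chain with additive smoothing; the paper's
    actual order and estimation procedure are not stated in the abstract.
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        seq = seq.upper()
        for cur, nxt in zip(seq, seq[1:]):
            # Skip ambiguous symbols such as N.
            if cur in counts and nxt in counts[cur]:
                counts[cur][nxt] += 1
    matrix = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        matrix[a] = {b: counts[a][b] / total for b in ALPHABET}
    return matrix


def log_odds(window, promoter, background):
    """Sum of log(P_promoter / P_background) over adjacent base pairs."""
    window = window.upper()
    score = 0.0
    for cur, nxt in zip(window, window[1:]):
        if cur in promoter and nxt in promoter[cur]:
            score += math.log(promoter[cur][nxt] / background[cur][nxt])
    return score


def scan(sequence, promoter, background, width, threshold=0.0):
    """Return (start, score) for every window scoring above the threshold."""
    hits = []
    for start in range(len(sequence) - width + 1):
        score = log_odds(sequence[start:start + width], promoter, background)
        if score > threshold:
            hits.append((start, score))
    return hits


# Toy, invented training data (hypothetical; for illustration only).
promoter_train = ["TATAAATATATAAA", "GGTATAAAAGGC"]
background_train = ["GCGCGGCCGCGC", "ACGTACGTACGT"]

promoter_matrix = transition_matrix(promoter_train)
background_matrix = transition_matrix(background_train)

for start, score in scan("GCGCGCTATAAAGCGC", promoter_matrix,
                         background_matrix, width=6):
    print(start, round(score, 2))
```

A sketch like this also makes the limitation described in the abstract concrete: the score depends entirely on the transition frequencies seen in training, so a promoter whose composition differs from every training example would tend to score like background.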

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Computer Simulation
  • Consensus Sequence
  • Eukaryotic Cells
  • Genetic Techniques
  • Markov Chains*
  • Models, Genetic
  • Promoter Regions, Genetic*
  • TATA Box
  • Transcription, Genetic