A three-state model for DNA protein-coding regions

IEEE Trans Biomed Eng. 2006 Nov;53(11):2148-55. doi: 10.1109/TBME.2006.879477.

Abstract

It is known that the protein-coding regions of DNA are usually characterized by a three-base periodicity. In this paper, we exploit this property by studying a DNA model based on three deterministic states, where each state implements a finite-context model. The experimental results confirm the appropriateness of the proposed approach, showing compression gains relative to the single finite-context model counterpart. Additionally, and potentially more interesting than the compression gain on its own, we observe that the entropy associated with each of the three base positions of a codon differs, and that this variation is not the same among the organisms analyzed.
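The idea of cycling through three states, each with its own finite-context model, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `bits_per_state`, the context order, the Laplace-style smoothing parameter `alpha`, and the assumption that the sequence starts at the first base of a codon are all illustrative choices made here.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def bits_per_state(seq, order=2, n_states=3, alpha=1.0):
    """Average code length (bits per base) for each state.

    Hypothetical sketch: state i % n_states handles base i, and each
    state keeps its own adaptive finite-context model (counts of the
    next base given the previous `order` bases). Assumes `seq` uses
    only A, C, G, T and is aligned so that index 0 is codon position 0.
    """
    # One count table per state: context string -> counts for A, C, G, T.
    counts = [defaultdict(lambda: [0] * len(ALPHABET)) for _ in range(n_states)]
    total_bits = [0.0] * n_states
    n_symbols = [0] * n_states

    for i in range(order, len(seq)):
        state = i % n_states              # deterministic state switching
        context = seq[i - order:i]        # previous `order` bases
        symbol = ALPHABET.index(seq[i])
        table = counts[state][context]

        # Smoothed probability estimate (assumed estimator, not the paper's).
        p = (table[symbol] + alpha) / (sum(table) + alpha * len(ALPHABET))
        total_bits[state] += -math.log2(p)
        n_symbols[state] += 1

        table[symbol] += 1                # update the model after coding

    return [b / n if n else 0.0 for b, n in zip(total_bits, n_symbols)]


if __name__ == "__main__":
    toy = "ATG" * 200                     # artificial, perfectly periodic input
    print("three states:", bits_per_state(toy, order=0, n_states=3))
    print("single state:", bits_per_state(toy, order=0, n_states=1))
```

Calling the function with `n_states=1` gives the single finite-context model counterpart, so the two results can be compared directly. With `n_states=3`, the three returned values are per-codon-position estimates, which is the kind of quantity the abstract reports as differing between positions and between organisms.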

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Base Sequence
  • Computer Simulation
  • DNA / genetics*
  • Models, Genetic*
  • Molecular Sequence Data
  • Open Reading Frames / genetics*
  • Proteins / genetics*
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*

Substances

  • Proteins
  • DNA