The frequency of oligonucleotides in mammalian genic regions

Comput Appl Biosci. 1989 Feb;5(1):33-40. doi: 10.1093/bioinformatics/5.1.33.

Abstract

The large body of nucleic acid sequence data now available offers a unique opportunity to characterize individual oligonucleotides that may be specific to functional domains of sequences. We have prepared algorithms for studying the frequency distribution of all oligonucleotides of length 2-6 in DNA sequences. We applied them to 634 mammalian DNA sequences spanning 1.782 Mb and obtained the distribution of the ratio between the observed frequency of each oligonucleotide and its expected frequency based on independent nucleotide probabilities. We then studied the distribution of oligonucleotides (or k-tuples) of each length in a subset of 129 complete mammalian genes spanning 0.607 Mb. Eight distinct genomic regions were considered: 5'-non-transcribed, first exon, first intron, intermediate exons, intermediate introns, last intron, last exon and 3'-non-transcribed. We observed that some oligonucleotides show a statistical behaviour and a regional distribution similar to those of known signal sequences. Moreover, the frequency distribution of oligonucleotides of length 5 and 6 tends to become bimodal, indicating the existence of a population of very frequent oligonucleotides.
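The observed-to-expected ratio mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not describe the authors' actual algorithms, so everything here is an assumption made for illustration: the function name `kmer_obs_exp_ratios` is hypothetical, overlapping windows are counted, and the expected frequency of a k-tuple is taken as the product of the single-base frequencies measured in the same sequence.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def kmer_obs_exp_ratios(seq, k):
    """Return {k-tuple: observed/expected frequency ratio} for one sequence.

    Assumptions (not taken from the paper): overlapping windows are counted,
    windows containing non-ACGT symbols are skipped, and the expected
    frequency is the product of independent single-base frequencies.
    """
    seq = seq.upper()
    base_counts = Counter(c for c in seq if c in ALPHABET)
    n_bases = sum(base_counts.values())
    if n_bases == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    base_freq = {b: base_counts[b] / n_bases for b in ALPHABET}

    # Count every overlapping window of length k made only of A/C/G/T.
    kmer_counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in ALPHABET for c in word):
            kmer_counts[word] += 1
    n_windows = sum(kmer_counts.values())

    ratios = {}
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        expected = 1.0
        for c in word:
            expected *= base_freq[c]
        observed = kmer_counts[word] / n_windows if n_windows else 0.0
        # The ratio is undefined when a required base never occurs.
        ratios[word] = observed / expected if expected > 0 else None
    return ratios

if __name__ == "__main__":
    ratios = kmer_obs_exp_ratios("ACGTACGTACGT", 2)
    for word in ("AC", "TA", "AA"):
        print(word, ratios[word])
```

A ratio above 1 marks a k-tuple that occurs more often than independent base composition would predict, and a ratio below 1 marks one that is under-represented. Collecting these ratios over all 4^k k-tuples gives a distribution of the kind the abstract describes; computing it separately per genomic region would require region annotations that this sketch does not model.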

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Base Sequence
  • DNA*
  • Mammals / genetics*
  • Oligonucleotides / analysis*
  • Programming Languages

Substances

  • Oligonucleotides
  • DNA