Inter-dinucleotide distances in the human genome: an analysis of the whole-genome and protein-coding distributions

J Integr Bioinform. 2011 Sep 15;8(3):172. doi: 10.2390/biecoll-jib-2011-172.

Abstract

We study the inter-dinucleotide distance distributions in the human genome, both in the whole-genome and protein-coding regions. The inter-dinucleotide distance is defined as the distance to the next occurrence of the same dinucleotide. We consider the 16 sequences of inter-dinucleotide distances and two reading frames. Our results show a period-3 oscillation in the protein-coding inter-dinucleotide distance distributions that is absent from the whole-genome distributions. We also compare the distance distribution of each dinucleotide to a reference distribution, that of a random sequence generated with the same dinucleotide abundances, revealing the CG dinucleotide as the one with the highest cumulative relative error for the first 60 distances. Moreover, the distance distribution of each dinucleotide is compared to the distance distribution of all other dinucleotides using the Kullback-Leibler divergence. We find that the distance distribution of a dinucleotide and that of its reversed complement are very similar, hence, the divergence between them is very small. This is an interesting finding that may give evidence of a stronger parity rule than Chargaff's second parity rule.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Genetic Variation / physiology*
  • Genome, Human / physiology*
  • Humans
  • Reading Frames / physiology*
  • Sequence Analysis, DNA / methods*