Expansion of protein domain repeats

PLoS Comput Biol. 2006 Aug 25;2(8):e114. doi: 10.1371/journal.pcbi.0020114. Epub 2006 Jul 14.

Abstract

Many proteins, especially in eukaryotes, contain tandem repeats of several domains from the same family. These repeats have a variety of binding properties and are involved in protein-protein interactions as well as binding to other ligands such as DNA and RNA. The rapid expansion of protein domain repeats is assumed to have evolved through internal tandem duplications. However, the exact mechanisms behind these tandem duplications are not well-understood. Here, we have studied the evolution, function, protein structure, gene structure, and phylogenetic distribution of domain repeats. For this purpose we have assigned Pfam-A domain families to 24 proteomes with more sensitive domain assignments in the repeat regions. These assignments confirmed previous findings that eukaryotes, and in particular vertebrates, contain a much higher fraction of proteins with repeats compared with prokaryotes. The internal sequence similarity in each protein revealed that the domain repeats are often expanded through duplications of several domains at a time, while the duplication of one domain is less common. Many of the repeats appear to have been duplicated in the middle of the repeat region. This is in strong contrast to the evolution of other proteins that mainly works through additions of single domains at either terminus. Further, we found that some domain families show distinct duplication patterns, e.g., nebulin domains have mainly been expanded with a unit of seven domains at a time, while duplications of other domain families involve varying numbers of domains. Finally, no common mechanism for the expansion of all repeats could be detected. We found that the duplication patterns show no dependence on the size of the domains. Further, repeat expansion in some families can possibly be explained by shuffling of exons. However, exon shuffling could not have created all repeats.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Conserved Sequence
  • Evolution, Molecular*
  • Humans
  • Molecular Sequence Data
  • Protein Structure, Tertiary / genetics*
  • Proteins / genetics*
  • Repetitive Sequences, Nucleic Acid / genetics*
  • Sequence Analysis / methods*

Substances

  • Proteins