Polymorphism, shared functions and convergent evolution of genes with sequences coding for polyalanine domains

Hum Mol Genet. 2003 Nov 15;12(22):2967-79. doi: 10.1093/hmg/ddg329. Epub 2003 Sep 30.

Abstract

Mutations causing expansions of polyalanine domains are responsible for nine hereditary diseases. Other GC-rich sequences coding for some polyalanine domains were found to be polymorphic in humans. These observations prompted us to identify all sequences in the human genome coding for polyalanine stretches longer than four alanines and to establish their degree of polymorphism. We identified 494 annotated human proteins containing 604 polyalanine domains. Thirty-two percent (31/98) of the tested sequences coding for more than seven alanines were polymorphic. The length of the polyalanine-coding sequence and its GCG or GCC repeat content are the major predictors of polymorphism. GCG codons are over-represented in human polyalanine-coding sequences. Our data suggest that GCG and GCC codons play a key role in the appearance and polymorphism of polyalanine-coding sequences. Grouping the polyalanine-containing proteins of Homo sapiens, Drosophila melanogaster and Caenorhabditis elegans by shared function shows that the majority are involved in transcriptional regulation. Phylogenetic analyses of the HOX, GATA and EVX protein families demonstrate that polyalanine domains arose independently in different members of these families, suggesting that convergent molecular evolution may have played a role. Finally, polyalanine domains in vertebrates are conserved among mammals and are rarer and shorter in Gallus gallus and Danio rerio. Together, our results show that the polymorphic nature of sequences coding for polyalanine domains makes them prime candidates for mutations in hereditary diseases, and suggest that these domains have appeared in many different protein families through convergent evolution.
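To make the screening criterion concrete, the following is a minimal illustrative sketch and not the authors' pipeline. It assumes an in-frame coding sequence (CDS) that starts at position 0. It reports runs of consecutive alanine codons (GCN: GCT, GCC, GCA, GCG) whose length exceeds four, which is the threshold stated in the abstract. For each run it also counts the GCG and GCC codons, the repeat content the abstract identifies as a predictor of polymorphism. The function name `polyala_runs` and its output fields are hypothetical.

```python
from typing import Dict, List

# The four alanine codons (GCN) of the standard genetic code.
ALA_CODONS = {"GCT", "GCC", "GCA", "GCG"}


def polyala_runs(cds: str, min_run: int = 5) -> List[Dict[str, int]]:
    """Hypothetical helper: find runs of >= min_run consecutive alanine codons.

    Assumes `cds` is in frame from position 0; trailing bases that do not
    form a full codon are ignored. min_run=5 encodes "longer than four
    alanines". Returns one dict per run with its first codon index, its
    length in codons, and its GCG and GCC counts.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]

    runs = []
    i = 0
    while i < len(codons):
        if codons[i] in ALA_CODONS:
            # Extend j to the end of this uninterrupted alanine-codon run.
            j = i
            while j < len(codons) and codons[j] in ALA_CODONS:
                j += 1
            if j - i >= min_run:
                run = codons[i:j]
                runs.append({
                    "start_codon": i,
                    "length": j - i,
                    "GCG": run.count("GCG"),
                    "GCC": run.count("GCC"),
                })
            i = j  # skip past the run, reported or not
        else:
            i += 1
    return runs


if __name__ == "__main__":
    # ATG, then 3x GCG, 2x GCC, 1x GCA (six alanines), then stop codon TAA.
    print(polyala_runs("ATGGCGGCGGCGGCCGCCGCATAA"))
    # -> [{'start_codon': 1, 'length': 6, 'GCG': 3, 'GCC': 2}]
```

Scanning codons rather than a translated protein keeps the codon identities available for counting. A genome-wide screen would additionally need annotated CDS coordinates and a full translation step, which this sketch deliberately omits.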

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Caenorhabditis elegans / genetics
  • Chickens / genetics
  • Codon
  • Conserved Sequence
  • Drosophila melanogaster / genetics
  • Evolution, Molecular*
  • Genes*
  • Genome, Human
  • Homeodomain Proteins
  • Humans
  • Peptides / chemistry*
  • Phylogeny
  • Polymorphism, Genetic*
  • Protein Structure, Tertiary
  • Repetitive Sequences, Amino Acid
  • Vertebrates / genetics
  • Zebrafish / genetics

Substances

  • Codon
  • Homeodomain Proteins
  • Peptides
  • polyalanine