Selection for Cheaper Amino Acids Drives Nucleotide Usage at the Start of Translation in Eukaryotic Genes

Genomics Proteomics Bioinformatics. 2021 Dec;19(6):949-957. doi: 10.1016/j.gpb.2021.03.002. Epub 2021 Mar 17.

Abstract

Coding regions are shaped by complex interactions among multiple selective forces, which are manifested as biases in nucleotide composition. Previous studies have revealed a decreasing GC gradient from the 5'-end to the 3'-end of coding regions in various organisms. We confirmed that this gradient is universal in eukaryotic genes, but the decrease starts only from around the 25th codon. This trend is found mostly at nonsynonymous (ns) sites, at which the GC gradient is universal across the eukaryotic genome. Increased GC contents at ns sites result in cheaper amino acids, indicating a universal selection for energy efficiency toward the N-termini of encoded proteins. Within a genome, the decreasing GC gradient intensifies from lowly to highly expressed genes (which yield more protein products), further supporting this hypothesis. This reveals a conserved selective constraint for cheaper amino acids at the translation start that drives the increased GC contents at ns sites. Elevated GC contents can facilitate transcription, but they also result in a more stable local secondary structure around the start codon, which subsequently impedes translation initiation. Conversely, the GC gradients at four-fold and two-fold synonymous sites vary across species: they can either decrease or increase, suggesting that different constraints act on the GC contents of different codon sites in different species. This study reveals that the overall GC contents at the translation start are consequences of complex interactions among several major biological processes that shape the nucleotide sequences, especially efficient energy usage.
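
As an illustration of how a GC gradient along coding regions might be profiled, the following is a minimal Python sketch and not the authors' pipeline. It assumes in-frame coding sequences given as plain strings that begin at the start codon. The function name `gc_profile` and the `max_codons` parameter are hypothetical. The second codon position is used here as a simple stand-in for nonsynonymous sites, since every change at that position alters the encoded amino acid or introduces a stop codon in the standard genetic code. Four-fold synonymous sites are taken as the third position of the eight four-fold degenerate codon families.

```python
# First two bases of the eight four-fold degenerate codon families
# in the standard genetic code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def gc_profile(cds_sequences, max_codons=50):
    """Return (pos2, fourfold): GC fraction per codon index across all genes.

    Hypothetical helper for illustration. Codon index 0 is the start codon.
    An entry is None when no site was counted at that codon index.
    """
    pos2 = [[0, 0] for _ in range(max_codons)]  # [GC count, total] per index
    four = [[0, 0] for _ in range(max_codons)]

    for seq in cds_sequences:
        seq = seq.upper()
        n_codons = min(len(seq) // 3, max_codons)
        for i in range(n_codons):
            codon = seq[3 * i: 3 * i + 3]
            # Skip codons containing ambiguous bases such as N.
            if any(base not in "ACGT" for base in codon):
                continue
            # Second codon position: always nonsynonymous.
            pos2[i][1] += 1
            pos2[i][0] += codon[1] in "GC"
            # Third position of a four-fold degenerate codon family.
            if codon[:2] in FOURFOLD_PREFIXES:
                four[i][1] += 1
                four[i][0] += codon[2] in "GC"

    def fractions(counts):
        return [gc / total if total else None for gc, total in counts]

    return fractions(pos2), fractions(four)
```

Plotting the two returned lists against codon index for a set of annotated coding sequences would show whether the GC fraction at each site class rises or falls with distance from the start codon. A full analysis would also need two-fold synonymous sites and expression-level bins, which this sketch omits.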

Keywords: Energy efficiency; Macroevolution; Prioritization of selective forces; Transcription; Translation initiation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids* / genetics
  • Base Composition
  • Codon / genetics
  • Eukaryota / genetics
  • Nucleotides* / genetics

Substances

  • Amino Acids
  • Codon
  • Nucleotides