Human-specific insertions and deletions inferred from mammalian genome sequences

Genome Res. 2007 Jan;17(1):16-22. doi: 10.1101/gr.5429606. Epub 2006 Nov 9.

Abstract

It has been suggested that insertions and deletions (indels) have contributed more to the sequence divergence between the human and chimpanzee genomes than nucleotide changes have (3% vs. 1.2%). However, although there have been studies of large indels between the two genomes, no systematic analysis of small indels (i.e., indels ≤ 100 bp) has been published. In this study, we first estimated that the false-positive rate of small indels inferred from human-chimpanzee pairwise sequence alignments is quite high, suggesting that the chimpanzee genome draft is not sufficiently accurate for our purpose. We have therefore inferred only human-specific indels using multiple sequence alignments of mammalian genomes. We identified >840,000 "small" indels, which affect >7000 UCSC-annotated human genes (>11,000 transcripts). These indels, however, amount to only approximately 0.21% sequence change in the human lineage for the regions compared, whereas in pseudogenes indels contribute to a sequence divergence of 1.40%, suggesting that most of the indels that occurred in genic regions have been eliminated. Functional analysis reveals that the genes whose coding exons have been affected by human-specific indels are enriched in transcription and translation regulatory activities but are underrepresented in catalytic and transporter activities, cellular and physiological processes, and extracellular region/matrix. This functional bias suggests that human-specific indels might have contributed to uniquely human traits by causing changes at the RNA and protein level.
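The idea of inferring human-specific indels from a multiple alignment, rather than from a human-chimpanzee pairwise alignment, can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not describe the actual rules, so the parsimony criterion used here (a gap state seen only in human, with chimpanzee and every other aligned species agreeing with each other) is an assumption, as are the function names `classify_column` and `human_specific_indels` and the toy sequences. Only the 100 bp cutoff comes from the abstract.

```python
from itertools import groupby

MAX_INDEL_LEN = 100  # "small" indel cutoff stated in the abstract (<= 100 bp)


def classify_column(human, others):
    """Label one alignment column as 'del', 'ins', or None.

    Assumed rule (not taken from the paper): the event is human-specific
    only if every non-human sequence disagrees with human about the gap.
    """
    human_gap = human == "-"
    other_gaps = [base == "-" for base in others]
    if human_gap and not any(other_gaps):
        return "del"  # bases present in all other species, missing in human
    if not human_gap and all(other_gaps):
        return "ins"  # bases present only in human
    return None


def human_specific_indels(human, others, max_len=MAX_INDEL_LEN):
    """Return (kind, start_column, length) for each small human-specific indel.

    `human` is the gapped human row; `others` is a list of gapped rows for
    chimpanzee and the outgroup species, all of the same aligned length.
    """
    if not others:
        raise ValueError("at least one non-human sequence is required")
    if any(len(seq) != len(human) for seq in others):
        raise ValueError("all aligned sequences must have the same length")

    labels = [classify_column(h, cols) for h, *cols in zip(human, *others)]

    events = []
    pos = 0
    # Merge consecutive columns with the same label into one event.
    for label, run in groupby(labels):
        length = len(list(run))
        if label is not None and length <= max_len:
            events.append((label, pos, length))
        pos += length
    return events


# Toy alignment (hypothetical data): human, then chimpanzee and two outgroups.
human = "ACG--TACGTAAGT"
others = ["ACGTTTACG---GT", "ACGTTTACG---GT", "ACGTTTACG---GT"]
events = human_specific_indels(human, others)
```

A gap shared by human and chimpanzee is not reported under this rule, since it could equally have arisen before the two lineages split. Requiring agreement among several non-human species is one way to reduce the influence of errors in any single draft assembly.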

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Chromosomes, Human, Pair 21
  • Evolution, Molecular*
  • Gene Deletion*
  • Genes, Regulator
  • Genome
  • Genome, Human*
  • Humans
  • Mammals / genetics*
  • Molecular Sequence Data
  • Mutagenesis, Insertional*
  • Pan troglodytes / genetics
  • Protein Biosynthesis
  • Sequence Alignment
  • Transcription, Genetic