Low hanging fruit: a subset of human cSNPs is both highly non-uniform and predictable

Gene. 2003 Jul 17:312:197-206. doi: 10.1016/s0378-1119(03)00628-0.

Abstract

We present a point mutation classification method that contrasts SNP databases and has the potential to illuminate the relative mutational load of genes caused by codon bias. We group point variants gleaned from public databases by their wild-type and mutant codons into codon mutation classes (CMCs; 576 are possible, e.g. ACG-->ATG), whose frequencies in a database are assembled into a BLOSUM-style matrix describing the likelihood of observing each possible single-base codon change, as tuned by the intertwined effects of mutation rate and selection. The rankings of the CMCs in any database are reshuffled according to the population stratification of the typical genotyping experiment producing that resource's data. Analysis of four independent databases reveals that a considerable fraction of mutation in functional genes can be described by a few CMCs, regardless of gene identity or population stratification in the genotyping experiment. For example, the top 5% (29/576) of CMCs account for 27.4% of the observed variants in dbSNP, while the bottom 5% account for only 0.02%. For non-synonymous disease-causing mutations, 40.8% are described by the top 5% of all possible non-silent CMCs (22/438). Overall, the most frequently observed polymorphism is a G-->A transition at CpG dinucleotides, which causes ACG, TCG, GCG, and CCG to undergo silent mutation frequently in any gene, putatively because such changes have no impact on the protein product. To assess how well CMC spectra estimate the aggregate non-synonymous mutational trends of a single gene, a CMC matrix was applied to seven unrelated genes to compute the most likely point mutations. More than 87% of these mutation predictions have previously been reported in the published literature to play an important role in a disease state. CMC-based mutation prediction may aid the design and execution of direct association genotyping studies.
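
The CMC bookkeeping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `all_cmcs` and `top_fraction_share` and the toy observations are hypothetical, and the sketch only counts classes without modelling mutation rate or selection. It enumerates every single-base codon change (64 codons x 3 positions x 3 alternative bases = 576) and computes the share of observed variants covered by the most frequent fraction of classes.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

def all_cmcs():
    """Enumerate every single-base codon change as (wild_type, mutant)."""
    cmcs = []
    for codon in map("".join, product(BASES, repeat=3)):
        for pos in range(3):
            for base in BASES:
                if base != codon[pos]:
                    mutant = codon[:pos] + base + codon[pos + 1:]
                    cmcs.append((codon, mutant))
    return cmcs

def top_fraction_share(counts, fraction=0.05, n_classes=576):
    """Share of all observations covered by the top `fraction` of classes.

    `counts` is a Counter mapping (wild_type, mutant) pairs to how often
    each was observed; `n_classes` is the number of possible classes
    (576 for all CMCs, 438 for non-silent ones per the abstract).
    """
    k = round(fraction * n_classes)  # 29 of 576, or 22 of 438, at 5%
    total = sum(counts.values())
    top = sum(n for _, n in counts.most_common(k))
    return top / total

# Hypothetical toy observations, not data from any real database.
observed = [("ACG", "ATG"), ("ACG", "ATG"), ("GCG", "GCA"), ("TTT", "TTC")]
counts = Counter(observed)
```

With real variant calls in place of `observed`, `top_fraction_share(counts)` would give the kind of figure the abstract reports for dbSNP, although the exact value depends on how variants are mapped to codons, which this sketch does not address.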

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Base Sequence
  • Codon / genetics*
  • Codon, Nonsense
  • Databases, Nucleic Acid*
  • Gene Frequency
  • Genetic Diseases, Inborn / genetics
  • Humans
  • Mutation, Missense
  • Point Mutation
  • Polymorphism, Single Nucleotide*

Substances

  • Codon
  • Codon, Nonsense