Population structure in a comprehensive genomic data set on human microsatellite variation

G3 (Bethesda). 2013 May 20;3(5):891-907. doi: 10.1534/g3.113.005728.

Abstract

Over the past two decades, microsatellite genotypes have provided the data for landmark studies of human population-genetic variation. However, the various microsatellite data sets have been prepared with different procedures and sets of markers, so that it has been difficult to synthesize available data for a comprehensive analysis. Here, we combine eight human population-genetic data sets at the 645 microsatellite loci they share, accounting for procedural differences in the production of the different data sets, to assemble a single data set containing 5795 individuals from 267 worldwide populations. We perform a systematic analysis of genetic relatedness, detecting 240 intra-population and 92 inter-population pairs of previously unidentified close relatives and proposing standardized subsets of unrelated individuals for use in future studies. We then augment the human data with a data set of 84 chimpanzees at the 246 loci they share with the human samples. Multidimensional scaling and neighbor-joining analyses of these data sets offer new insights into the structure of human populations and enable a comparison of genetic variation patterns in chimpanzees with those in humans. Our combined data sets are the largest of their kind reported to date and provide a resource for use in human population-genetic studies.
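
The core idea of restricting several data sets to the loci they all share, and then comparing individuals at those loci, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the locus names, genotypes, and function names below are hypothetical, the sketch ignores the procedural differences (such as allele-size calibration between data sets) that the abstract says were accounted for, and the allele-sharing distance is used here only as one common choice of input for multidimensional scaling or neighbor-joining.

```python
from collections import Counter
from functools import reduce


def shared_loci(datasets):
    """Return the sorted loci that appear in every data set.

    `datasets` maps a data set name to an iterable of locus names.
    """
    locus_sets = [set(loci) for loci in datasets.values()]
    if not locus_sets:
        return []
    return sorted(reduce(set.intersection, locus_sets))


def allele_sharing_distance(g1, g2, loci):
    """One minus the mean proportion of alleles shared by two diploid
    individuals, averaged over loci genotyped in both.

    Each genotype dict maps a locus name to a pair of allele sizes.
    """
    proportions = []
    for locus in loci:
        a, b = g1.get(locus), g2.get(locus)
        if a is None or b is None:
            continue  # skip loci with missing data in either individual
        # Multiset intersection counts shared alleles (0, 1, or 2).
        shared = sum((Counter(a) & Counter(b)).values())
        proportions.append(shared / 2)
    if not proportions:
        raise ValueError("no loci genotyped in both individuals")
    return 1.0 - sum(proportions) / len(proportions)


# Hypothetical marker panels for three data sets.
datasets = {
    "set1": {"locA", "locB", "locC"},
    "set2": {"locB", "locC", "locD"},
    "set3": {"locC", "locB", "locE"},
}
common = shared_loci(datasets)

# Hypothetical genotypes (allele sizes in base pairs).
ind1 = {"locB": (120, 124), "locC": (200, 204)}
ind2 = {"locB": (124, 124), "locC": (200, 204)}
distance = allele_sharing_distance(ind1, ind2, common)
```

A matrix of such pairwise distances is the kind of input that multidimensional scaling or neighbor-joining methods take; unusually small distances between two individuals are also one simple signal that a pair might be closely related, although formal relatedness inference requires more than this.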

Keywords: population structure; relatives; short tandem repeats.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Alleles
  • Animals
  • Databases, Genetic*
  • Genetic Loci / genetics
  • Genetic Variation*
  • Genetics, Population*
  • Genome, Human / genetics*
  • Genomics*
  • Geography
  • Heterozygote
  • Humans
  • Microsatellite Repeats / genetics*
  • Pan troglodytes / genetics
  • Phylogeny
  • Population Dynamics
  • Reproducibility of Results
  • Sample Size