Fine-Scale Inference of Ancestry Segments Without Prior Knowledge of Admixing Groups

Genetics. 2019 Jul;212(3):869-889. doi: 10.1534/genetics.119.302139. Epub 2019 May 23.

Abstract

We present an algorithm for inferring ancestry segments and characterizing admixture events, which involve an arbitrary number of genetically differentiated groups coming together. This enables inference of a species' demographic history and the properties of admixing groups, supports identification of signatures of natural selection, and may aid disease gene mapping. The algorithm employs nested hidden Markov models to obtain local ancestry estimates along the genome for each admixed individual. In a range of simulations, the accuracy of these estimates equals or exceeds that of leading existing methods. Moreover, and unlike these approaches, we do not require any prior knowledge of the relationship between subgroups of donor reference haplotypes and the unseen mixing ancestral populations. Our approach infers these relationships in terms of conditional "copying probabilities." In application to the Human Genome Diversity Project, we corroborate many previously inferred admixture events (e.g., an ancient admixture event in the Kalash). We further identify novel events such as complex four-way admixture in San-Khomani individuals, and show that Eastern European populations possess [Formula: see text] ancestry from a group resembling modern-day central Asians. We also identify evidence of recent natural selection favoring sub-Saharan ancestry at the human leukocyte antigen (HLA) region across North African individuals. We make available an R and C++ software library, which we term MOSAIC (which stands for MOSAIC Organizes Segments of Ancestry In Chromosomes).
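
As a rough illustration of how a hidden Markov model can yield local ancestry estimates along a genome, the Python sketch below runs a forward-backward pass over a single-layer HMM whose hidden states are ancestries. It is not the nested model described in the abstract and does not use the MOSAIC library or its interface. The function name `local_ancestry_posteriors`, the parameters `emissions`, `switch_prob`, and `init`, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np


def local_ancestry_posteriors(emissions, switch_prob, init):
    """Posterior ancestry probabilities per site from a simple HMM.

    Hypothetical helper, not part of MOSAIC.
    emissions:   (n_sites, n_ancestries) likelihood of the observed data
                 at each site given each ancestry (assumed precomputed).
    switch_prob: probability that ancestry is redrawn between adjacent sites.
    init:        (n_ancestries,) ancestry proportions, used both as the
                 initial distribution and as the redraw distribution.
    """
    emissions = np.asarray(emissions, dtype=float)
    init = np.asarray(init, dtype=float)
    n_sites, n_anc = emissions.shape

    # Stay in the same ancestry with probability 1 - switch_prob,
    # otherwise redraw the ancestry from `init`. Each row sums to 1.
    trans = (1.0 - switch_prob) * np.eye(n_anc) \
        + switch_prob * np.tile(init, (n_anc, 1))

    # Forward pass, rescaled at every site to avoid numerical underflow.
    fwd = np.zeros((n_sites, n_anc))
    fwd[0] = init * emissions[0]
    fwd[0] /= fwd[0].sum()
    for t in range(1, n_sites):
        fwd[t] = (fwd[t - 1] @ trans) * emissions[t]
        fwd[t] /= fwd[t].sum()

    # Backward pass, also rescaled at every site.
    bwd = np.ones((n_sites, n_anc))
    for t in range(n_sites - 2, -1, -1):
        bwd[t] = trans @ (emissions[t + 1] * bwd[t + 1])
        bwd[t] /= bwd[t].sum()

    # Combine and normalize so each site's probabilities sum to 1.
    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    return post


# Toy data (made up): five sites that favor ancestry 0, then five that favor 1.
toy_emissions = [[0.9, 0.1]] * 5 + [[0.1, 0.9]] * 5
toy_post = local_ancestry_posteriors(toy_emissions, switch_prob=0.05,
                                     init=[0.5, 0.5])
print(np.round(toy_post, 3))
```

Each row of the returned array is a posterior distribution over ancestries at one site, so runs of sites with the same most probable ancestry can be read as ancestry segments. In the approach described above, the relationship between donor haplotypes and ancestries is itself inferred, whereas this sketch assumes the per-ancestry emission likelihoods are already given.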

Keywords: admixture; demography; drift; population genetics; selection.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genetic Drift
  • Genetics, Population / methods
  • HLA Antigens / genetics
  • Haplotypes
  • Humans
  • Markov Chains
  • Models, Genetic*
  • Pedigree*
  • Population / genetics*
  • Software

Substances

  • HLA Antigens