Efficient analysis of large datasets and sex bias with ADMIXTURE

BMC Bioinformatics. 2016 May 23:17:218. doi: 10.1186/s12859-016-1082-x.

Abstract

Background: A number of large genomic datasets are being generated for studies of human ancestry and diseases. The ADMIXTURE program is commonly used to infer individual ancestry from genomic data.

Results: We describe two improvements to the ADMIXTURE software. The first enables ADMIXTURE to infer ancestry for a new set of individuals using cluster allele frequencies from a reference set of individuals. Using data from the 1000 Genomes Project, we show that this allows ADMIXTURE to infer ancestry for 10,920 individuals in a few hours (a 5× speedup). This mode also allows ADMIXTURE to correctly estimate individual ancestry and allele frequencies from a set of related individuals. The second modification allows ADMIXTURE to correctly handle X-chromosome (and other haploid) data from both males and females. We demonstrate increased power to detect sex-biased admixture in African-American individuals from the 1000 Genomes Project using this extension.
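
Illustrative sketch (not the ADMIXTURE implementation): the idea of inferring ancestry for a new individual from fixed reference allele frequencies can be shown with a simple EM update under the standard binomial admixture likelihood. The function name `project_ancestry`, its parameters, and the use of plain EM are assumptions made for illustration; the abstract does not describe the optimizer ADMIXTURE uses for this mode. The `ploidy` argument is a hypothetical way to represent haploid data, for example one X-chromosome copy in males.

```python
def project_ancestry(genotypes, freqs, ploidy=2, iters=200, eps=1e-6):
    """Estimate ancestry fractions for one individual, holding allele frequencies fixed.

    genotypes: per-SNP counts of the reference allele (0..ploidy), or None if missing
    freqs:     freqs[k][m] is the reference-allele frequency at SNP m in cluster k
    ploidy:    2 for diploid data, 1 for haploid data (hypothetical parameter)
    """
    n_clusters = len(freqs)
    # Clamp frequencies away from 0 and 1 so the divisions below are always defined.
    clamped = [[min(max(p, eps), 1.0 - eps) for p in row] for row in freqs]
    q = [1.0 / n_clusters] * n_clusters  # start from equal ancestry fractions

    for _ in range(iters):
        expected = [0.0] * n_clusters
        total_alleles = 0
        for m, g in enumerate(genotypes):
            if g is None:
                continue  # skip missing genotypes
            # Probability that a random allele copy at SNP m is the reference allele.
            f = sum(q[k] * clamped[k][m] for k in range(n_clusters))
            for k in range(n_clusters):
                p = clamped[k][m]
                # Expected number of this individual's allele copies drawn from cluster k.
                expected[k] += g * q[k] * p / f + (ploidy - g) * q[k] * (1.0 - p) / (1.0 - f)
            total_alleles += ploidy
        if total_alleles == 0:
            return q  # no observed genotypes: keep the uniform starting point
        q = [e / total_alleles for e in expected]
    return q
```

In this sketch, a diploid sample would use `ploidy=2` and a male X-chromosome sample `ploidy=1`; because the frequencies are never updated, each individual can be processed independently of the others.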

Conclusions: These modifications make ADMIXTURE more efficient and versatile, allowing users to extract more information from large genomic datasets.

Keywords: Admixture; Ancestry inference; Pedigrees; Reference panels; Sex bias; Sex-chromosome; Supervised learning.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Black or African American / genetics
  • Female
  • Gene Frequency
  • Genetics, Population*
  • Genomics / methods*
  • HapMap Project
  • Humans
  • Male
  • Software*
  • Southwestern United States