Prediction of biogeographical ancestry from genotype: a comparison of classifiers

Int J Legal Med. 2017 Jul;131(4):901-912. doi: 10.1007/s00414-016-1504-3. Epub 2016 Dec 20.

Abstract

DNA can provide forensic intelligence regarding a donor's biogeographical ancestry (BGA) and other externally visible characteristics (EVCs). A number of algorithms have been proposed to assign individual human genotypes to a BGA using ancestry informative marker (AIM) panels. This study compares the BGA assignment accuracy of the population clustering program STRUCTURE with that of three generic classification approaches: a Bayesian algorithm, genetic distance, and multinomial logistic regression (MLR). A set of 142 ancestry informative single nucleotide polymorphisms (SNPs) was selected from existing marker panels (SNPforID 34-plex, Eurasiaplex, Seldin, and Kidd's AIM panels) to assess BGA classification at the continental level for Africans, Europeans, East Asians, and Amerindians. Each classifier was trained on 1093 individuals with self-declared BGA from the 1000 Genomes phase 1 database and then used to predict BGA in a test set of 516 individuals from the HGDP-CEPH (Stanford) cell line panel. Tests were repeated with 0, 10, 50, 70, and 90% of the genotypes missing. Comparison of the areas under the receiver operating characteristic curves (AUROCs) showed high accuracy for STRUCTURE and the generic Bayesian approach. The latter offers a computationally simpler alternative to STRUCTURE with little loss in accuracy and, unlike STRUCTURE, is also suitable for phenotype prediction.
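The abstract does not spell out how the generic Bayesian classifier is constructed, so the following is only a minimal sketch of one common formulation and not the authors' implementation. It assumes that SNPs are independent, that genotypes follow Hardy-Weinberg proportions within each population, that priors are flat, and that a small pseudocount is added to the allele counts. Missing genotypes (encoded as None) are skipped, which is one simple way a classifier of this kind can tolerate the missing-data levels tested in the study. All function names, parameters, and the toy data are hypothetical.

```python
import math


def estimate_allele_freqs(genotypes, labels, pseudocount=0.5):
    """Estimate, per population, the frequency of the counted allele at each SNP.

    genotypes: list of samples; each sample is a list of 0/1/2 allele counts,
               with None marking a missing genotype.
    labels:    population label for each sample.
    """
    n_snps = len(genotypes[0])
    freqs = {}
    for pop in sorted(set(labels)):
        rows = [g for g, lab in zip(genotypes, labels) if lab == pop]
        pop_freqs = []
        for j in range(n_snps):
            observed = [r[j] for r in rows if r[j] is not None]
            # The pseudocount (an assumption) keeps frequencies away from 0 and 1.
            p = (sum(observed) + pseudocount) / (2 * len(observed) + 2 * pseudocount)
            pop_freqs.append(p)
        freqs[pop] = pop_freqs
    return freqs


def genotype_log_likelihood(g, p):
    """Log probability of genotype g (0, 1, or 2 copies) under Hardy-Weinberg."""
    if g == 0:
        return 2 * math.log(1 - p)
    if g == 1:
        return math.log(2) + math.log(p) + math.log(1 - p)
    return 2 * math.log(p)


def posterior(genotype, freqs, priors=None):
    """Posterior probability of each population for one multi-SNP genotype."""
    pops = list(freqs)
    if priors is None:
        priors = {pop: 1.0 / len(pops) for pop in pops}  # flat prior (assumption)
    log_post = {}
    for pop in pops:
        total = math.log(priors[pop])
        for g, p in zip(genotype, freqs[pop]):
            if g is not None:  # missing genotypes contribute nothing
                total += genotype_log_likelihood(g, p)
        log_post[pop] = total
    # Normalise in log space to avoid underflow with many SNPs.
    peak = max(log_post.values())
    unnorm = {pop: math.exp(v - peak) for pop, v in log_post.items()}
    norm = sum(unnorm.values())
    return {pop: v / norm for pop, v in unnorm.items()}


# Toy training data: two hypothetical populations, three SNPs.
train = [[2, 2, 0], [2, 1, 0], [0, 0, 2], [0, 1, 2]]
train_labels = ["A", "A", "B", "B"]
freqs = estimate_allele_freqs(train, train_labels)

# A test genotype with the second SNP missing.
result = posterior([2, None, 0], freqs)
```

Here `result` maps each population label to its posterior probability, and the assigned BGA would be the label with the largest value. Because the posterior is a plain probability over class labels, the same structure could in principle be applied to phenotype classes, which is consistent with the abstract's remark about phenotype prediction.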

Keywords: Bayesian; Biogeographical ancestry (BGA); Genetic distance; Multinomial logistic regression; Phenotype prediction; STRUCTURE.

MeSH terms

  • Algorithms
  • Gene Frequency
  • Genealogy and Heraldry
  • Genetic Markers
  • Genotype*
  • Humans
  • Likelihood Functions
  • Logistic Models
  • Polymorphism, Single Nucleotide
  • Racial Groups / genetics*

Substances

  • Genetic Markers