Bayesian analysis of population structure based on linked molecular information

Math Biosci. 2007 Jan;205(1):19-31. doi: 10.1016/j.mbs.2006.09.015. Epub 2006 Sep 28.

Abstract

The Bayesian model-based approach to inferring hidden genetic population structures using multilocus molecular markers has become a popular tool within certain branches of biology. In particular, it has been shown that heterogeneous data arising from genetically dissimilar latent groups of individuals can be effectively modelled using an unsupervised classification formulation. However, most models currently in use ignore potential linkage within the molecular information, and can therefore lead to biased inferences under certain circumstances. Utilizing the general theory of graphical models, we develop a framework that accounts for dependences within both linked molecular marker loci and DNA sequence data. Due to a high level of sequence conservation among eukaryotic species, the latter aspect is particularly relevant for analyzing rapidly evolving microbial species. The advantages of incorporating the dependence due to linkage in the classification models are illustrated by analyses of both simulated data and real samples of Bacillus cereus.
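
The contrast between ignoring and modelling linkage can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the first-order Markov chain over adjacent sites, and the symmetric Dirichlet prior with pseudo-count `alpha` are all assumptions made for illustration. For a single candidate group of aligned sequences, it compares the log marginal likelihood when sites are treated as independent with the value obtained when each site depends on the base at the preceding site.

```python
from collections import Counter
from math import lgamma

ALPHABET = "ACGT"  # assumed alphabet for aligned DNA sequences


def log_dirichlet_multinomial(counts, alpha=1.0):
    """Log marginal likelihood of an ordered categorical sample with the
    given category counts, under a symmetric Dirichlet(alpha) prior."""
    k = len(counts)
    n = sum(counts)
    result = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in counts:
        result += lgamma(alpha + c) - lgamma(alpha)
    return result


def log_ml_independent(seqs, alpha=1.0):
    """Sites are modelled independently (linkage ignored)."""
    total = 0.0
    for site in zip(*seqs):
        counts = Counter(site)
        total += log_dirichlet_multinomial([counts[b] for b in ALPHABET], alpha)
    return total


def log_ml_markov(seqs, alpha=1.0):
    """Hypothetical linked model: the first site is modelled marginally,
    and every later site conditionally on the base at the preceding site."""
    columns = list(zip(*seqs))
    first = Counter(columns[0])
    total = log_dirichlet_multinomial([first[b] for b in ALPHABET], alpha)
    for prev, cur in zip(columns, columns[1:]):
        pairs = Counter(zip(prev, cur))
        # One Dirichlet-multinomial term per base observed at the previous site.
        for a in ALPHABET:
            total += log_dirichlet_multinomial(
                [pairs[(a, b)] for b in ALPHABET], alpha
            )
    return total


# Toy data: two sites in complete linkage versus two unlinked sites.
linked = ["AA"] * 10 + ["CC"] * 10
unlinked = ["AA", "AC", "CA", "CC"] * 5

print(log_ml_markov(linked) - log_ml_independent(linked))      # positive
print(log_ml_markov(unlinked) - log_ml_independent(unlinked))  # negative
```

In this toy setting the linked model scores higher when adjacent sites co-vary, and the independent model scores higher when they do not, because the extra parameters of the linked model are penalized by the marginal likelihood. In an unsupervised classification, scores of this kind would be compared across alternative partitions of the individuals.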

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacillus cereus / genetics
  • Base Sequence
  • Bayes Theorem*
  • Computer Simulation
  • DNA, Bacterial / genetics
  • Genetics, Population / methods*
  • Models, Genetic*

Substances

  • DNA, Bacterial