Optimized selection of unrelated subjects for whole-genome sequencing studies of rare high-penetrance alleles

Genet Epidemiol. 2012 Jul;36(5):472-9. doi: 10.1002/gepi.21641. Epub 2012 May 23.

Abstract

Sequencing studies using whole-genome or exome scans are still more expensive than genome-wide association studies on a per-subject basis. As a result, only a subset of subjects from a larger study will be selected for sequencing. To perform an agnostic investigation of the entire genome, subjects may be selected to capture independent ancestral lineages, i.e., founder genomes, and thus avoid redundant information from regions that were inherited identical by descent (IBD) from a common ancestor. We present SampleSeq2, which can be used to select a subset of optimally unrelated subjects with minimal IBD sharing. It can also be used to estimate the number, G(T), of founder chromosomes in a sample or to select the minimum number of subjects that will carry a target G(T). We compared SampleSeq2 with a random draw of a small number of subjects, both by simulation and using the Anabaptist genealogy. SampleSeq2 provided an increase in G(T) relative to a random draw across a range of small sample sizes. This increase in founder chromosomes improves the power of association tests, mitigates the effect of cryptic relatedness on parameter estimates, increases the total yield of alleles from sequencing, and minimizes the average size of regions shared IBD around disease alleles in cases.
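
The idea of choosing a subset with minimal IBD sharing can be illustrated with a small sketch. This is not the SampleSeq2 algorithm, which the abstract does not describe in detail. It is a simple greedy heuristic that assumes a pairwise kinship matrix is available and uses total pairwise kinship as a stand-in for IBD sharing. The function names `greedy_unrelated_subset` and `total_sharing`, and the toy pedigree, are hypothetical.

```python
from itertools import combinations


def greedy_unrelated_subset(kinship, k):
    """Pick k subject indices with low total pairwise kinship.

    Assumption: `kinship` is a symmetric n x n list of lists of kinship
    coefficients. This is a greedy heuristic, not a guaranteed optimum.
    """
    n = len(kinship)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of subjects")

    # Start from the subject with the lowest total kinship to everyone else.
    first = min(
        range(n),
        key=lambda i: sum(kinship[i][j] for j in range(n) if j != i),
    )
    chosen = [first]

    while len(chosen) < k:
        remaining = [i for i in range(n) if i not in chosen]
        # Add the subject that shares the least with those already chosen.
        best = min(remaining, key=lambda i: sum(kinship[i][j] for j in chosen))
        chosen.append(best)

    return sorted(chosen)


def total_sharing(kinship, subset):
    """Sum of kinship coefficients over all pairs in the subset."""
    return sum(kinship[i][j] for i, j in combinations(subset, 2))


# Toy pedigree (hypothetical): subjects 0 and 1 are full siblings (0.25),
# subjects 2 and 3 are first cousins (0.0625), subject 4 is unrelated.
kinship = [
    [0.5, 0.25, 0.0, 0.0, 0.0],
    [0.25, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.0625, 0.0],
    [0.0, 0.0, 0.0625, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.5],
]

subset = greedy_unrelated_subset(kinship, 3)
print(subset, total_sharing(kinship, subset))
```

In this toy example the heuristic avoids picking both siblings or both cousins when three subjects are requested. A random draw of three subjects from the same five would include a related pair in most draws, which is the kind of redundancy the abstract describes.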

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Alleles*
  • Amish
  • Case-Control Studies
  • Computer Simulation
  • Female
  • Genome
  • Genome, Human*
  • Genotype
  • Humans
  • Male
  • Models, Genetic
  • Models, Statistical
  • Pedigree
  • Penetrance*
  • Phylogeny
  • Research Design
  • Sequence Analysis, DNA