Assessing statistical power of SNPs for population structure and conservation studies

Phillip A Morin; Karen K Martien; Barbara L Taylor

doi:10.1111/j.1755-0998.2008.02392.x

Mol Ecol Resour. 2009 Jan;9(1):66-73. doi: 10.1111/j.1755-0998.2008.02392.x. Epub 2008 Oct 21.

Authors

Phillip A Morin¹, Karen K Martien, Barbara L Taylor

Affiliation

¹ Southwest Fisheries Science Center, 8604 La Jolla Shores Drive, La Jolla, CA 92037, USA.

PMID: 21564568

Abstract

Single nucleotide polymorphisms (SNPs) have been proposed by some as the new frontier for population studies, and several papers have presented theoretical and empirical evidence reporting the advantages and limitations of SNPs. As a practical matter, however, it remains unclear how many SNP markers will be required or what the optimal characteristics of those markers should be in order to obtain sufficient statistical power to detect different levels of population differentiation. We use a hypothetical case to illustrate the process of designing a population genetics project, and present results from simulations that address several issues for maximizing statistical power to detect differentiation while minimizing the amount of effort in developing SNPs. Results indicate that (i) while ~30 SNPs should be sufficient to detect moderate (F_ST = 0.01) levels of differentiation, studies aimed at detecting demographic independence (e.g. F_ST < 0.005) may require 80 or more SNPs and large sample sizes; (ii) different SNP allele frequencies have little effect on power, and thus, selection of SNPs can be relatively unbiased; (iii) increasing the sample size has a strong effect on power, so that the number of loci can be minimized when sample number is known, and increasing sample size is almost always beneficial; and (iv) power is increased by including multiple SNPs within loci and inferring haplotypes, rather than trying to use only unlinked SNPs. This also has the practical benefit of reducing the SNP ascertainment effort, and may influence the decision of whether to seek SNPs in coding or noncoding regions.
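The abstract does not say how the simulations were run, so the following is only a rough illustration of how power to detect a given F_ST could be estimated for a chosen number of SNPs and sample size. It is not the authors' procedure. Everything in it is an assumption made for the sketch: unlinked biallelic SNPs, two populations whose allele frequencies are drawn from a beta distribution around a shared ancestral frequency (a Balding-Nichols style model), ancestral frequencies uniform on 0.1 to 0.9, a summed per-locus chi-square statistic on allele counts, and a critical value taken from simulations with no differentiation. All function names and parameters are hypothetical.

```python
import random


def locus_counts(fst, n_alleles, rng):
    """Sampled counts of one allele at a biallelic SNP in two populations."""
    p = rng.uniform(0.1, 0.9)  # assumed ancestral allele frequency range
    if fst > 0:
        # Beta-distributed population frequencies with mean p and
        # variance fst * p * (1 - p) (assumed divergence model).
        scale = (1.0 - fst) / fst
        freqs = [rng.betavariate(p * scale, (1.0 - p) * scale) for _ in range(2)]
    else:
        freqs = [p, p]  # no differentiation: one shared frequency
    return [sum(rng.random() < f for _ in range(n_alleles)) for f in freqs]


def summed_chi2(fst, n_loci, n_individuals, rng):
    """Sum of 2x2 chi-square statistics (population x allele) over loci."""
    n_alleles = 2 * n_individuals   # diploid individuals, per population
    total = 2 * n_alleles           # alleles sampled across both populations
    stat = 0.0
    for _ in range(n_loci):
        a, b = locus_counts(fst, n_alleles, rng)
        k = a + b
        if 0 < k < total:           # skip loci monomorphic in the sample
            stat += total * (a - b) ** 2 / (k * (total - k))
    return stat


def estimate_power(fst, n_loci, n_individuals, reps=200, alpha=0.05, seed=1):
    """Fraction of simulated data sets that exceed the null critical value."""
    rng = random.Random(seed)
    # Empirical null distribution: same design, fst = 0.
    null = sorted(summed_chi2(0.0, n_loci, n_individuals, rng) for _ in range(reps))
    critical = null[min(int((1.0 - alpha) * reps), reps - 1)]
    hits = sum(
        summed_chi2(fst, n_loci, n_individuals, rng) > critical
        for _ in range(reps)
    )
    return hits / reps


if __name__ == "__main__":
    # Compare designs mentioned in the abstract: ~30 vs 80 SNPs.
    for n_loci in (30, 80):
        for fst in (0.005, 0.01):
            power = estimate_power(fst, n_loci, n_individuals=50)
            print(f"loci={n_loci} fst={fst} power={power:.2f}")
```

Varying `n_loci` and `n_individuals` in a sketch like this gives a feel for the trade-off between marker number and sample size; the numbers it produces depend on the assumed model and should not be read as a reproduction of the published results.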