A hierarchical model for estimating significance levels of non-parametric linkage statistics for large pedigrees

Genet Epidemiol. 2007 Jul;31(5):417-30. doi: 10.1002/gepi.20222.

Abstract

The significance level of a non-parametric linkage (NPL) statistic is often found by simulation, since the distribution of the test statistic is complex and unknown. Ideally, simulation occurs by randomly assigning founder genotypes and then simulating meiotic events for the descendants in the pedigree, a procedure commonly referred to as 'gene dropping'. The missing data pattern in the original pedigree, including lack of phase information (due to unordered genotypes), is then imposed on the simulated pedigree. However, this approach is usually computationally infeasible for larger pedigrees that require Markov chain Monte Carlo (MCMC) techniques to calculate the statistic, as an additional MCMC run is required to estimate the statistic for each gene drop. In this work, we propose a novel method to estimate the significance level of the NPL statistic in large pedigrees. This is accomplished by constructing a hierarchical model, which allows estimation of the NPL statistic variability via separate estimation of the Markov chain and gene dropping variability. The significance level is estimated by fitting a parametric model to the statistic and using the method of moments to obtain parameter estimates. In a simulation study, we found our hierarchical model estimates to be very close to the gold standard empirical estimates and to offer substantial improvements over the existing conservative method used by the software SimWalk2. The estimation procedure significantly reduces the computational time relative to the ideal empirical estimate, allowing for an accurate estimate of the significance level in a more manageable amount of time.
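
As an illustration of the gene-dropping procedure described in the abstract, the following is a minimal Python sketch of an empirical significance estimate on a toy pedigree. It is not the authors' implementation and does not reproduce the hierarchical model. The pedigree, the function names (gene_drop, sharing_statistic, empirical_p_value), and the allele-sharing statistic are all hypothetical, chosen only for illustration; the sketch also assumes fully observed genotypes, so it omits the missing-data step and the MCMC computation of the statistic that make the real problem expensive.

```python
import random

# Hypothetical toy pedigree: id -> (father, mother); founders have (None, None).
# Entries are ordered so that parents always appear before their children.
PEDIGREE = {
    "gf": (None, None),
    "gm": (None, None),
    "p1": ("gf", "gm"),
    "sp": (None, None),
    "c1": ("p1", "sp"),
    "c2": ("p1", "sp"),
}


def gene_drop(pedigree, rng):
    """Simulate one marker: give each founder two unique allele labels,
    then pass one randomly chosen allele from each parent to every child."""
    genotypes = {}
    next_label = 0
    for person, (father, mother) in pedigree.items():
        if father is None:
            genotypes[person] = (next_label, next_label + 1)
            next_label += 2
        else:
            genotypes[person] = (
                rng.choice(genotypes[father]),
                rng.choice(genotypes[mother]),
            )
    return genotypes


def sharing_statistic(genotypes, affected):
    """Toy statistic (an assumption, not the NPL statistic itself):
    number of affected pairs sharing at least one founder allele."""
    count = 0
    for i, a in enumerate(affected):
        for b in affected[i + 1:]:
            if set(genotypes[a]) & set(genotypes[b]):
                count += 1
    return count


def empirical_p_value(observed, pedigree, affected, n_drops, seed=0):
    """Fraction of gene drops whose statistic is at least the observed value,
    with the usual +1 correction so the estimate is never exactly zero."""
    rng = random.Random(seed)
    exceed = sum(
        sharing_statistic(gene_drop(pedigree, rng), affected) >= observed
        for _ in range(n_drops)
    )
    return (exceed + 1) / (n_drops + 1)


if __name__ == "__main__":
    # Two affected siblings; an observed statistic of 1 means they share an allele.
    print(empirical_p_value(1, PEDIGREE, ["c1", "c2"], n_drops=2000))
```

In this toy setting each gene drop costs almost nothing. The difficulty the abstract describes arises when the statistic for every simulated pedigree must itself be estimated by a separate MCMC run.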

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Alleles
  • Female
  • Genetic Linkage*
  • Genetic Markers
  • Genetic Predisposition to Disease*
  • Humans
  • Male
  • Markov Chains
  • Models, Genetic
  • Models, Statistical
  • Monte Carlo Method
  • Pedigree
  • Statistics, Nonparametric

Substances

  • Genetic Markers