Greater power and computational efficiency for kernel-based association testing of sets of genetic variants

Christoph Lippert; Jing Xiang; Danilo Horta; Christian Widmer; Carl Kadie; David Heckerman; Jennifer Listgarten

doi:10.1093/bioinformatics/btu504


Bioinformatics. 2014 Nov 15;30(22):3206-14. doi: 10.1093/bioinformatics/btu504. Epub 2014 Jul 29.

Authors

Christoph Lippert¹, Jing Xiang¹, Danilo Horta¹, Christian Widmer¹, Carl Kadie¹, David Heckerman¹, Jennifer Listgarten¹

Affiliation

¹ eScience Research Group, Microsoft Research, Los Angeles, CA, 90024 and eScience Research Group, Microsoft Research, Redmond, WA, 98052, USA.

Abstract

Motivation: Set-based variance component tests have been identified as a way to increase power in association studies by aggregating weak individual effects. However, the choice of test statistic has been largely ignored, even though it may play an important role in obtaining optimal power. We compared a standard statistical test, the score test, with a recently developed likelihood ratio (LR) test. Further, when correction for hidden structure is needed, or gene-gene interactions are sought, state-of-the-art algorithms for both the score and LR tests can be computationally impractical. Thus, we develop new computationally efficient methods.

Results: After reviewing theoretical differences in performance between the score and LR tests, we find empirically on real data that the LR test generally has more power. In particular, on 15 of 17 real datasets, the LR test yielded at least as many associations as the score test (up to 23 more associations), whereas the score test yielded at most one more association than the LR test in the two remaining datasets. On synthetic data, we find that the LR test yielded up to 12% more associations, consistent with our results on real data, but we also observe a regime of extremely small signal where the score test yielded up to 25% more associations than the LR test, consistent with theory. Finally, our computational speedups now enable (i) efficient LR testing when the background kernel is full rank, and (ii) efficient score testing when the background kernel changes with each test, as for gene-gene interaction tests. The latter yielded a 2000-fold speedup on a cohort of size 13 500.

Availability: Software available at http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/Fastlmm/.

Contact: heckerma@microsoft.com

Supplementary information: Supplementary data are available at Bioinformatics online.
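
Illustrative sketch (not part of the published abstract): to make the two statistics compared above concrete, the following Python code computes an LR statistic and a score-type statistic for a single-kernel variance component model. It rests on several assumptions that do not come from the paper: the covariance is parameterized as sigma^2 * (h2 * K + (1 - h2) * I), the likelihood is maximized over a coarse grid of h2 values, the data are simulated, and all function and variable names are hypothetical. It is not the FaST-LMM implementation, and it stops at the raw statistics; turning them into p-values requires the appropriate null distributions, which are not shown.

```python
import numpy as np


def profile_loglik(h2, y_rot, X_rot, s):
    """ML log-likelihood with the fixed effects and sigma^2 profiled out.

    Assumed model: Cov(y) = sigma^2 * (h2 * K + (1 - h2) * I), evaluated in
    the eigenbasis of K (y_rot = U.T @ y, X_rot = U.T @ X, s = eigenvalues).
    """
    n = y_rot.shape[0]
    d = h2 * s + (1.0 - h2)                      # eigenvalues of the scaled covariance
    Xw = X_rot / d[:, None]                      # D^{-1} X in the rotated space
    beta = np.linalg.solve(X_rot.T @ Xw, Xw.T @ y_rot)
    r = y_rot - X_rot @ beta
    sigma2 = np.sum(r * r / d) / n               # ML estimate of sigma^2
    return -0.5 * (n * np.log(2 * np.pi) + np.sum(np.log(d))
                   + n * np.log(sigma2) + n)


def lr_and_score(y, X, K, grid=None):
    """Return (LR statistic, score-type statistic) for H0: h2 = 0."""
    if grid is None:
        grid = np.linspace(0.0, 0.99, 100)       # includes the null value h2 = 0
    s, U = np.linalg.eigh(K)
    s = np.clip(s, 0.0, None)                    # guard against tiny negative eigenvalues
    y_rot, X_rot = U.T @ y, U.T @ X

    # LR statistic: twice the gain in maximized log-likelihood over the null.
    ll_null = profile_loglik(0.0, y_rot, X_rot, s)
    ll_alt = max(profile_loglik(h, y_rot, X_rot, s) for h in grid)
    lr = 2.0 * (ll_alt - ll_null)

    # Score-type statistic: quadratic form of the null (OLS) residuals in K.
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    r0 = y - X @ beta0
    sigma2_0 = (r0 @ r0) / len(y)
    q = (r0 @ K @ r0) / sigma2_0
    return lr, q


# Simulated example: 200 individuals, 10 variants in the tested set.
rng = np.random.default_rng(0)
n, m = 200, 10
G = rng.standard_normal((n, m))                  # stand-in for standardized genotypes
K = G @ G.T / m                                  # linear kernel over the variant set
X = np.ones((n, 1))                              # intercept-only covariates
y = 0.3 * (G @ rng.standard_normal(m)) + rng.standard_normal(n)

lr, q = lr_and_score(y, X, K)
print(f"LR statistic: {lr:.3f}, score-type statistic: {q:.3f}")
```

The score-type statistic needs only the null fit (ordinary least squares), whereas the LR statistic also requires fitting the alternative model, which is why the computational cost of the two tests can differ.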

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Data Interpretation, Statistical
Genetic Association Studies / methods*
Genetic Variation*
Humans
Likelihood Functions
Phenotype
Polymorphism, Single Nucleotide
