Weighted SNP set analysis in genome-wide association study

PLoS One. 2013 Sep 30;8(9):e75897. doi: 10.1371/journal.pone.0075897. eCollection 2013.

Abstract

Genome-wide association studies (GWAS) are widely used to identify genetic variants associated with disease risk. To address the limitations of single-locus association analysis, many approaches have been proposed that test multiple single nucleotide polymorphisms (SNPs) in a region simultaneously. Kernel machine based SNP set analysis is more powerful than single-locus analysis because it borrows information from SNPs correlated with causal or tag SNPs. Four types of kernel machine functions and a principal component analysis (PCA) based approach were also compared. However, given the loss of power caused by low minor allele frequencies (MAF), we extended PCA into a new method called weighted PCA (wPCA). We compared wPCA, the logistic kernel machine based test (LKM) and PCA on SNP sets under different MAFs and linkage disequilibrium (LD) structures. We also applied the three methods to two SNP sets extracted from a real GWAS dataset of non-small cell lung cancer in a Han Chinese population. Simulation results show that when the MAF of the causal SNP is low, wPCA and the weighted identity-by-state kernel (wIBS) are more powerful than PCA and the other kernel machine functions across different LD structures and different numbers of causal SNPs. Application of the three methods to the real GWAS dataset indicates that wPCA and wIBS perform better than the linear kernel, the IBS kernel and PCA.
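
The abstract does not state how the weights are defined. As a rough illustration only, the sketch below assumes each SNP is weighted by 1/sqrt(MAF), a common way to up-weight rare variants, and applies that weight both to a PCA of the SNP set and to an IBS kernel. The function names (maf, weighted_pcs, weighted_ibs_kernel) and the weighting scheme are assumptions made for this example and may differ from the scheme used in the paper.

```python
import numpy as np

def maf(G):
    """Minor allele frequency per SNP.

    G is an (n_samples, n_snps) array of minor/alternate allele counts (0, 1, 2).
    """
    p = G.mean(axis=0) / 2.0
    return np.minimum(p, 1.0 - p)

def weighted_pcs(G, n_components=2):
    """Principal component scores of a SNP set after MAF-based weighting.

    Assumption (not from the paper): weight w_k = 1 / sqrt(MAF_k).
    Monomorphic SNPs (MAF = 0) must be filtered out beforehand.
    """
    w = 1.0 / np.sqrt(maf(G))
    X = (G - G.mean(axis=0)) * w          # center each SNP, then scale by its weight
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def weighted_ibs_kernel(G):
    """Weighted IBS similarity between all pairs of individuals.

    K[i, j] = sum_k w_k * (2 - |G[i, k] - G[j, k]|) / (2 * sum_k w_k),
    using the same assumed weights w_k = 1 / sqrt(MAF_k).
    """
    w = 1.0 / np.sqrt(maf(G))
    diff = np.abs(G[:, None, :] - G[None, :, :])   # shape (n, n, n_snps)
    return ((2.0 - diff) * w).sum(axis=2) / (2.0 * w.sum())

# Toy genotype matrix: 4 individuals, 3 polymorphic SNPs (hypothetical data).
G = np.array([[0, 1, 2],
              [1, 0, 0],
              [2, 1, 0],
              [0, 0, 1]], dtype=float)

scores = weighted_pcs(G, n_components=2)
K = weighted_ibs_kernel(G)
```

In a SNP set test of this kind, the leading component scores would typically enter a logistic regression as covariates, and the kernel matrix would feed a kernel machine score test; neither step is shown here.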

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Asian People
  • Carcinoma, Non-Small-Cell Lung / genetics*
  • Data Interpretation, Statistical
  • Gene Frequency
  • Genome-Wide Association Study / methods*
  • Humans
  • Linkage Disequilibrium
  • Models, Genetic
  • Polymorphism, Single Nucleotide / genetics*
  • Principal Component Analysis / methods

Grants and funding

This work was supported by the National Natural Science Foundation of China (NSFC81072389 to FC, NSFC30901232 to YZ), Key Grant of Natural Science Foundation of the Jiangsu Higher Education Institutions of China (11KJA330001 and 10KJA33034), the Research Fund for the Doctoral Program of Higher Education of China (2011323411002), the Research and the Innovation Project for College Graduates of Jiangsu Province (CXZZ11_0733) and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.