Mucus obstruction is a central feature in the cystic fibrosis (CF) airways. A genome-wide association study (GWAS) of lung disease by the CF Gene Modifier Consortium (CFGMC) identified a significant locus containing two mucin genes, MUC20 and MUC4. Expression quantitative trait locus (eQTL) analysis using human nasal epithelia (HNE) from 94 CF-affected Canadians in the CFGMC demonstrated MUC4 eQTLs that mirrored the lung association pattern in the region, suggesting that MUC4 expression may mediate CF lung disease. Complications arose, however, with colocalization testing using existing methods: the locus is complex and the associated SNPs span a 0.2 Mb region with high linkage disequilibrium (LD) and evidence of allelic heterogeneity. We previously developed the Simple Sum (SS), a powerful colocalization test in regions with allelic heterogeneity, but SS assumed eQTLs to be present to achieve type I error control. Here we propose a two-stage SS (SS2) colocalization test that avoids a priori eQTL assumptions, accounts for multiple hypothesis testing and the composite null hypothesis, and enables meta-analysis. We compare SS2 to published approaches through simulation and demonstrate type I error control for all settings with the greatest power in the presence of high LD and allelic heterogeneity. Applying SS2 to the MUC20/MUC4 CF lung disease locus with eQTLs from CF HNE revealed significant colocalization with MUC4 (p = 1.31 × 10-5) rather than with MUC20. The SS2 is a powerful method to inform the responsible gene(s) at a locus and guide future functional studies. SS2 has been implemented in the application LocusFocus.
Keywords: SS2; colocalization; composite null hypothesis; cystic fibrosis; dependent samples; heterogeneity; lung disease; meta-analysis; mucin genes; multiple hypothesis testing; two-stage test.
Copyright © 2021 The Author(s). Published by Elsevier Inc. All rights reserved.