To fully understand the role of the microbiome in human health and disease, researchers are increasingly interested in assessing the relationship between microbiome composition and host genomic data. The high dimensionality of the data, as well as the complex relationships between microbiota and host genomics, poses considerable challenges for analysis. In this article, we apply a kernel RV coefficient (KRV) test to evaluate the overall association between host gene expression and microbiome composition. By measuring general dependency, the KRV statistic can capture nonlinear correlations and complex relationships both within each data type and between gene expression and microbiome composition. Testing proceeds along a route similar to that of existing tests of the generalized RV coefficients and allows for rapid p-value calculation. We also consider strategies for adjusting for confounding effects, which is crucial for avoiding misleading results, and for alleviating the problem of selecting the most favorable kernel. Simulation studies show that KRV is useful in testing statistical independence with finite samples, provided the kernels are appropriately chosen, and can powerfully identify existing associations between microbiome composition and host genomic data while protecting type I error. We apply the KRV to a microbiome study examining the relationship between host transcriptome and microbiome composition in the context of inflammatory bowel disease, deriving new biological insights and providing formal inference on prior qualitative observations.
Keywords: Kernel; Microbiome composition; Multivariate association test; Omnibus test; RV coefficient.
© 2017, The International Biometric Society.
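
As an illustration of the kind of statistic described in the abstract, the following is a minimal sketch of a kernel RV coefficient computed from two n-by-n kernel matrices, one per data type, measured on the same samples. It assumes the usual centered-kernel form, KRV = tr(K̃L̃) / sqrt(tr(K̃K̃) tr(L̃L̃)) with K̃ = HKH. The function names, the Gaussian kernel with a median-based bandwidth, and the permutation p-value are assumptions made for this sketch: the abstract refers to a rapid p-value calculation, and the permutation test here is only a simple stand-in, not that procedure.

```python
import numpy as np


def center_kernel(K):
    """Double-center a kernel matrix: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def krv_statistic(K, L):
    """Kernel RV coefficient between two symmetric n x n kernel matrices."""
    Kc, Lc = center_kernel(K), center_kernel(L)
    # For symmetric matrices, tr(A B) equals the sum of elementwise products.
    num = np.sum(Kc * Lc)
    den = np.sqrt(np.sum(Kc * Kc) * np.sum(Lc * Lc))
    return num / den


def krv_permutation_test(K, L, n_perm=999, seed=0):
    """Permutation p-value (a stand-in for a fast analytic approximation).

    Sample labels of L are permuted jointly over rows and columns, which
    breaks any association with K while preserving L's internal structure.
    """
    rng = np.random.default_rng(seed)
    observed = krv_statistic(K, L)
    n = K.shape[0]
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        if krv_statistic(K, L[np.ix_(p, p)]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


def linear_kernel(X):
    """Linear kernel on the rows of X (samples by features)."""
    return X @ X.T


def gaussian_kernel(X, bandwidth=None):
    """Gaussian kernel exp(-d^2 / bandwidth); the default bandwidth is the
    median squared pairwise distance (an illustrative choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    if bandwidth is None:
        bandwidth = np.median(d2[d2 > 0])
    return np.exp(-d2 / bandwidth)
```

In use, one kernel would be built from the host gene expression matrix and the other from the microbiome profiles (for example, a kernel derived from an ecological distance), and both passed to `krv_permutation_test`. Adjustment for confounders and combining evidence across several candidate kernels, both mentioned in the abstract, are not covered by this sketch.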