Background: Owing to advances in high-throughput data production technologies, omics studies, such as genomics and metabolomics, often yield numerous measurements per sample or subject, many of which are noisy variables that can obscure the true signals relevant to the study outcome(s). Correcting for multiple testing is therefore critical when performing statistical tests of significance, to limit both false and missed discoveries. Such correction is commonplace in genome-wide association studies (GWAS) and is becoming increasingly relevant to metabolome-wide association studies (MWAS). However, many existing procedures may be too conservative or too lenient, assume only linear associations between the features, or have not been evaluated on metabolomics data.
Methods: One multiple testing correction strategy is to estimate the number of statistically independent tests, called the effective number of tests, from an eigen-analysis of the correlation matrix between the features. This effective number is then used in a single-step adjustment to obtain the pointwise significance level. We propose a modification to this p-value adjustment that replaces the correlation with a more general measure of association between two predictors, the distance correlation, with a specific focus on MWAS.
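As a rough illustration of the strategy described in the Methods, the sketch below builds a pairwise distance correlation matrix, estimates an effective number of tests from its eigenvalues, and derives a pointwise significance level. Several choices here are assumptions for illustration and are not taken from the paper: the function names are hypothetical, the eigenvalue-variance estimator attributed to Nyholt stands in for whichever eigen-analysis procedure is being modified, the Šidák formula stands in for the single-step adjustment, and the biased sample (V-statistic) form of the distance correlation is used.

```python
import numpy as np

def _centered_distances(x):
    """Double-centered matrix of pairwise absolute differences for one feature."""
    x = np.asarray(x, dtype=float)
    d = np.abs(x[:, None] - x[None, :])
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())

def distance_correlation_matrix(X):
    """Pairwise sample distance correlations between the columns of X (n x p)."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    centered = [_centered_distances(X[:, j]) for j in range(p)]
    dvar = np.array([np.mean(c * c) for c in centered])  # squared distance variances
    R = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            dcov2 = np.mean(centered[i] * centered[j])   # squared distance covariance
            denom = np.sqrt(dvar[i] * dvar[j])
            r = np.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0
            R[i, j] = R[j, i] = r
    return R

def effective_number_of_tests(R):
    """Assumed estimator: M_eff = 1 + (M - 1) * (1 - Var(eigenvalues) / M), M >= 2."""
    eig = np.linalg.eigvalsh(R)          # R is symmetric
    m = R.shape[0]
    return 1.0 + (m - 1) * (1.0 - np.var(eig, ddof=1) / m)

def sidak_pointwise_alpha(alpha, m_eff):
    """Single-step Sidak adjustment of the overall level alpha for m_eff tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m_eff)

# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
X[:, 1] = X[:, 0] ** 2                   # nonlinear dependence between two features
R = distance_correlation_matrix(X)
m_eff = effective_number_of_tests(R)
print(m_eff, sidak_pointwise_alpha(0.05, m_eff))
```

Swapping `distance_correlation_matrix(X)` for `np.corrcoef(X, rowvar=False)` gives the Pearson-based counterpart, which makes it easy to compare how the two association measures change the estimated effective number of tests on the same data.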
Results: We assessed common GWAS p-value adjustment procedures, and one tailored for MWAS, all of which rely on eigen-analysis of the Pearson correlation matrix. Our study, which varied sample size-to-feature ratios, response types, and metabolite groupings, highlights the superior performance of the distance correlation.
Conclusion: We propose the distance-correlation-based p-value adjustment (DisCo P-ad) as a novel modification that can enhance existing eigen-analysis-based multiple testing correction procedures by increasing power or reducing false positives. While our focus is on metabolomics, DisCo P-ad can also readily be applied to other high-dimensional omics studies.
Keywords: correlated tests; effective number of tests; eigen-analysis; metabolome-wide association study; multiple testing; pointwise error rate.