Transcriptome data are insufficient to control false discoveries in regulatory network inference

Cell Syst. 2024 Aug 21;15(8):709-724.e13. doi: 10.1016/j.cels.2024.07.006.

Abstract

Inference of causal transcriptional regulatory networks (TRNs) from transcriptomic data suffers notoriously from false positives. Approaches to control the false discovery rate (FDR), for example, via permutation, bootstrapping, or multivariate Gaussian distributions, face several complications: direct regulation is difficult to distinguish from indirect regulation, effects can be nonlinear, and causal structure inference requires "causal sufficiency," meaning experiments that are free of any unmeasured confounding variables. Here, we use a recently developed statistical framework, model-X knockoffs, to control the FDR while accounting for indirect effects, nonlinear dose-response, and user-provided covariates. We adjust the procedure to estimate the FDR correctly even when measured against incomplete gold standards. However, benchmarking against chromatin immunoprecipitation (ChIP) and other gold standards reveals that the observed FDR is higher than the reported FDR. This indicates that unmeasured confounding is a major driver of false discoveries in TRN inference. A record of this paper's transparent peer review process is included in the supplemental information.
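
As a rough illustration of how a knockoff filter turns per-edge statistics into an FDR-controlled selection, the sketch below applies the standard knockoff+ threshold rule from the general knockoff literature. It is not the adjusted procedure described in the abstract, and it does not construct the knockoff variables themselves. It assumes each candidate regulator-target edge already has a signed statistic W (large positive values favor the real regulator over its knockoff). The function names and the toy W values are hypothetical and chosen only for illustration.

```python
import numpy as np


def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for statistics w at target FDR q.

    The threshold is the smallest t > 0 among the observed |W| values with
    (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q, or infinity if none exists.
    """
    w = np.asarray(w, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes, in ascending order.
    candidates = np.sort(np.unique(np.abs(w[w != 0])))
    for t in candidates:
        # Negative statistics act as a stand-in count of false positives.
        fdp_estimate = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_estimate <= q:
            return float(t)
    return float("inf")


def select_edges(w, q):
    """Return indices of candidate edges whose statistic meets the threshold."""
    w = np.asarray(w, dtype=float)
    tau = knockoff_plus_threshold(w, q)
    return np.flatnonzero(w >= tau)


# Toy statistics for eight hypothetical candidate edges (made-up values).
toy_w = [5.0, 4.0, 3.5, 3.0, 2.5, -1.0, 0.5, -0.2]
selected = select_edges(toy_w, q=0.25)
print("threshold:", knockoff_plus_threshold(toy_w, q=0.25))
print("selected edge indices:", selected.tolist())
```

The FDR reported by such a rule is valid only if the knockoffs are valid, which is where unmeasured confounding can cause the observed FDR to exceed the reported one.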

Keywords: Markov random field; false discovery rate; gene regulatory network; knockoff filter; network inference; structure learning; transcription factor; transcriptional regulation.

MeSH terms

  • Chromatin Immunoprecipitation / methods
  • Gene Expression Profiling / methods
  • Gene Regulatory Networks* / genetics
  • Humans
  • Transcriptome* / genetics