Integrative, normalization-insusceptible statistical analysis of RNA-Seq data, with improved differential expression and unbiased downstream functional analysis

Brief Bioinform. 2021 May 20;22(3):bbaa156. doi: 10.1093/bib/bbaa156.

Abstract

The study of differential gene expression patterns through RNA-Seq comprises a routine task in the daily lives of molecular bioscientists, who produce vast amounts of data requiring proper management and analysis. Despite widespread use, there are still no widely accepted golden standards for the normalization and statistical analysis of RNA-Seq data, and critical biases, such as gene lengths and problems in the detection of certain types of molecules, remain largely unaddressed. Stimulated by these unmet needs and the lack of in-depth research into the potential of combinatorial methods to enhance the analysis of differential gene expression, we had previously introduced the PANDORA P-value combination algorithm while presenting evidence for PANDORA's superior performance in optimizing the tradeoff between precision and sensitivity. In this article, we present the next generation of the algorithm along with a more in-depth investigation of its capabilities to effectively analyze RNA-Seq data. In particular, we show that PANDORA-reported lists of differentially expressed genes are unaffected by biases introduced by different normalization methods, while, at the same time, they comprise a reliable input option for downstream pathway analysis. Additionally, PANDORA outperforms other methods in detecting differential expression patterns in certain transcript types, including long non-coding RNAs.

Keywords: P-value combination; RNA-Seq; differential gene expression; long non-coding RNA normalization.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Databases, Nucleic Acid*
  • Humans
  • RNA-Seq*
  • Software*