Predictability of human differential gene expression

Megan Crow; Nathaniel Lim; Sara Ballouz; Paul Pavlidis; Jesse Gillis

doi:10.1073/pnas.1802973116

Proc Natl Acad Sci U S A. 2019 Mar 26;116(13):6491-6500. doi: 10.1073/pnas.1802973116. Epub 2019 Mar 7.

Authors

Megan Crow¹, Nathaniel Lim²,³,⁴, Sara Ballouz¹, Paul Pavlidis²,³, Jesse Gillis⁵

Affiliations

¹ Stanley Center for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724.
² Department of Psychiatry, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.
³ Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.
⁴ Genome Science and Technology Program, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.
⁵ Stanley Center for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724; jgillis@cshl.edu.

Abstract

Differential expression (DE) is commonly used to explore molecular mechanisms of biological conditions. While many studies report significant results between their groups of interest, the degree to which results are specific to the question at hand is not generally assessed, potentially leading to inaccurate interpretation. This could be particularly problematic for metaanalysis, where replicability across datasets is taken as strong evidence for the existence of a specific, biologically relevant signal, but where that replicability may instead arise from the recurrence of generic processes. To address this, we developed an approach to predict DE based on an analysis of over 600 studies. A predictor based on the empirical prior probability of DE performs very well at this task (mean area under the receiver operating characteristic curve, ∼0.8), indicating that a large fraction of DE hit lists are nonspecific. In contrast, predictors based on attributes such as gene function, mutation rates, or network features perform poorly. Genes associated with sex, the extracellular matrix, the immune system, and stress responses are prominent within the "DE prior." In a series of control studies, we show that these patterns reflect shared biology rather than technical artifacts or ascertainment biases. Finally, we demonstrate the application of the DE prior to data interpretation in three use cases: (i) breast cancer subtyping, (ii) single-cell genomics of pancreatic islet cells, and (iii) metaanalysis of lung adenocarcinoma and renal transplant rejection transcriptomics. In all cases, we find hallmarks of generic DE, highlighting the need for nuanced interpretation of gene-phenotype associations.
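The idea of predicting a study's DE hit list from an empirical prior, and scoring that prediction by AUROC, can be illustrated with a minimal sketch. It assumes that each study is reduced to a set of gene identifiers called DE, that a gene's prior is simply the fraction of the *other* studies in which it was called DE (leave-one-study-out), and that AUROC is computed with the Mann-Whitney formulation. The function names, gene labels, and toy hit lists are hypothetical and are not the authors' data or exact pipeline.

```python
from typing import Dict, List, Set


def de_prior(hit_lists: List[Set[str]], genes: List[str]) -> Dict[str, float]:
    """Assumed prior: fraction of studies in which each gene was reported DE."""
    n = len(hit_lists)
    return {g: sum(g in hits for hits in hit_lists) / n for g in genes}


def auroc(scores: Dict[str, float], positives: Set[str]) -> float:
    """Mann-Whitney AUROC: probability that a random DE gene outscores a random non-DE gene.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one DE and one non-DE gene")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def leave_one_out_aurocs(hit_lists: List[Set[str]], genes: List[str]) -> List[float]:
    """Predict each study's hit list using a prior built only from the remaining studies."""
    results = []
    for i, held_out in enumerate(hit_lists):
        training = hit_lists[:i] + hit_lists[i + 1:]
        prior = de_prior(training, genes)
        results.append(auroc(prior, held_out))
    return results


# Hypothetical toy data: five genes, four studies' DE hit lists.
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
studies = [
    {"geneA", "geneB"},
    {"geneA", "geneC"},
    {"geneA", "geneB", "geneD"},
    {"geneB", "geneE"},
]

aurocs = leave_one_out_aurocs(studies, genes)
mean_auroc = sum(aurocs) / len(aurocs)
print([round(a, 3) for a in aurocs], round(mean_auroc, 3))
```

Holding out each study before building the prior keeps the evaluation honest: a gene's score never includes the study being predicted. A mean AUROC well above 0.5 on real data would indicate that hit lists can be anticipated without knowing anything about the specific comparison, which is the sense in which the abstract calls such results nonspecific.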

Keywords: differential expression; metaanalysis; replicability; specificity; transcriptomics.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Adenocarcinoma / genetics
Biomarkers, Tumor / genetics
Breast Neoplasms / genetics
Electronic Data Processing
Female
Gene Expression Profiling*
Gene Expression Regulation*
Gene Regulatory Networks
Genes, Essential
Genomics
Graft Rejection
Human Genetics*
Humans
Kidney Transplantation
Lung Neoplasms
Probability*
ROC Curve
Recurrence
Sensitivity and Specificity
Transcriptome

Substances

Biomarkers, Tumor
