Tradict enables accurate prediction of eukaryotic transcriptional states from 100 marker genes

Surojit Biswas; Konstantin Kerner; Paulo José Pereira Lima Teixeira; Jeffery L Dangl; Vladimir Jojic; Philip A Wigge

doi:10.1038/ncomms15309

Nat Commun. 2017 May 5;8:15309. doi: 10.1038/ncomms15309.

Authors

Surojit Biswas¹, Konstantin Kerner², Paulo José Pereira Lima Teixeira³,⁴, Jeffery L Dangl³,⁴,⁵,⁶,⁷, Vladimir Jojic⁸, Philip A Wigge⁹

Affiliations

¹ Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA.
² Botanical Institute, Biocenter, University of Cologne, D-50674 Cologne, Germany.
³ Howard Hughes Medical Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA.
⁴ Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA.
⁵ Carolina Center for Genome Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA.
⁶ Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA.
⁷ Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA.
⁸ Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA.
⁹ Sainsbury Laboratory, University of Cambridge, Cambridge CB2 1LR, UK.

Abstract

Transcript levels are a critical determinant of the proteome and hence cellular function. Because the transcriptome is an outcome of the interactions between genes and their products, it may be accurately represented by a subset of transcript abundances. We develop a method, Tradict (transcriptome predict), capable of learning and using the expression measurements of a small subset of 100 marker genes to predict transcriptome-wide gene abundances and the expression of a comprehensive, but interpretable list of transcriptional programs that represent the major biological processes and pathways of the cell. By analyzing over 23,000 publicly available RNA-Seq data sets, we show that Tradict is robust to noise and accurate. Coupled with targeted RNA sequencing, Tradict may therefore enable simultaneous transcriptome-wide screening and mechanistic investigation at large scales.
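
The central idea of the abstract, that a small set of marker genes can stand in for the whole transcriptome, can be illustrated with a toy sketch. The abstract does not describe Tradict's statistical model, so the code below is not the authors' method: it swaps in plain ridge regression, uses synthetic data driven by a few latent "programs", and picks markers by a hypothetical highest-variance rule. All function names, sizes, and parameters are assumptions made for illustration.

```python
import numpy as np

def simulate_expression(n_samples, n_genes, n_factors, rng):
    """Toy log-expression matrix driven by a few latent programs (synthetic, not RNA-Seq)."""
    factors = rng.normal(size=(n_samples, n_factors))    # per-sample program activity
    loadings = rng.normal(size=(n_factors, n_genes))     # per-gene response to each program
    noise = 0.1 * rng.normal(size=(n_samples, n_genes))  # measurement noise
    return factors @ loadings + noise

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression from marker expression X to all-gene expression Y."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    return W, x_mean, y_mean

def predict(X, W, x_mean, y_mean):
    """Predict all-gene expression from marker expression."""
    return (X - x_mean) @ W + y_mean

rng = np.random.default_rng(0)
expr = simulate_expression(n_samples=300, n_genes=500, n_factors=5, rng=rng)
train, test = expr[:200], expr[200:]

# Hypothetical marker choice: the 20 highest-variance genes in the training set.
markers = np.argsort(train.var(axis=0))[-20:]

W, x_mean, y_mean = fit_ridge(train[:, markers], train)
pred = predict(test[:, markers], W, x_mean, y_mean)

# Fraction of held-out variance explained across all genes.
ss_res = ((test - pred) ** 2).sum()
ss_tot = ((test - test.mean(axis=0)) ** 2).sum()
r2 = 1.0 - ss_res / ss_tot
print(f"held-out R^2 from {len(markers)} markers: {r2:.3f}")
```

Because the synthetic genes share only five underlying programs, a handful of markers recovers most of the held-out variance; real transcriptomes are noisier and count-based, so this shows the intuition only, not expected performance.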

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Animals
Arabidopsis / genetics
Arabidopsis / immunology
Computational Biology / methods*
Eukaryota / genetics*
Humans
Immunity, Innate / genetics
Signal Transduction
Transcription, Genetic*
Transcriptome / genetics
