Quantifying Intensities of Transcription Factor-DNA Binding by Learning From an Ensemble of Protein Binding Microarrays

IEEE J Biomed Health Inform. 2021 Jul;25(7):2811-2819. doi: 10.1109/JBHI.2021.3058518. Epub 2021 Jul 27.

Abstract

The coordinated expression of genes is regulated primarily by interactions between transcription factors (TFs) and their DNA binding sites, which are an integral part of transcriptional regulatory networks. Many computational tools determine whether or not a TF binds to a given DNA sequence. However, tools that go further and quantify the relative preference of such binding are still needed. Here, we propose SemanticBI, a deep learning regression model that predicts the intensities of TF-DNA binding. SemanticBI combines a convolutional neural network (CNN) with a recurrent neural network (RNN) and was trained on an ensemble of protein binding microarray data sets covering multiple TFs. With this approach, SemanticBI predicted binding intensities more accurately than other popular methods. Moreover, its CNN-RNN architecture uncovered vectorized sequence-oriented features, which are abstract representations of the original DNA sequences. Additionally, the use of SemanticBI raises the question of whether motifs are necessary for computational models of TF binding. The online SemanticBI service can be accessed at http://qianglab.scst.suda.edu.cn/semantic/.
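
As a rough illustration of the convolutional stage of a CNN-RNN sequence model, the sketch below one-hot encodes a DNA string and slides a single filter along it. The abstract does not specify SemanticBI's input encoding, layers, or hyperparameters, so the one-hot scheme, the function names `one_hot` and `conv1d_scan`, and the hand-written kernel are all assumptions made for illustration. The recurrent stage and the regression output are omitted.

```python
# Illustrative sketch only: the encoding, function names, and kernel weights
# are hypothetical and are not taken from SemanticBI.

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows (A, C, G, T)."""
    rows = []
    for base in seq.upper():
        row = [0.0] * 4
        if base in BASES:  # unknown bases (e.g. N) stay all-zero
            row[BASES.index(base)] = 1.0
        rows.append(row)
    return rows


def conv1d_scan(encoded, kernel):
    """Slide one kernel (width x 4) along the sequence; return ReLU activations."""
    width = len(kernel)
    activations = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for k in range(width):
            for c in range(4):
                total += kernel[k][c] * encoded[start + k][c]
        activations.append(max(0.0, total))  # ReLU
    return activations


# Hypothetical kernel that responds most strongly to the 3-mer "TGA".
kernel = [
    [0.0, 0.0, 0.0, 1.0],  # T
    [0.0, 0.0, 1.0, 0.0],  # G
    [1.0, 0.0, 0.0, 0.0],  # A
]

encoded = one_hot("ACTGAC")
activations = conv1d_scan(encoded, kernel)
print(activations)
```

In a trained model the kernel weights would be learned from the microarray intensities instead of being written by hand, and the per-position activations would typically be passed on to the recurrent layer.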

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Binding Sites
  • Computational Biology*
  • DNA / genetics
  • Humans
  • Protein Binding
  • Transcription Factors / genetics

Substances

  • Transcription Factors
  • DNA