Base-resolution prediction of transcription factor binding signals by a deep learning framework

PLoS Comput Biol. 2022 Mar 9;18(3):e1009941. doi: 10.1371/journal.pcbi.1009941. eCollection 2022 Mar.

Abstract

Transcription factors (TFs) play an important role in regulating gene expression, so identifying the sites they bind has become a fundamental step in molecular and cellular biology. In this paper, we developed a deep learning framework, named FCNsignal, that leverages existing fully convolutional neural networks (FCNs) to predict TF-DNA binding signals at base resolution. FCNsignal can simultaneously perform the following tasks: (i) modeling the base-resolution signals of binding regions; (ii) discriminating between binding and non-binding regions; (iii) locating TF-DNA binding regions; (iv) predicting binding motifs. FCNsignal can also be used to predict open (accessible) chromatin regions across the whole genome. Experimental results on 53 TF ChIP-seq datasets and 6 chromatin accessibility ATAC-seq datasets show that the proposed framework outperforms some existing state-of-the-art methods. In addition, we used the trained FCNsignal to locate all potential TF-DNA binding regions on a whole chromosome and to make predictions for DNA sequences of arbitrary length; the results show that the framework finds most of the known binding regions and accepts sequences of arbitrary length. Furthermore, through a series of experiments, we demonstrated the potential of the framework for discovering causal disease-associated single-nucleotide polymorphisms (SNPs).
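To illustrate why a fully convolutional design yields one predicted value per base and accepts sequences of any length, the following minimal sketch may help. It is not the FCNsignal architecture: it uses a single hand-written filter over a one-hot encoded sequence, and every name in it (`one_hot`, `conv1d_same`, `predict_signal`, `snp_effect`, `TOY_KERNEL`, and the motif "TGA") is a hypothetical choice made for illustration only.

```python
# Illustrative sketch only: one hand-written convolution filter, not a trained model.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats; unknown bases (e.g. N) become all zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def conv1d_same(x, kernel):
    """Slide a (width x 4) kernel over x with zero padding, so output length equals input length."""
    width = len(kernel)
    left = width // 2
    out = []
    for i in range(len(x)):
        total = 0.0
        for k in range(width):
            j = i + k - left
            if 0 <= j < len(x):  # positions outside the sequence count as zero padding
                total += sum(x[j][c] * kernel[k][c] for c in range(4))
        out.append(max(total, 0.0))  # ReLU activation
    return out


def predict_signal(seq, kernel):
    """Return one signal value per base, whatever the sequence length."""
    return conv1d_same(one_hot(seq), kernel)


def snp_effect(ref_seq, pos, alt_base, kernel):
    """Score a substitution as the change in total predicted signal (alt minus ref)."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return sum(predict_signal(alt_seq, kernel)) - sum(predict_signal(ref_seq, kernel))


# Toy filter that rewards the made-up motif "TGA" (columns are A, C, G, T).
TOY_KERNEL = [
    [0, 0, 0, 1],  # T
    [0, 0, 1, 0],  # G
    [1, 0, 0, 0],  # A
]

signal = predict_signal("ACTGAC", TOY_KERNEL)
peak = signal.index(max(signal))  # position of the strongest predicted signal
print(signal, peak)
print(snp_effect("ACTGAC", 3, "C", TOY_KERNEL))
```

Because the output has the same length as the input, the same function can in principle be applied to a short window or to a much longer region, and the position of the maximum gives a simple estimate of where the toy motif sits. The variant score is only one simple way to compare a reference and an alternate allele; the abstract does not specify how the framework scores SNPs.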

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binding Sites
  • Chromatin Immunoprecipitation Sequencing
  • Deep Learning*
  • Protein Binding
  • Transcription Factors / metabolism

Substances

  • Transcription Factors

Grants and funding

This work was supported by the National Key R&D Program of China (No. 2018AAA0100100), partly supported by the National Natural Science Foundation of China (Grant nos. 62002266, 61861146002, 61732012, 61772370, 61932008, 61772357, and 62002297), and supported by the “BAGUI Scholar” Program and the Scientific & Technological Base and Talent Special Program (GuiKe AD18126015) of the Guangxi Zhuang Autonomous Region of China. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.