A penalized Bayesian approach to predicting sparse protein-DNA binding landscapes

Bioinformatics. 2014 Mar 1;30(5):636-43. doi: 10.1093/bioinformatics/btt585. Epub 2013 Oct 9.

Abstract

Motivation: Cellular processes are controlled, directly or indirectly, by the binding of hundreds of different DNA binding factors (DBFs) to the genome. One key to a deeper understanding of the cell is discovering where, when and how strongly these DBFs bind to the DNA sequence. Direct measurement of DBF binding sites (BSs; e.g. through ChIP-Chip or ChIP-Seq experiments) is expensive, noisy and not available for every DBF in every cell type. Naive methods and most existing computational approaches for detecting which DBFs bind in a set of genomic regions of interest often perform poorly because of high false discovery rates and restrictive requirements for prior knowledge.

Results: We develop SparScape, a penalized Bayesian method for identifying DBFs active in the considered regions and predicting a joint probabilistic binding landscape. Using a sparsity-inducing penalization, SparScape is able to select, from a much larger candidate set, a small subset of DBFs with enriched BSs in a set of DNA sequences. This selection substantially reduces false positives in the prediction of BSs. Analyses of ChIP-Seq data from mouse embryonic stem cells and of simulated data show that SparScape dramatically outperforms the naive motif scanning method and comparable computational approaches in terms of DBF identification and BS prediction.
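
Illustration: the general idea of a sparsity-inducing penalization, which keeps a few candidates and sets the rest exactly to zero, can be sketched with a generic soft-thresholding step. This is a minimal sketch under stated assumptions and is not SparScape's actual penalty or inference procedure, which the abstract does not specify. The function `soft_threshold`, the DBF names, the nonnegative weights and the penalty value are all hypothetical.

```python
def soft_threshold(weights, penalty):
    """Shrink each nonnegative weight by `penalty`; weights at or below it become exactly 0.

    Generic L1-style shrinkage, used here only to illustrate sparsity;
    it is NOT the penalty used by SparScape.
    """
    return {name: max(w - penalty, 0.0) for name, w in weights.items()}


# Hypothetical enrichment weights for five candidate DBFs (made-up numbers).
candidate_weights = {
    "DBF_A": 0.90,
    "DBF_B": 0.05,
    "DBF_C": 0.40,
    "DBF_D": 0.02,
    "DBF_E": 0.10,
}

# Apply the (assumed) penalty and keep only DBFs whose weight stays positive.
shrunk = soft_threshold(candidate_weights, penalty=0.15)
selected = sorted(name for name, w in shrunk.items() if w > 0.0)
print(selected)
```

In this toy setting only the candidates with weights above the penalty survive, which mirrors in spirit how a sparse selection reduces the number of DBFs whose predicted BSs can contribute false positives.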

Availability and implementation: SparScape is implemented in C++ with OpenMP (optional at compilation) and is freely available at 'www.stat.ucla.edu/~zhou/Software.html' for academic use.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Animals
  • Bayes Theorem
  • Binding Sites
  • Chromatin Immunoprecipitation / methods*
  • DNA-Binding Proteins / metabolism*
  • Embryonic Stem Cells / metabolism
  • Genomics
  • Mice
  • Promoter Regions, Genetic
  • Sequence Analysis, DNA / methods*
  • Software
  • Transcription Factors / metabolism*

Substances

  • DNA-Binding Proteins
  • Transcription Factors