coMOTIF: a mixture framework for identifying transcription factor and a coregulator motif in ChIP-seq data

Bioinformatics. 2011 Oct 1;27(19):2625-32. doi: 10.1093/bioinformatics/btr397. Epub 2011 Jul 19.

Abstract

Motivation: ChIP-seq data are enriched in binding sites for the immunoprecipitated protein. Some sequences may also contain binding sites for a coregulator. Biologists want to know which coregulatory factor motifs may be present in the sequences bound by the ChIP'ed protein.

Results: We present a finite mixture framework with an expectation-maximization (EM) algorithm that considers two motifs jointly and simultaneously determines which sequences contain both motifs, only one of them, or neither. On 10 simulated ChIP-seq datasets, our method performed better than repeated application of MEME at predicting which sequences contain both motifs. When applied to a mouse liver Foxa2 ChIP-seq dataset of ~12 000 sequences, each 400 bp long, coMOTIF identified co-occurrence of Foxa2 with Hnf4a, Cebpa, E-box, Ap1/Maf or Sp1 motifs in ~6–33% of these sequences. These motifs correspond either to known liver-specific transcription factors or to factors with an important role in liver function.
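The mixture idea in the Results paragraph can be illustrated with a deliberately simplified Python sketch. It assumes both motifs are already given as position weight matrices (built here from hypothetical toy consensus strings), a uniform background, at most one forward-strand site per motif, and independent placement of the two sites. Under those assumptions it runs EM only over the four mixing proportions (neither, A only, B only, both) and returns each sequence's posterior class membership. The method described in the abstract also has to learn the motifs themselves, which this sketch omits, so it is not coMOTIF's implementation; the function names `consensus_pwm`, `site_ratio` and `em_two_motif_mixture` are hypothetical.

```python
BASES = "ACGT"


def consensus_pwm(consensus, p_match=0.97):
    """Toy PWM (hypothetical): p_match on the consensus base, the rest split evenly."""
    p_other = (1.0 - p_match) / 3
    return [[p_match if b == c else p_other for b in BASES] for c in consensus]


def site_ratio(seq, pwm, bg):
    """Average P(site)/P(background) over all start positions, forward strand only.

    Assumes one uniformly placed site per sequence, so the background
    likelihood of the whole sequence cancels out of the posterior.
    """
    w = len(pwm)
    n = len(seq) - w + 1
    total = 0.0
    for i in range(n):
        r = 1.0
        for j, base in enumerate(seq[i:i + w]):
            k = BASES.index(base)
            r *= pwm[j][k] / bg[k]
        total += r
    return total / n


def em_two_motif_mixture(seqs, pwm_a, pwm_b, bg=(0.25,) * 4, n_iter=100):
    """EM over four classes: 0 = neither, 1 = A only, 2 = B only, 3 = both.

    Simplifying assumptions: PWMs are held fixed (only the mixing
    proportions are re-estimated), and the 'both' likelihood is the product
    of the two ratios, i.e. independent site positions, overlaps ignored.
    """
    ratios = [(site_ratio(s, pwm_a, bg), site_ratio(s, pwm_b, bg)) for s in seqs]
    pi = [0.25] * 4
    post = []
    for _ in range(n_iter):
        # E-step: posterior class membership of every sequence.
        post = []
        for ra, rb in ratios:
            weighted = [p * l for p, l in zip(pi, (1.0, ra, rb, ra * rb))]
            z = sum(weighted)
            post.append([x / z for x in weighted])
        # M-step: each mixing proportion is the mean posterior of its class.
        pi = [sum(row[c] for row in post) / len(seqs) for c in range(4)]
    return pi, post


# Usage on a tiny synthetic set with planted sites (toy data, not real ChIP-seq).
filler = "AT" * 10
A, B = "TGTTTAC", "CACGTG"  # hypothetical toy consensus strings
seqs = [filler + filler, filler + A + filler, filler + B + filler,
        filler + A + filler + B + filler] * 5
labels = [0, 1, 2, 3] * 5
pi, post = em_two_motif_mixture(seqs, consensus_pwm(A), consensus_pwm(B))
calls = [max(range(4), key=lambda c: row[c]) for row in post]
print([round(p, 3) for p in pi], calls == labels)
```

Each sequence is assigned to the class with the highest posterior probability. Holding the PWMs fixed keeps the example short; a full motif-discovery EM would also re-estimate each PWM from posterior-weighted site counts in the M-step.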

Availability: Freely available at http://www.niehs.nih.gov/research/resources/software/comotif/.

Contact: li3@niehs.nih.gov

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, N.I.H., Intramural

MeSH terms

  • Algorithms
  • Amino Acid Motifs
  • Animals
  • Base Sequence
  • Binding Sites / genetics*
  • Chromatin Immunoprecipitation / methods
  • Gene Expression Regulation
  • Genome
  • Humans
  • Mice
  • Models, Genetic
  • Protein Binding* / genetics
  • Protein Structure, Tertiary
  • Sequence Analysis, DNA
  • Transcription Factors / genetics*
  • Transcription Factors / metabolism

Substances

  • Transcription Factors