SeAMotE: a method for high-throughput motif discovery in nucleic acid sequences

BMC Genomics. 2014 Oct 23;15(1):925. doi: 10.1186/1471-2164-15-925.

Abstract

Background: The large amount of data produced by high-throughput sequencing poses new computational challenges. In the last decade, several tools have been developed for the identification of transcription and splicing factor binding sites.

Results: Here, we introduce the SeAMotE (Sequence Analysis of Motifs Enrichment) algorithm for the discovery of regulatory regions in nucleic acid sequences. SeAMotE provides (i) a robust analysis of high-throughput sequence sets, (ii) a motif search based on pattern occurrences and (iii) an easy-to-use web-server interface. We applied our method to recently published data from 351 chromatin immunoprecipitation (ChIP) and 13 crosslinking immunoprecipitation (CLIP) experiments and compared our results with those of other well-established motif discovery tools. SeAMotE shows an average accuracy of 80% in finding discriminative motifs and outperforms other methods available in the literature.
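The abstract does not specify how SeAMotE scores patterns, so the following is only a generic illustration of what "discriminative motif search based on pattern occurrences" can mean, not the SeAMotE algorithm. It assumes exact k-mers over the ACGT alphabet, an exhaustive search, and a simple rule (a sequence is called "bound" if it contains the pattern) whose accuracy is measured on positive (e.g. ChIP/CLIP peak) and negative (background) sets. The function names `occurrence_accuracy` and `best_discriminative_kmer` are hypothetical.

```python
from itertools import product


def kmers(k, alphabet="ACGT"):
    """Yield every exact string of length k over the alphabet (hypothetical helper)."""
    for letters in product(alphabet, repeat=k):
        yield "".join(letters)


def occurrence_accuracy(pattern, positives, negatives):
    """Accuracy of the assumed rule 'sequence contains pattern => positive'.

    True positives: positive sequences containing the pattern.
    True negatives: negative sequences not containing it.
    """
    tp = sum(pattern in s for s in positives)
    tn = sum(pattern not in s for s in negatives)
    return (tp + tn) / (len(positives) + len(negatives))


def best_discriminative_kmer(positives, negatives, k, alphabet="ACGT"):
    """Exhaustively score all 4**k k-mers and return (pattern, accuracy).

    Ties are broken by the first pattern in lexicographic order,
    because max() returns the first maximal element it encounters.
    """
    best = max(
        kmers(k, alphabet),
        key=lambda p: occurrence_accuracy(p, positives, negatives),
    )
    return best, occurrence_accuracy(best, positives, negatives)
```

For example, with three positive sequences sharing the core `TGACGTC` and three unrelated background sequences, `best_discriminative_kmer(positives, negatives, 4)` returns `("ACGT", 1.0)`. The brute-force enumeration grows as 4^k, so it only makes the scoring idea concrete; it says nothing about how SeAMotE handles degenerate patterns, statistics or large sequence sets.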

Conclusions: SeAMotE is a fast, accurate and flexible algorithm for the identification of sequence patterns involved in protein-DNA and protein-RNA recognition. The server can be freely accessed at http://s.tartaglialab.com/new_submission/seamote.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Sequence
  • Chromatin Immunoprecipitation
  • DNA / chemistry
  • DNA / metabolism
  • High-Throughput Nucleotide Sequencing
  • Internet
  • Protein Binding
  • Proteins / chemistry
  • Proteins / metabolism
  • RNA / chemistry
  • RNA / metabolism
  • Sequence Analysis, DNA
  • Software*
  • User-Computer Interface

Substances

  • Proteins
  • RNA
  • DNA