SeAMotE: a method for high-throughput motif discovery in nucleic acid sequences

Federico Agostini; Davide Cirillo; Riccardo Delli Ponti; Gian Gaetano Tartaglia

doi:10.1186/1471-2164-15-925

BMC Genomics. 2014 Oct 23;15(1):925. doi: 10.1186/1471-2164-15-925.

Authors

Federico Agostini, Davide Cirillo, Riccardo Delli Ponti, Gian Gaetano Tartaglia¹

Affiliation

¹ Gene Function and Evolution, Centre for Genomic Regulation (CRG), C/ Dr. Aiguader 88, 08003 Barcelona, Spain. gian.tartaglia@crg.es.

Abstract

Background: The large amount of data produced by high-throughput sequencing poses new computational challenges. In the last decade, several tools have been developed for the identification of transcription and splicing factor binding sites.

Results: Here, we introduce the SeAMotE (Sequence Analysis of Motifs Enrichment) algorithm for the discovery of regulatory regions in nucleic acid sequences. SeAMotE provides (i) a robust analysis of high-throughput sequence sets, (ii) a motif search based on pattern occurrences and (iii) an easy-to-use web-server interface. We applied our method to recently published data, including 351 chromatin immunoprecipitation (ChIP) and 13 crosslinking immunoprecipitation (CLIP) experiments, and compared our results with those of other well-established motif discovery tools. SeAMotE shows an average accuracy of 80% in finding discriminative motifs and outperforms other methods available in the literature.
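
As an illustration of what an occurrence-based, discriminative motif search can look like, the following minimal Python sketch ranks fixed-length patterns (k-mers) by how much more often they appear in a positive sequence set than in a reference set. It is not the SeAMotE implementation: the function names, the fixed k-mer length, and the simple coverage-difference score are all assumptions made for this example, and the abstract does not specify the statistics the tool actually uses.

```python
from collections import Counter


def kmers_present(seq, k):
    """Return the set of distinct k-mers occurring in one sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def coverage(seqs, k):
    """Fraction of sequences that contain each k-mer at least once."""
    if not seqs:
        return {}
    counts = Counter()
    for s in seqs:
        # Each sequence contributes at most one count per k-mer.
        counts.update(kmers_present(s, k))
    n = len(seqs)
    return {kmer: c / n for kmer, c in counts.items()}


def discriminative_kmers(positive, reference, k=4, top=5):
    """Rank k-mers by coverage in `positive` minus coverage in `reference`.

    Hypothetical scoring scheme chosen for illustration only.
    """
    pos = coverage(positive, k)
    ref = coverage(reference, k)
    scored = [(kmer, pos[kmer] - ref.get(kmer, 0.0)) for kmer in pos]
    # Highest score first; ties broken alphabetically for reproducibility.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]
```

Calling `discriminative_kmers` with a list of bound sequences (for example, ChIP or CLIP peaks) and a list of background sequences returns the k-mers that best separate the two sets under this toy score. A real analysis would also handle degenerate patterns and assess statistical significance.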

Conclusions: SeAMotE is a fast, accurate and flexible algorithm for the identification of sequence patterns involved in protein-DNA and protein-RNA recognition. The server can be freely accessed at http://s.tartaglialab.com/new_submission/seamote.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Base Sequence
Chromatin Immunoprecipitation
DNA / chemistry
DNA / metabolism
High-Throughput Nucleotide Sequencing
Internet
Protein Binding
Proteins / chemistry
Proteins / metabolism
RNA / chemistry
RNA / metabolism
Sequence Analysis, DNA
Software*
User-Computer Interface

Substances

Proteins
RNA
DNA