Pr[m]: An Algorithm for Protein Motif Discovery

IEEE/ACM Trans Comput Biol Bioinform. 2022 Jan-Feb;19(1):585-592. doi: 10.1109/TCBB.2020.2999262. Epub 2022 Feb 3.

Abstract

Motifs are evolutionarily conserved patterns that are reported to serve crucial structural and functional roles. Identifying motif patterns in a set of protein sequences has been a prime concern for researchers in computational biology. Existing algorithms discover protein motifs based purely on parameters derived from sequence composition and length. However, the discovery of variable-length motifs remains a challenging task, as it is not possible to determine the length of a motif in advance. In the current work, a k-mer based motif discovery approach called Pr[m] is proposed for the detection of statistically significant ungapped motif patterns, with or without wildcard characters. To analyze the performance of the proposed approach, a comparative study was performed with MEME and GLAM2, two widely used non-discriminative methods for motif discovery. A set of 7,500 test datasets was used to compare the performance of the proposed tool with that of these two methods. Pr[m] outperformed the existing methods in terms of predictive quality and performance. The proposed approach is hosted at https://bioserver.iiita.ac.in/Pr[m].
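
To illustrate what "k-mer based" and "with or without wildcard characters" can mean in practice, the following is a minimal sketch and not the Pr[m] algorithm itself, whose scoring and significance testing are not described in this abstract. It counts how many sequences contain each ungapped k-mer, and how many contain a match to a pattern with wildcard positions. The function names, the lowercase `x` wildcard symbol, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def kmer_support(sequences, k):
    """Return, for each k-mer, the number of sequences containing it at least once."""
    support = Counter()
    for seq in sequences:
        # Use a set so a k-mer repeated within one sequence is counted once.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(seen)
    return support


def matches(pattern, kmer, wildcard="x"):
    """True if kmer equals pattern at every non-wildcard position (hypothetical helper)."""
    return len(pattern) == len(kmer) and all(
        p == wildcard or p == c for p, c in zip(pattern, kmer)
    )


def pattern_support(sequences, pattern, wildcard="x"):
    """Number of sequences with at least one window matching the wildcard pattern."""
    k = len(pattern)
    return sum(
        any(matches(pattern, seq[i:i + k], wildcard) for i in range(len(seq) - k + 1))
        for seq in sequences
    )


if __name__ == "__main__":
    # Toy sequences invented for this example; not data from the paper.
    toy = ["MKCAHCGK", "TTCAHCPL", "GGCSHCAA"]
    print(kmer_support(toy, 4)["CAHC"])   # exact 4-mer support
    print(pattern_support(toy, "CxHC"))   # support when position 2 is a wildcard
```

A real motif finder would additionally score each candidate against a background model to decide statistical significance and would search over several values of k, since the motif length is not known in advance.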

MeSH terms

  • Algorithms*
  • Amino Acid Motifs
  • Amino Acid Sequence
  • Computational Biology*
  • Sequence Analysis, DNA