COSINE: non-seeding method for mapping long noisy sequences

Nucleic Acids Res. 2017 Aug 21;45(14):e132. doi: 10.1093/nar/gkx511.

Abstract

Third generation sequencing (TGS) technologies are highly promising, but the long, noisy reads they produce are difficult to align with existing algorithms. Here, we present COSINE, a conceptually new method designed specifically for aligning long reads contaminated by a high level of errors. Rather than relying on seeds, COSINE computes the context similarity of two stretches of nucleobases from the similarity of the distributions of their short k-mers (k = 3-4) along the sequences. Results on simulated and real data show that COSINE achieves high sensitivity and specificity across a wide range of read accuracies. When the error rate is high, COSINE can offer substantial advantages over existing alignment methods.
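The abstract describes COSINE only at a high level: two stretches of sequence are scored by how similar their short k-mer distributions are, not by exact-match seeds. The sketch below is a minimal illustration of that general idea under our own assumptions, not the COSINE implementation. It counts every possible k-mer over ACGT in each sequence and compares the two count vectors with cosine similarity. The choice of cosine similarity as the measure is an assumption, because the abstract does not name the metric. The sketch also discards where along the sequence each k-mer occurs, whereas the abstract refers to distributions along the sequences. The helper names kmer_profile, cosine_similarity and context_similarity are hypothetical.

```python
from collections import Counter
from itertools import product
import math

BASES = "ACGT"


def kmer_profile(seq: str, k: int = 3) -> list[int]:
    """Count each of the 4**k possible ACGT k-mers in seq, in a fixed order.

    Hypothetical helper (not from the paper). k-mers containing other
    characters (e.g. 'N') are counted by the Counter but never looked up,
    so they are effectively ignored.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # A fixed ordering of all k-mers makes profiles of different sequences comparable.
    return [counts["".join(p)] for p in product(BASES, repeat=k)]


def cosine_similarity(u: list[int], v: list[int]) -> float:
    """Cosine of the angle between two count vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def context_similarity(s1: str, s2: str, k: int = 3) -> float:
    """Assumed similarity score: cosine similarity of whole-sequence k-mer profiles.

    Position along the sequence is ignored here, unlike the method the
    abstract describes.
    """
    return cosine_similarity(kmer_profile(s1, k), kmer_profile(s2, k))


if __name__ == "__main__":
    # "AAAA" has AAA twice; "AAAC" has AAA and AAC once each, giving 1/sqrt(2).
    print(round(context_similarity("AAAA", "AAAC"), 4))  # 0.7071
```

Because a k-mer profile does not depend on exact base-to-base correspondence, a single error only changes the counts of the few k-mers that overlap it. This is the intuition for why short k-mer statistics can tolerate the high error rates of TGS reads. A practical aligner would still need to localize the comparison, for example over windows along the read, which this sketch leaves out.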

MeSH terms

  • Algorithms*
  • Base Sequence
  • Computational Biology / methods*
  • High-Throughput Nucleotide Sequencing / methods
  • High-Throughput Nucleotide Sequencing / statistics & numerical data
  • Reproducibility of Results
  • Sequence Alignment / methods*
  • Software*