FANSe2: a robust and cost-efficient alignment tool for quantitative next-generation sequencing applications

PLoS One. 2014 Apr 17;9(4):e94250. doi: 10.1371/journal.pone.0094250. eCollection 2014.

Abstract

Correct and bias-free interpretation of deep sequencing data depends on the complete mapping of all mappable reads to the reference sequence, especially for quantitative RNA-seq applications. Seed-based algorithms are generally slow but robust, while Burrows-Wheeler Transform (BWT) based algorithms are fast but less robust. To combine both advantages, we developed FANSe2, an algorithm with an iterative mapping strategy based on the statistics of real-world sequencing error distributions, which substantially accelerates mapping without compromising accuracy. Its sensitivity and accuracy were higher than those of the BWT-based algorithms in tests using both prokaryotic and eukaryotic sequencing datasets. The gene identification results of FANSe2 were experimentally validated, whereas the previous algorithms produced false positives and false negatives. FANSe2 showed remarkably better consistency with microarray data than most other algorithms in terms of gene expression quantification. We implemented a scalable and almost maintenance-free parallelization method that can utilize the computational power of multiple office computers, a novel feature not present in any other mainstream algorithm. With three normal office computers, we demonstrated that FANSe2 mapped an RNA-seq dataset generated from an entire Illumina HiSeq 2000 flowcell (8 lanes, 608 M reads) to the masked human genome within 4.1 hours with higher sensitivity than Bowtie/Bowtie2. FANSe2 thus provides robust accuracy, full indel sensitivity, fast speed, versatile compatibility and economical computational utilization, making it a useful and practical tool for deep sequencing applications. FANSe2 is freely available at http://bioinformatics.jnu.edu.cn/software/fanse2/.
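
For readers unfamiliar with the seed-based approach mentioned in the abstract, the following is a minimal, generic seed-and-verify sketch in Python. It is not FANSe2's implementation and does not model its iterative strategy or indel handling; the function names (`build_seed_index`, `map_read`) and parameters (`seed_len`, `max_mismatches`) are hypothetical and chosen only for illustration. The idea is that exact seed matches propose candidate positions, which are then verified by counting mismatches over the full read.

```python
from collections import defaultdict


def build_seed_index(reference, seed_len):
    """Map every seed_len-mer in the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[reference[pos:pos + seed_len]].append(pos)
    return index


def map_read(read, reference, index, seed_len, max_mismatches):
    """Return sorted (position, mismatches) hits for one read.

    Illustrative only: substitutions are counted, indels are not handled.
    """
    hits = {}
    # Non-overlapping seeds: if a read has at most k mismatches and contains
    # k + 1 such seeds, at least one seed must match exactly (pigeonhole).
    for offset in range(0, len(read) - seed_len + 1, seed_len):
        seed = read[offset:offset + seed_len]
        for seed_pos in index.get(seed, ()):
            start = seed_pos - offset
            # Skip candidates that fall off the reference or were accepted already.
            if start < 0 or start + len(read) > len(reference) or start in hits:
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())


reference = "ACGTACGTTAGCCGATAGGCTTAACG"
index = build_seed_index(reference, seed_len=4)
# Read taken from position 8 of the reference, with one substitution (A -> C).
hits = map_read("TCGCCGATAGGC", reference, index, seed_len=4, max_mismatches=1)
```

In this toy setting, lowering `max_mismatches` makes the search stricter, while shorter seeds propose more candidate positions at the cost of more verification work, which reflects the general speed versus robustness trade-off the abstract describes.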

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Databases, Genetic
  • Gene Expression Profiling
  • Genome, Human / genetics
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Oligonucleotide Array Sequence Analysis
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA
  • Sequence Analysis, RNA
  • Time Factors

Grants and funding

This work was collectively supported by the National “973” Projects of China (2011CB910700), National Natural Science Foundation of China (31300649 and 31200612), the Key Project of Chinese Ministry of Education (212207), Guangdong Natural Science Foundation (S2013010013529), Foundation for Distinguished Young Talents in Higher Education of Guangdong, China (2012LYM_0026), the Fundamental Research Funds for the Central Universities (21612202, 21612459, 11610101, 21613343 and 21611201), and the Institutional Grant of Excellence of Jinan University, China (50625072). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.