INDELseek: detection of complex insertions and deletions from next-generation sequencing data

BMC Genomics. 2017 Jan 5;18(1):16. doi: 10.1186/s12864-016-3449-9.

Abstract

Background: Complex insertions and deletions (indels) in next-generation sequencing (NGS) data are prone to escape detection by currently available variant callers, as shown by large-scale human genomics studies. As a result, somatic and germline complex indels in key disease driver genes could be missed in NGS-based genomics studies.

Results: INDELseek is an open-source complex indel caller designed for NGS data of random fragments and PCR amplicons. The key differentiating factor of INDELseek is that each NGS read alignment is examined as a whole, rather than as a "pileup" of each reference position across multiple alignments. In benchmarking against the reference material NA12878 genome (n = 160 derived from high-confidence variant calls), GATK, SAMtools and INDELseek showed complex indel detection sensitivities of 0%, 0% and 100%, respectively. INDELseek also detected all known germline (BRCA1 and BRCA2) and somatic (CALR and JAK2) complex indels in human clinical samples (n = 8). Further experiments validated all 10 KIT complex indels detected in a discovery cohort of clinical samples. In silico semi-simulation showed sensitivities of 93.7-96.2% based on 8671 unique complex indels in >5000 genes from dbSNP and COSMIC. We also demonstrated the importance of complex indel detection for accurately annotating BRCA1, BRCA2 and TP53 mutations with gained or rescued protein-truncating effects.
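The per-read idea described above can be illustrated with a minimal Python sketch. This is not the INDELseek implementation: the function name complex_indels, the max_gap threshold, and the 0-based coordinate convention are assumptions made for illustration only. The sketch walks the CIGAR string of a single alignment, records each insertion and deletion, and merges operations that lie within max_gap reference bases of each other into one combined event. Substitutions, base quality, and allele frequency across reads are ignored here.

```python
import re
from typing import List, Tuple

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def complex_indels(cigar: str, read_seq: str, ref_seq: str,
                   ref_start: int = 0, max_gap: int = 5
                   ) -> List[Tuple[int, str, str]]:
    """Report clustered indels found within ONE read alignment.

    Assumptions (illustrative only): ref_seq is indexed in the same
    0-based coordinates as ref_start, and two or more I/D operations
    separated by at most max_gap reference bases form one complex indel.
    Returns a list of (ref_pos, ref_allele, alt_allele).
    """
    # Each event: (ref_begin, ref_end, read_begin, read_end), half-open.
    events = []
    ref_pos, read_pos = ref_start, 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":            # consumes both reference and read
            ref_pos += n
            read_pos += n
        elif op == "I":            # consumes read only
            events.append((ref_pos, ref_pos, read_pos, read_pos + n))
            read_pos += n
        elif op == "D":            # consumes reference only
            events.append((ref_pos, ref_pos + n, read_pos, read_pos))
            ref_pos += n
        elif op == "S":            # soft clip: read bases, not aligned
            read_pos += n
        elif op == "N":            # skipped reference region
            ref_pos += n
        # H and P consume neither sequence.

    # Group neighbouring indel operations along the same read.
    clusters: List[list] = []
    for ev in events:
        if clusters and ev[0] - clusters[-1][-1][1] <= max_gap:
            clusters[-1].append(ev)
        else:
            clusters.append([ev])

    calls = []
    for cluster in clusters:
        if len(cluster) < 2:       # a lone indel is not "complex" here
            continue
        r0, r1 = cluster[0][0], cluster[-1][1]
        q0, q1 = cluster[0][2], cluster[-1][3]
        calls.append((r0, ref_seq[r0:r1], read_seq[q0:q1]))
    return calls
```

For example, with the reference ACGTACGTACGTACGT and a read ACGTGTTACGTAC aligned at position 0 with CIGAR 4M2D2M1I6M, the 2-base deletion and the 1-base insertion are only 2 reference bases apart, so the sketch reports them as a single event replacing ACGT with GTT at 0-based position 4. A position-by-position pileup would instead see the deletion and the insertion as two separate simple indels.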

Conclusions: INDELseek is an accurate and versatile tool for complex indel detection in NGS data. It complements other variant callers in NGS-based genomics studies targeting a wide spectrum of genetic variations.

Keywords: Bioinformatics; Complex indel; Next-generation sequencing; Variant calling.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Genomics / methods
  • Germ-Line Mutation
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • INDEL Mutation*
  • Neoplasms / genetics
  • Reproducibility of Results
  • Sequence Analysis, DNA
  • Software*