A broad survey of DNA sequence data simulation tools

Brief Funct Genomics. 2020 Jan 22;19(1):49-59. doi: 10.1093/bfgp/elz033.

Abstract

In silico DNA sequence generation is a powerful technique for evaluating and validating bioinformatics tools, and accordingly more than 35 DNA sequence simulation tools have been developed. With such a diverse array of tools to choose from, an important question is: which tool should be used for a desired outcome? This question is largely unanswered because documentation for many of these DNA simulation tools is sparse. To address this, we reviewed the DNA sequence simulation tools developed to date and evaluated 20 state-of-the-art tools on their ability to produce accurate reads based on their implemented sequence error model. We provide a succinct description of each tool and suggest which tool is most appropriate for different scenarios. Given the multitude of similar yet non-identical tools, researchers can use this review as a guide to inform their choice of DNA sequence simulation tool. This paves the way towards assessing existing tools in a unified framework, as well as enabling the analysis of different simulation scenarios within the same framework.

Keywords: DNA sequence; bioinformatics tools; genomics; next generation sequence; simulation.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Computer Simulation*
  • DNA / analysis*
  • DNA / genetics*
  • Genome, Human*
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Sequence Analysis, DNA / methods*
  • Software*

Substances

  • DNA