A comparative evaluation of genome assembly reconciliation tools

Genome Biol. 2017 May 18;18(1):93. doi: 10.1186/s13059-017-1213-3.

Abstract

Background: The majority of eukaryotic genomes are unfinished due to the algorithmic challenges of assembling them. A variety of assembly and scaffolding tools are available, but it is not always obvious which tool or parameters to use for a specific genome size and complexity. It is, therefore, common practice to produce multiple assemblies using different assemblers and parameters, then select the best one for public release. A more compelling approach would allow one to merge multiple assemblies with the intent of producing a higher-quality consensus assembly, which is the objective of assembly reconciliation.

Results: Several assembly reconciliation tools have been proposed in the literature, but their strengths and weaknesses have never been compared on a common dataset. This work fills that gap with an extensive comparative evaluation of several tools. Specifically, we evaluate the contiguity, correctness, coverage, and duplication ratio of the merged assembly compared with those of the individual assemblies provided as input.
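The abstract does not state how each metric was computed. As an illustration only, the sketch below assumes N50 as the contiguity measure, "genome fraction" (covered reference bases over reference length) as the coverage measure, and a QUAST-style duplication ratio (aligned assembly bases over covered reference bases). The function names are hypothetical and are not taken from the paper or from any tool it evaluates; the alignment counts are assumed to come from aligning the assembly to a reference beforehand.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Contiguity (assumed N50): the largest length L such that contigs
    of length >= L together cover at least half the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def genome_fraction(covered_reference_bases: int, reference_length: int) -> float:
    """Coverage (assumed): share of reference bases covered by aligned contigs."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return covered_reference_bases / reference_length


def duplication_ratio(aligned_assembly_bases: int, covered_reference_bases: int) -> float:
    """Assumed QUAST-style definition: values above 1.0 indicate that
    some reference regions are represented more than once in the assembly."""
    if covered_reference_bases <= 0:
        raise ValueError("covered_reference_bases must be positive")
    return aligned_assembly_bases / covered_reference_bases


if __name__ == "__main__":
    contigs = [100, 80, 50, 30, 20]  # hypothetical contig lengths
    print(n50(contigs))
    print(genome_fraction(950_000, 1_000_000))
    print(duplication_ratio(1_100_000, 1_000_000))
```

For the hypothetical contigs above, the total length is 280, so N50 is 80: the two longest contigs (100 + 80 = 180) are the first to reach half the total. Comparing such values between a merged assembly and each of its inputs is one way to express "improved" or "degraded" along each axis.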

Conclusions: None of the tools we tested consistently improved the quality of the input GAGE and synthetic assemblies. Our experiments show an increase in contiguity in the consensus assembly when the original assemblies already have high quality. In terms of correctness, the quality of the results depends on the specific tool, as well as on the quality and the ranking of the input assemblies. In general, the number of misassemblies ranges from being comparable to that of the best input assembly to being comparable to that of the worst input assembly.

Keywords: Assembly reconciliation; De novo genome assembly; Genomics.

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Chromosome Mapping / methods*
  • Contig Mapping / methods*
  • Eukaryota / genetics
  • Genome*
  • High-Throughput Nucleotide Sequencing
  • Prokaryotic Cells / metabolism
  • Sequence Analysis, DNA
  • Software / statistics & numerical data*