HaploMerger: reconstructing allelic relationships for polymorphic diploid genome assemblies

Genome Res. 2012 Aug;22(8):1581-8. doi: 10.1101/gr.133652.111. Epub 2012 May 3.

Abstract

Whole-genome shotgun assembly has been a long-standing issue for highly polymorphic genomes, and the advent of next-generation sequencing technologies has made the issue more challenging than ever. Here we present an automated pipeline, HaploMerger, for reconstructing allelic relationships in a diploid assembly. HaploMerger combines a LASTZ-ChainNet alignment approach with a novel graph-based structure, which helps to untangle allelic relationships between two haplotypes and guides the subsequent creation of reference haploid assemblies. The pipeline provides flexible parameters and schemes to improve the contiguity, continuity, and completeness of the reference assemblies. We show that HaploMerger produces efficient and accurate results in simulations and has advantages over manual curation when applied to real polymorphic assemblies (e.g., 4%-5% heterozygosity). We also used HaploMerger to analyze the diploid assembly of a single Chinese amphioxus (Branchiostoma belcheri) and compared the resulting haploid assemblies with EST sequences, which revealed that the two haplotypes are not only divergent but also highly complementary to each other. Taken together, we have demonstrated that HaploMerger is an effective tool for analyzing and exploiting polymorphic genome assemblies.
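
As a rough illustration of how a graph can help untangle allelic relationships, the sketch below treats scaffolds as nodes and pairwise alignments as edges, then groups scaffolds into connected components. This is not HaploMerger's actual graph structure, which the abstract does not specify. The function name `group_allelic_scaffolds`, the scaffold names, and the input format (pairs of scaffold identifiers assumed to come from filtered whole-genome alignments) are all hypothetical.

```python
from collections import defaultdict


def group_allelic_scaffolds(alignments):
    """Group scaffolds into connected components of an alignment graph.

    alignments: iterable of (scaffold_a, scaffold_b) pairs. Each pair is
    assumed (hypothetically) to mean the two scaffolds share a retained
    alignment and are therefore candidate alleles of one another.
    """
    # Build an undirected adjacency map: scaffold -> set of aligned scaffolds.
    graph = defaultdict(set)
    for a, b in alignments:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    groups = []
    # Visit scaffolds in sorted order so the output is deterministic.
    for start in sorted(graph):
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(component))
    return groups


# Toy input: scf1-scf2-scf3 are linked by alignments; scf4-scf5 form a second group.
pairs = [("scf1", "scf2"), ("scf2", "scf3"), ("scf4", "scf5")]
groups = group_allelic_scaffolds(pairs)
```

Each resulting group collects scaffolds that are linked, directly or transitively, by alignments. A real pipeline would additionally need alignment coordinates and orientation to decide which parts of which scaffold go into each haploid assembly.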

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Alleles*
  • Animals
  • Chordata, Nonvertebrate / genetics*
  • Computer Graphics
  • Computer Simulation
  • Diploidy*
  • Expressed Sequence Tags
  • Genetic Variation
  • Genome
  • Genomics / methods*
  • Haplotypes
  • Heterozygote
  • Reference Standards
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment / methods*