PRESM: personalized reference editor for somatic mutation discovery in cancer genomics

Bioinformatics. 2019 May 1;35(9):1445-1452. doi: 10.1093/bioinformatics/bty812.

Abstract

Motivation: Accurate detection of somatic mutations is a crucial step toward understanding cancer. Various tools have been developed to detect somatic mutations from cancer genome sequencing data by mapping reads to a universal reference genome and inferring likelihoods from complex statistical models. However, read mapping is frequently obstructed by mismatches between germline and somatic mutations on a read and the reference genome. Previous attempts to develop personalized genome tools are not compatible with downstream statistical models for somatic mutation detection.

Results: We present PRESM, a tool that builds personalized reference genomes by integrating germline mutations into the reference genome. The aforementioned obstacle is circumvented by using a two-step germline substitution procedure, maintaining positional fidelity using an innovative workaround. Reads derived from tumor tissue can be positioned more accurately along a personalized reference than a universal reference due to the reduced genetic distance between the subject (tumor genome) and the target (the personalized genome). Application of PRESM's personalized genome reduced false-positive (FP) somatic mutation calls by as much as 55.5%, and facilitated the discovery of a novel somatic point mutation on a germline insertion in PDE1A, a phosphodiesterase associated with melanoma. Moreover, all improvements in calling accuracy were achieved without parameter optimization, as PRESM itself is parameter-free. Hence, similar increases in read mapping and decreases in the FP rate will persist when PRESM-built genomes are applied to any user-provided dataset.

Availability and implementation: The software is available at https://github.com/precisionomics/PRESM.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genomics
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Mutation
  • Neoplasms* / genetics
  • Software