mitoSomatic: a tool for accurate identification of mitochondrial DNA somatic mutations without paired controls

Mol Oncol. 2023 May;17(5):857-871. doi: 10.1002/1878-0261.13335. Epub 2022 Dec 15.

Abstract

Mitochondrial DNA (mtDNA) somatic mutations play important roles in the initiation and progression of cancer. Although next-generation sequencing (NGS) of paired tumor and control samples has become common practice for identifying tumor-specific mtDNA mutations, the unique nature of mtDNA and NGS-associated sequencing bias can cause false-positive and false-negative somatic mutation calls. Additionally, there are clinical scenarios in which matched control tissues are unavailable for comparison. Therefore, a novel approach for accurately identifying somatic mtDNA variants is greatly needed, particularly in the absence of matched controls. In this study, ground truth mtDNA variants orthogonally validated by triple-paired tumor, adjacent nontumor, and blood samples were used to develop mitoSomatic, a random forest-based machine learning tool. We demonstrated that mitoSomatic achieved area under the curve (AUC) values over 0.99 for identifying somatic mtDNA variants without paired controls in three tumor types. In addition, mitoSomatic was applicable to nontumor tissues such as adjacent nontumor and blood samples, suggesting the flexibility of its classification capability. Furthermore, analysis of triple-paired samples identified a small group of variants with uncertain somatic/germline origin, and application of mitoSomatic significantly facilitated the prediction of their possible source. Finally, a control-free evaluation of the public pan-cancer NGS dataset with mitoSomatic revealed a substantial number of variants that were probably misclassified by conventional tumor-control comparison, further underscoring the practical usefulness of mitoSomatic. Taken together, our study demonstrates that mitoSomatic is valuable for accurately identifying somatic mtDNA variants in mtDNA NGS data without paired controls and is applicable to both tumor and nontumor tissues.
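
For readers less familiar with the AUC metric cited in the abstract, the following is a minimal sketch of how an area under the ROC curve can be computed from a classifier's scores. It is not the study's code: the function name `roc_auc` and the toy labels and scores are illustrative assumptions, with label 1 standing for a somatic variant and label 0 for a germline variant.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    AUC equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example,
    with tied scores counted as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up values for illustration only (not data from the study).
# 1 = somatic, 0 = germline; scores are hypothetical classifier outputs.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.95, 0.80, 0.40, 0.55, 0.20, 0.05]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

An AUC of 1.0 means every somatic variant is scored above every germline variant, while 0.5 corresponds to random ranking; values above 0.99, as reported in the abstract, indicate that almost all somatic/germline pairs are ranked correctly.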

Keywords: machine learning; mitochondrial DNA; next-generation sequencing; somatic mutations.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA, Mitochondrial* / genetics
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Machine Learning
  • Mitochondria / genetics
  • Mutation / genetics
  • Neoplasms* / genetics

Substances

  • DNA, Mitochondrial