DMRIntTk: Integrating different DMR sets based on density peak clustering

PLoS One. 2024 Dec 23;19(12):e0315920. doi: 10.1371/journal.pone.0315920. eCollection 2024.

Abstract

Background: Identifying differentially methylated regions (DMRs) is a basic task in DNA methylation analysis. However, because prediction methods adopt different strategies, they predict different DMR sets on the same dataset, which makes it challenging to select a reliable and comprehensive DMR set for downstream analysis.

Results: Here, we develop DMRIntTk, a toolkit for integrating DMR sets predicted by different methods on the same dataset. In DMRIntTk, the genome is segmented into bins, and the reliability of each DMR set at different methylation thresholds is evaluated. Then, the bins are weighted based on the DMR sets that cover them and integrated into final DMRs using a density peak clustering algorithm. To demonstrate its practicality, we applied DMRIntTk to different scenarios, including tissues with relatively large methylation differences, cancer tissues versus normal tissues with medium methylation differences, and disease tissues versus normal tissues with subtle methylation differences. Our results show that DMRIntTk can effectively trim regions with small methylation differences from the original DMR sets and thereby enrich the proportion of DMRs with larger methylation differences. In addition, the overlap analysis suggests that the integrated DMR sets are comprehensive, and functional analyses indicate that the integrated disease-related DMRs are significantly enriched in biological pathways associated with the pathological mechanisms of the diseases. A comparative analysis of the integrated DMR set versus each original DMR set further highlights the superiority of DMRIntTk, demonstrating the unique biological insights it can provide.
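
To make the clustering step in the Results more concrete, the following is a minimal sketch of generic density peak clustering applied to weighted bins on a single chromosome. It is not the DMRIntTk implementation: the abstract does not specify how bin weights, densities, or cutoffs are computed, so the function names (density_peak_bins, bins_to_regions), the parameters (d_c, rho_min, delta_min), and the toy weights are all assumptions chosen for illustration.

```python
import numpy as np

def density_peak_bins(positions, weights, d_c, rho_min, delta_min):
    """Generic density peak clustering over weighted 1D bins.

    positions: bin start coordinates; weights: hypothetical per-bin weights.
    d_c, rho_min, delta_min: illustrative cutoffs, not values from the paper.
    """
    positions = np.asarray(positions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n = len(positions)
    dist = np.abs(positions[:, None] - positions[None, :])

    # Local density: total weight of all bins within distance d_c (self included).
    rho = (weights[None, :] * (dist <= d_c)).sum(axis=1)

    # Visit bins from highest to lowest density (stable order breaks ties).
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # By convention the densest bin gets the largest distance.
            delta[i] = dist[i].max()
        else:
            higher = order[:rank]
            j = higher[np.argmin(dist[i, higher])]
            delta[i] = dist[i, j]
            nearest_higher[i] = j

    # Cluster centers: dense bins that are far from any denser bin.
    labels = np.full(n, -1)
    centers = [i for i in order if rho[i] >= rho_min and delta[i] >= delta_min]
    for k, c in enumerate(centers):
        labels[c] = k

    # Every other bin inherits the label of its nearest denser bin.
    for i in order:
        if labels[i] == -1 and nearest_higher[i] != -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, rho, delta

def bins_to_regions(positions, labels, bin_size):
    """Collapse each cluster of bins into one (start, end) interval."""
    positions = np.asarray(positions)
    labels = np.asarray(labels)
    regions = []
    for k in sorted(set(labels.tolist()) - {-1}):
        members = positions[labels == k]
        regions.append((int(members.min()), int(members.max()) + bin_size))
    return regions

if __name__ == "__main__":
    pos = [0, 100, 200, 300, 5000, 5100, 5200]  # toy bin starts (bp)
    w = [2, 3, 5, 1, 1, 4, 2]                   # toy bin weights
    labels, rho, delta = density_peak_bins(pos, w, d_c=150, rho_min=7, delta_min=1000)
    print(bins_to_regions(pos, labels, bin_size=100))
```

In this toy input the two groups of bins are separated by several kilobases, so each group forms its own cluster and is reported as a single interval. A real integration would also need to handle multiple chromosomes and clusters whose bins are not contiguous, which this sketch does not attempt.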

Conclusions: DMRIntTk can help researchers obtain a reliable and comprehensive DMR set from multiple prediction methods.

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • DNA Methylation*
  • Humans
  • Neoplasms / genetics
  • Software

Grants and funding

This work was supported in part by the Natural Science Foundation of Hunan Province (No. 2022JJ30694 and No. 2022JJ30750), the Central South University Innovation-Driven Research Programme (No. 2023CXQD065), and the Special Funds for Construction of Innovative Provinces in Hunan Province (No. 2023GK1010). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.