A hierarchical clustering and data fusion approach for disease subtype discovery

J Biomed Inform. 2021 Jan:113:103636. doi: 10.1016/j.jbi.2020.103636. Epub 2020 Nov 30.

Abstract

Recent advances in multi-omics clustering methods enable a more fine-tuned separation of cancer patients into clinically relevant clusters. These advancements have the potential to provide a deeper understanding of cancer progression and may facilitate the treatment of cancer patients. Here, we present a simple hierarchical clustering and data fusion approach, named HC-fused, for the detection of disease subtypes. Unlike other methods, the proposed approach naturally reports the individual contribution of each single-omic to the data fusion process. We perform multi-view simulations with disjoint and disjunct cluster elements across the views to highlight the fundamentally different data integration behavior of various state-of-the-art methods. HC-fused combines the strengths of some recently published methods and shows superior performance on real-world cancer data from the TCGA (The Cancer Genome Atlas) database. An R implementation of our method is available on GitHub (pievos101/HC-fused).

Keywords: Disease subtyping; Integrative clustering; Multi-omics; Multi-view clustering.

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Databases, Factual
  • Humans
  • Neoplasms* / genetics