On integrative analysis of multi-level gene expression data in Kidney cancer subgrouping

Pratheeba Jeyananthan; Maduranga W P N; Rodrigo S M

doi:10.1177/03915603241304604

Urologia. 2024 Dec 13:3915603241304604. doi: 10.1177/03915603241304604. Online ahead of print.

Authors

Pratheeba Jeyananthan¹, Maduranga W P N¹, Rodrigo S M¹

Affiliation

¹ Faculty of Engineering, University of Jaffna, Kilinochchi, Sri Lanka.

PMID: 39673207

Abstract

Kidney cancer is one of the most dangerous cancers and mainly affects men. In 2020, around 430,000 people were diagnosed with this disease worldwide. It can be divided into three main subgroups: kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP) and kidney chromophobe (KICH). Timely and correct identification of these subgroups is crucial for choosing and starting proper treatment, and it helps both clinicians and patients. Hence, this study examines whether integrating multiple types of omics data increases the accuracy of kidney cancer subgrouping. Four types of molecular data (genomics, proteomics, epigenomics and miRNA) from The Cancer Genome Atlas (TCGA) are used. As the data are very high-dimensional, the study starts by selecting relevant features using Pearson's correlation coefficient. The selected features are then used with three classification algorithms: k-nearest neighbors (KNN), support vector machines (SVMs) and random forest. Performances are compared to see whether integrating multi-omics data improves subgrouping accuracy. The study shows that integration of multi-omics data can improve the performance of kidney cancer subgrouping. The highest performance (accuracy of 0.98±0.03) is obtained with the top 400 features selected from the integrated multi-omics data, using support vector machines.
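
The pipeline described in the abstract (integrate the omics layers, rank features by Pearson's correlation coefficient, then classify) can be illustrated with a minimal sketch. This is not the authors' implementation, since the abstract does not give those details. The following are assumptions made for illustration only: integration is done by simple concatenation of feature vectors, each feature is scored by its largest absolute correlation with a one-vs-rest class indicator, the data are synthetic toy values, and only a small hand-written KNN is shown (the SVM and random forest classifiers named in the abstract are omitted to keep the example free of external libraries).

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    return cov / math.sqrt(vx * vy)


def top_k_features(X, y, k):
    """Return the sorted indices of the k best-scoring features.

    Scoring rule (an assumption, not taken from the paper): the largest
    absolute Pearson correlation with any one-vs-rest class indicator.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        score = max(
            abs(pearson(col, [1.0 if lab == c else 0.0 for lab in y]))
            for c in classes
        )
        scores.append((score, j))
    scores.sort(key=lambda t: (-t[0], t[1]))
    return sorted(j for _, j in scores[:k])


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean)."""
    nearest = sorted((math.dist(row, x), lab) for row, lab in zip(X_train, y_train))
    votes = Counter(lab for _, lab in nearest[:k])
    return votes.most_common(1)[0][0]


# Synthetic toy data: two hypothetical "omics layers" of 5 features each.
# Only feature 0 of layer A and feature 2 of layer B depend on the class.
rng = random.Random(0)
CLASS_MEANS = {"KIRC": (0.0, 5.0), "KIRP": (3.0, 0.0), "KICH": (6.0, 0.0)}


def make_sample(label):
    a_mean, b_mean = CLASS_MEANS[label]
    layer_a = [rng.gauss(0, 1) for _ in range(5)]
    layer_b = [rng.gauss(0, 1) for _ in range(5)]
    layer_a[0] = rng.gauss(a_mean, 0.5)
    layer_b[2] = rng.gauss(b_mean, 0.5)
    return layer_a + layer_b  # integration by concatenation (assumption)


data = [(make_sample(lab), lab) for lab in CLASS_MEANS for _ in range(30)]
rng.shuffle(data)
train, test = data[:63], data[63:]
X_train = [x for x, _ in train]
y_train = [lab for _, lab in train]

# Select features on the training split only, to avoid information leakage.
selected = top_k_features(X_train, y_train, k=2)


def project(row):
    return [row[j] for j in selected]


train_sel = [project(row) for row in X_train]
correct = sum(knn_predict(train_sel, y_train, project(x)) == lab for x, lab in test)
accuracy = correct / len(test)
print("selected feature indices:", selected)
print("held-out accuracy:", accuracy)
```

In this sketch the feature ranking is computed on the training split only, so the held-out accuracy is not inflated by the selection step. With real TCGA data the same structure would apply with `k` set to a value such as 400, and a library classifier could replace `knn_predict`.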

Keywords: Cancer subgrouping; Feature selection; Integrated multi-omics data; Kidney cancer; Machine learning models.