Purpose: A reliable and comprehensive cancer prognosis model for clear cell renal cell carcinoma (ccRCC) could better support personalized treatment. In this work, we developed a multi-modal ensemble model (MMEM) that integrates pretreatment clinical information, multi-omics data, and histopathology whole slide image (WSI) data, learning complementary information to predict overall survival (OS) and disease-free survival (DFS) for patients with ccRCC.
Methods and materials: We collected 226 patients from The Cancer Genome Atlas Kidney Renal Clear Cell Carcinoma dataset (TCGA-KIRC). These patients had OS and DFS follow-up data available and five data modalities provided: clinical information, pathology data in the form of WSIs, and three types of omics data, namely mRNA expression, miRNA expression (miRSeq), and DNA methylation. Five separate sets of survival prediction models were constructed for OS and DFS. We used a traditional Cox proportional hazards (CPH) model with iterative forward feature selection for the clinical and multi-omics data. Four pre-trained encoder models, comprising ResNet and three recently developed general-purpose foundation models for computational pathology, were used to extract features from processed WSI patches, and a deep learning-based CPH model was constructed to predict survival outcomes from these encoded WSI features. For each survival outcome of interest, we weighted and combined the predicted risk scores from all five models to generate the final prediction, with model weights based on training performance; a minimal sketch of this per-modality fitting and weighting step is given below. Five-fold cross-validation was performed to train and test the proposed workflow.
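The following sketch illustrates the general idea of fitting a CPH model per modality and combining risk scores with performance-based weights. Variable names, the choice of training C-index as the weight, and the z-scored weighted sum are assumptions for illustration; the paper does not specify the exact weighting formula or implementation.

```python
# Illustrative sketch only: per-modality CPH models combined by training-performance
# weights. Names and the weighting scheme are hypothetical, not the authors' code.
import numpy as np
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

def fit_modality_cph(train_df, duration_col="time", event_col="event"):
    """Fit a CPH model on one modality's selected features; return model and training C-index."""
    cph = CoxPHFitter(penalizer=0.1)
    cph.fit(train_df, duration_col=duration_col, event_col=event_col)
    risk = cph.predict_partial_hazard(train_df)
    # Higher risk should correspond to shorter survival, hence the sign flip.
    c_index = concordance_index(train_df[duration_col], -risk, train_df[event_col])
    return cph, c_index

def ensemble_risk(models_and_weights, test_frames):
    """Weighted sum of standardized per-modality log-risk scores (assumed scheme)."""
    total_weight = sum(w for _, w in models_and_weights)
    combined = 0.0
    for (cph, w), df in zip(models_and_weights, test_frames):
        risk = np.log(cph.predict_partial_hazard(df)).to_numpy()
        risk = (risk - risk.mean()) / (risk.std() + 1e-8)  # put modalities on a common scale
        combined = combined + (w / total_weight) * risk
    return combined
```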
Results: We employed the concordance index (C-index) and the area under the receiver operating characteristic curve (AUROC) to assess model performance for time-to-event prediction and time-specific binary prediction, respectively. Among the sub-models, the clinical-feature-based CPH model had the highest weight for both prediction tasks. For WSI-based prediction, features encoded with an image-based general-purpose foundation model (UNI) outperformed those from the other pretrained encoders. Our final model outperformed the corresponding single-modality models on all prediction labels, achieving C-indices of 0.820 and 0.833 for OS and DFS, respectively. The AUROC values for binary prediction at 3-year follow-up were 0.831 and 0.862 for patient death and cancer recurrence, respectively. Using the median predicted risk as the threshold to separate high-risk and low-risk patient groups, log-rank tests showed improved stratification for both OS and DFS compared with single-modality models.
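For clarity, the sketch below shows one way the reported metrics could be computed from an ensemble risk score: C-index for time-to-event discrimination, AUROC for the 3-year binary endpoint, and a log-rank test at the median risk. The handling of patients censored before the horizon is a simplifying assumption, not necessarily the paper's exact procedure.

```python
# Illustrative evaluation sketch; array names (time, event, risk_score) are hypothetical.
import numpy as np
from lifelines.utils import concordance_index
from lifelines.statistics import logrank_test
from sklearn.metrics import roc_auc_score

def evaluate(time, event, risk_score, horizon=3.0):
    # Time-to-event discrimination: higher risk should imply shorter survival.
    c_index = concordance_index(time, -risk_score, event)

    # Time-specific binary AUROC at the chosen horizon (e.g. 3 years). Patients
    # censored before the horizon are excluded here as a simple approximation.
    known = (time >= horizon) | (event == 1)
    label = (time < horizon) & (event == 1)
    auroc = roc_auc_score(label[known], risk_score[known])

    # Log-rank test between groups split at the median predicted risk.
    high = risk_score >= np.median(risk_score)
    lr = logrank_test(time[high], time[~high], event[high], event[~high])
    return c_index, auroc, lr.p_value
```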
Conclusion: We developed MMEM, the first multi-modal prediction model for ccRCC patients that integrates features across five data modalities. It demonstrated better prognostic ability than the corresponding single-modality models for both prediction targets. If these findings are independently reproduced, MMEM has the potential to assist in the management of ccRCC patients.