Protein glycosylation is associated with the pathogenesis of various cancers. Certain glycans show promise as features in cancer diagnosis models, yet the accuracy of such models is not always guaranteed. Here, we investigated the utility of machine learning techniques, specifically random forests combined with transfer learning, in enhancing the discriminative power of the serum glycome for cancer diagnosis (including ovarian cancer, non-small cell lung cancer, gastric cancer, and esophageal cancer). We started with ovarian cancer and demonstrated that transfer learning can achieve superior performance in data-disadvantaged cohorts (AUROC > 0.9), outperforming partial least squares discriminant analysis (PLS-DA). We identified a serum glycan biomarker panel comprising 18 serum N-glycans and 4 glycan-derived traits, most of which featured sialylation. Furthermore, we validated the advantage of the transfer learning scheme across the other cancer groups. These findings highlight the superiority of transfer learning in improving the performance of glycan-based cancer diagnosis models and in identifying cancer biomarkers, providing a new high-fidelity avenue for cancer diagnosis.
Keywords: Cancer; Glycomics; Machine learning.
© 2023 The Authors.