Spatial transcriptomics (ST) has significantly advanced the measurement of spatial gene expression in biology. However, the high cost of ST limits its application in large-scale studies. Using deep learning to predict spatial gene expression from H&E-stained histology images offers a more cost-effective alternative, but existing methods fail to fully leverage the multimodal information provided by ST and pathology images. In response, this paper proposes STMCL, a novel multimodal contrastive learning framework. STMCL integrates multimodal information, including histology images, the gene expression features of spots, and the spatial locations of those spots, to accurately infer spatial gene expression profiles. We evaluated STMCL on four different types of multi-slice ST datasets generated by the 10X Genomics platform. The results indicate that STMCL has advantages over baseline methods in predicting spatial gene expression profiles. Furthermore, STMCL is capable of capturing cancer-specific highly expressed genes and preserving gene expression patterns while maintaining the original spatial structure of gene expression. Our code is available at https://github.com/wenwenmin/STMCL.
Keywords: Contrastive Learning; Gene Expression Prediction; Multi-modal Deep Learning; Spatial Transcriptomics.
Copyright © 2025 Elsevier Inc. All rights reserved.