Tackling the Complexity of Spatial Transcriptomics Data Interpretation with Large Language Models

Taushif Khan; Colleen M Farley; John J Wilson; Chih-Hao Chang; Damien Chaussabel

doi:10.1101/2024.11.28.625773

bioRxiv [Preprint]. 2024 Dec 3:2024.11.28.625773. doi: 10.1101/2024.11.28.625773.

Abstract

Spatial transcriptomics offers unprecedented insights into the complex cellular landscapes of tissues, particularly in cancer research where understanding the tumor microenvironment is crucial. However, interpreting the vast and intricate data generated by this technology remains a significant challenge. This study explores the potential of Large Language Models (LLMs) to assist in the analysis and interpretation of spatial transcriptomics data from a murine melanoma tumor model. We first evaluated the performance of multiple LLMs in describing and quantifying spatial gene expression patterns. Our benchmarking revealed that interpreting spatial transcriptomics data was challenging for most models, with only a few demonstrating sufficient capability for this complex task. Using Claude 3.5 Sonnet, which showed the highest accuracy in spot quantification and pattern recognition, we developed a systematic workflow for analyzing the tumor immune landscape. The model first assisted in identifying and prioritizing panels of M1 and M2 macrophage-associated markers through knowledge-driven scoring. It then demonstrated a remarkable ability to integrate spatial expression data with extensive immunological knowledge, providing sophisticated interpretation of local immune organization. When analyzing individual tumor regions, the model identified coordinated immunosuppressive mechanisms, including metabolic barriers and disrupted pro-inflammatory signaling cascades; these findings both aligned with and extended current understanding of tumor immunology. This study highlights the potential of LLMs as powerful assistive tools in spatial transcriptomics analysis, capable of combining advanced pattern recognition with extensive knowledge integration to enhance data interpretation. While significant development work remains to make such workflows scalable, our proof of concept demonstrates that LLMs can help accelerate the translation of spatial transcriptomics data into biological insights.

Publication types

Preprint