A generalist medical language model for disease diagnosis assistance

Nat Med. 2025 Jan 8. doi: 10.1038/s41591-024-03416-6. Online ahead of print.

Abstract

The delivery of accurate diagnoses is crucial in healthcare and represents the gateway to appropriate and timely treatment. Although recent large language models (LLMs) have demonstrated impressive capabilities in few-shot and zero-shot learning, their effectiveness in clinical diagnosis remains unproven. Here we present MedFound, a generalist medical language model with 176 billion parameters, pre-trained on a large-scale corpus derived from diverse medical texts and real-world clinical records. We further fine-tuned MedFound to learn physicians' inferential diagnosis using a chain-of-thought approach based on a self-bootstrapping strategy, and introduced a unified preference alignment framework to align the model with standard clinical practice. Extensive experiments demonstrate that our medical LLM outperforms other baseline LLMs and specialized models in in-distribution (common diseases), out-of-distribution (external validation) and long-tailed distribution (rare diseases) scenarios across eight specialties. Further ablation studies indicate the effectiveness of the key components in our medical LLM training approach. We conducted a comprehensive evaluation of the clinical applicability of LLMs for diagnosis, comprising an artificial intelligence (AI)-versus-physician comparison, an AI-assistance study and a human evaluation framework. Our proposed framework incorporates eight clinical evaluation metrics, covering capabilities such as medical record summarization, diagnostic reasoning and risk management. Our findings demonstrate the model's feasibility in assisting physicians with disease diagnosis as part of the clinical workflow.
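
The self-bootstrapping fine-tuning strategy mentioned above can be illustrated with a minimal sketch: the model generates its own chain-of-thought rationale for each clinical record, and only rationales whose final diagnosis matches the physician's label are retained as fine-tuning data for the next round. All names here (`mock_model`, `bootstrap_cot`) are illustrative stand-ins, not the paper's actual implementation.

```python
def mock_model(record: str) -> tuple[str, str]:
    """Toy stand-in for an LLM: returns (rationale, diagnosis) for a record.
    A real pipeline would sample a chain-of-thought from the model instead."""
    if "fever" in record and "cough" in record:
        return ("Fever with productive cough suggests a lower respiratory "
                "tract infection.", "pneumonia")
    return ("Findings are nonspecific.", "unknown")


def bootstrap_cot(records: list[str], labels: list[str], model) -> list[dict]:
    """Self-bootstrapping filter: keep only self-generated rationales whose
    final diagnosis agrees with the physician's label; the retained
    (record, rationale, label) triples form the next fine-tuning set."""
    kept = []
    for record, label in zip(records, labels):
        rationale, diagnosis = model(record)
        if diagnosis == label:  # rationale led to the correct diagnosis
            kept.append({"input": record, "cot": rationale, "target": label})
    return kept


records = ["fever and productive cough for 3 days", "mild fatigue"]
labels = ["pneumonia", "anemia"]
data = bootstrap_cot(records, labels, mock_model)
```

In this toy run, only the first record yields a matching diagnosis, so a single rationale survives the filter. In practice this loop would be iterated, with the model fine-tuned on each round's retained rationales before generating the next.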