A medical multimodal large language model for future pandemics

Fenglin Liu; Tingting Zhu; Xian Wu; Bang Yang; Chenyu You; Chenyang Wang; Lei Lu; Zhangdaihong Liu; Yefeng Zheng; Xu Sun; Yang Yang; Lei Clifton; David A Clifton

doi:10.1038/s41746-023-00952-2

NPJ Digit Med. 2023 Dec 2;6(1):226. doi: 10.1038/s41746-023-00952-2.

Authors

Fenglin Liu¹, Tingting Zhu², Xian Wu³, Bang Yang⁴, Chenyu You⁵, Chenyang Wang², Lei Lu², Zhangdaihong Liu²,⁶, Yefeng Zheng³, Xu Sun⁴, Yang Yang⁷, Lei Clifton⁸, David A Clifton⁹,¹⁰

Affiliations

¹ Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK. fenglin.liu@eng.ox.ac.uk.
² Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK.
³ Jarvis Research Center, Tencent YouTu Lab, Beijing, China.
⁴ School of Computer Science, Peking University, Beijing, China.
⁵ Yale University, New Haven, CT, USA.
⁶ Oxford-Suzhou Centre for Advanced Research, Suzhou, China.
⁷ School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
⁸ Nuffield Department of Population Health, University of Oxford, Oxford, UK.
⁹ Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK. david.clifton@eng.ox.ac.uk.
¹⁰ Oxford-Suzhou Centre for Advanced Research, Suzhou, China. david.clifton@eng.ox.ac.uk.

Abstract

Deep neural networks have been integrated into the whole clinical decision procedure, where they can improve the efficiency of diagnosis and alleviate the heavy workload of physicians. Since most neural networks are supervised, their performance depends heavily on the volume and quality of available labels. However, few such labels exist for rare diseases (e.g., new pandemics). Here we report a medical multimodal large language model (Med-MLLM) for radiograph representation learning, which can learn broad medical knowledge (e.g., image understanding, text semantics, and clinical phenotypes) from unlabelled data. As a result, when encountering a rare disease, our Med-MLLM can be rapidly deployed and easily adapted to it with limited labels. Furthermore, our model supports medical data across the visual modality (e.g., chest X-ray and CT) and the textual modality (e.g., medical reports and free-text clinical notes); therefore, it can be used for clinical tasks that involve both visual and textual data. We demonstrate the effectiveness of our Med-MLLM by showing how it would perform using the COVID-19 pandemic "in replay". In the retrospective setting, we test the model on early COVID-19 datasets; in the prospective setting, we test the model on the new COVID-19 variant, Omicron. The experiments are conducted on 1) three kinds of input data; 2) three kinds of downstream tasks, including disease reporting, diagnosis, and prognosis; 3) five COVID-19 datasets; and 4) three different languages, including English, Chinese, and Spanish. All experiments show that our model can provide accurate and robust COVID-19 decision support with little labelled data.