OptimCLM: Optimizing clinical language models for predicting patient outcomes via knowledge distillation, pruning and quantization

Int J Med Inform. 2024 Dec 18:195:105764. doi: 10.1016/j.ijmedinf.2024.105764. Online ahead of print.

Abstract

Background: Clinical Language Models (CLMs) have the potential to transform traditional healthcare systems by supporting clinical decision-making and optimal resource utilization. Through predictive clinical tasks, they can improve patient outcomes and assist healthcare management. However, their real-world deployment is limited by the high computational cost of inference, in terms of both time and space.

Objective: This study aims to develop and optimize an efficient framework that compresses CLMs without significant performance loss, reducing inference time and disk space and enabling real-world clinical applications.

Methods: We introduce OptimCLM, a framework for optimizing CLMs through ensemble learning, knowledge distillation (KD), pruning and quantization. Based on domain knowledge and performance, we select and combine the domain-adaptive CLMs DischargeBERT and COReBERT into a teacher ensemble. We transfer the teacher's knowledge to two smaller generalist models, BERT-PKD and TinyBERT, applying black-box KD, post-training unstructured pruning and post-training 8-bit quantization. In an admission-to-discharge setting, we evaluate the framework on four clinical outcome prediction tasks (length-of-stay prediction, mortality prediction, diagnosis prediction and procedure prediction) using admission notes from the MIMIC-III clinical database.
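The compression pipeline described above can be illustrated with a minimal PyTorch sketch. This is not the authors' code: generic linear classifiers stand in for the actual teacher and student transformers, the temperature, sparsity and loss-weighting values are illustrative assumptions, and a single-label task replaces the paper's multi-label outcome tasks.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

# --- Black-box knowledge distillation --------------------------------------
# The student is trained to match the averaged (ensembled) output
# probabilities of the two teachers; only teacher outputs are used,
# never their internal representations (hence "black-box").

def ensemble_soft_labels(teacher_a, teacher_b, inputs, temperature=2.0):
    """Average the temperature-softened probabilities of the two teachers."""
    with torch.no_grad():
        probs_a = F.softmax(teacher_a(inputs) / temperature, dim=-1)
        probs_b = F.softmax(teacher_b(inputs) / temperature, dim=-1)
    return (probs_a + probs_b) / 2.0

def distillation_loss(student_logits, soft_labels, hard_labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of KL divergence to the teacher ensemble and
    cross-entropy with the ground-truth labels (illustrative weights)."""
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  soft_labels, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, hard_labels)
    return alpha * kd + (1.0 - alpha) * ce

# --- Post-training unstructured pruning + dynamic 8-bit quantization -------

def compress(student, sparsity=0.3):
    """Magnitude-prune the linear layers, then quantize them to int8."""
    for module in student.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=sparsity)
            prune.remove(module, "weight")      # make the pruning permanent
    return torch.quantization.quantize_dynamic(
        student, {nn.Linear}, dtype=torch.qint8)

# --- Toy usage with stand-in classifiers ------------------------------------
if __name__ == "__main__":
    num_labels, hidden = 4, 768
    teacher_a = nn.Linear(hidden, num_labels)   # stands in for DischargeBERT
    teacher_b = nn.Linear(hidden, num_labels)   # stands in for COReBERT
    student = nn.Sequential(nn.Linear(hidden, 256), nn.ReLU(),
                            nn.Linear(256, num_labels))

    x = torch.randn(8, hidden)                  # stand-in for encoded notes
    y = torch.randint(0, num_labels, (8,))
    soft = ensemble_soft_labels(teacher_a, teacher_b, x)
    loss = distillation_loss(student(x), soft, y)
    loss.backward()                             # one distillation step

    compact_student = compress(student)
    print(compact_student(x).shape)             # torch.Size([8, 4])

In a real setting the distillation step would run over the full training set before the post-training pruning and quantization are applied, since both compression steps here are applied only after the student has been trained.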

Results: The OptimCLM framework achieved a compression ratio of up to 22.88× and an inference speedup of up to 28.7×, with less than 5% and 2% loss in macro-averaged AUROC for TinyBERT and BERT-PKD, respectively. The teacher ensemble outperformed five state-of-the-art models on all tasks, and the optimized BERT-PKD model outperformed them on most tasks.

Conclusion: Our findings suggest that domain-specific fine-tuning combined with ensemble learning and KD is more effective than domain-specific pre-training for domain-knowledge transfer and text classification tasks. This work therefore demonstrates the feasibility and potential of deploying optimized CLMs in healthcare settings and of developing them with fewer computational resources.

Keywords: Black-box distillation; Clinical outcome prediction; Ensemble learning; Model compression; Post-training quantization; Unstructured pruning.