Preliminary assessment of TNM classification performance for pancreatic cancer in Japanese radiology reports using GPT-4

Jpn J Radiol. 2024 Aug 20. doi: 10.1007/s11604-024-01643-y. Online ahead of print.

Abstract

Purpose: Large-scale language models are expected to have been trained on large volumes of data, including cancer treatment protocols. The current study aimed to investigate the use of generative pretrained transformer 4 (GPT-4) for identifying the TNM classification of pancreatic cancers from existing radiology reports written in Japanese.

Materials and methods: We screened 100 consecutive radiology reports of computed tomography scans for pancreatic cancer from April 2020 to June 2022. GPT-4 was asked to assign the TNM classification from each radiology report based on the General Rules for the Study of Pancreatic Cancer 7th Edition. The accuracy and kappa coefficient of the TNM classifications by GPT-4 were evaluated against the classifications by two experienced abdominal radiologists, which served as the gold standard.
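
As an illustration of the two agreement metrics named above, the following is a minimal sketch of how accuracy and Cohen's kappa could be computed for one factor. It assumes an unweighted kappa (the abstract does not state whether a weighted variant was used). The function names and the toy N-factor labels are hypothetical and are not data from the study.

```python
from collections import Counter


def accuracy(reference, predicted):
    """Fraction of cases where the predicted label equals the reference label."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def cohen_kappa(reference, predicted):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement; p_e is the agreement expected by chance
    from the marginal label frequencies of the two raters.
    """
    n = len(reference)
    p_o = accuracy(reference, predicted)
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    p_e = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both raters used a single identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels for the N factor (NOT data from the study).
reference_n = ["N0", "N0", "N1", "N1", "N0", "N1", "N0", "N0"]  # radiologists
predicted_n = ["N0", "N0", "N1", "N0", "N0", "N1", "N0", "N1"]  # model output

print(f"accuracy = {accuracy(reference_n, predicted_n):.2f}")
print(f"kappa    = {cohen_kappa(reference_n, predicted_n):.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so for the same set of labels it can be noticeably lower than accuracy when one category dominates.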

Results: The accuracy values of the T, N, and M factors were 0.73, 0.91, and 0.93, respectively. The kappa coefficients were 0.45 for T, 0.79 for N, and 0.83 for M.

Conclusion: Although GPT-4 is familiar with the TNM classification for pancreatic cancer, its performance in classifying actual cases in this experiment may not be adequate.

Keywords: GPT-4; Large-scale language model; Natural language processing; Pancreatic cancer; TNM classification.