Preliminary assessment of TNM classification performance for pancreatic cancer in Japanese radiology reports using GPT-4

Kazufumi Suzuki; Hiroki Yamada; Hiroshi Yamazaki; Goro Honda; Shuji Sakai

doi:10.1007/s11604-024-01643-y


Jpn J Radiol. 2024 Aug 20. doi: 10.1007/s11604-024-01643-y. Online ahead of print.

Authors

Kazufumi Suzuki¹, Hiroki Yamada², Hiroshi Yamazaki³, Goro Honda⁴, Shuji Sakai²

Affiliations

¹ Department of Radiology, Division of Diagnostic Imaging and Nuclear Medicine, Tokyo Women's Medical University, 8-1, Kawada-Cho, Shinjuku-Ku, Tokyo, 162-8666, Japan. suzuki.kazufumi@twmu.ac.jp.
² Department of Radiology, Division of Diagnostic Imaging and Nuclear Medicine, Tokyo Women's Medical University, 8-1, Kawada-Cho, Shinjuku-Ku, Tokyo, 162-8666, Japan.
³ Department of Radiology, Iwaki City Medical Center, Iwaki City, Japan.
⁴ Department of Surgery, Division of Hepatobiliary and Pancreatic Surgery, Tokyo Women's Medical University, Tokyo, Japan.

PMID: 39162781
DOI: 10.1007/s11604-024-01643-y

Abstract

Purpose: Large-scale language models are expected to have been trained on large volumes of data, including cancer treatment protocols. This study aimed to investigate the use of Generative Pre-trained Transformer 4 (GPT-4) for identifying the TNM classification of pancreatic cancer from existing radiology reports written in Japanese.

Materials and methods: We screened 100 consecutive computed tomography (CT) radiology reports of pancreatic cancer from April 2020 to June 2022. GPT-4 was asked to determine the TNM classification from each report according to the General Rules for the Study of Pancreatic Cancer, 7th Edition. The accuracy and kappa coefficients of GPT-4's TNM classifications were evaluated against the classifications made by two experienced abdominal radiologists, which served as the gold standard.
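The two agreement metrics named above can be illustrated with a short sketch. It assumes unweighted Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement (the accuracy) and p_e is the agreement expected by chance from each rater's category frequencies; the abstract does not state whether a weighted kappa was used for the ordinal T categories. The function names and the label lists below are hypothetical illustrations, not the study's data or code.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(pred: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Fraction of cases where the predicted category equals the gold standard."""
    if len(pred) != len(gold) or not gold:
        raise ValueError("pred and gold must be non-empty and of equal length")
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def cohen_kappa(pred: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e) (assumed variant)."""
    p_o = accuracy(pred, gold)  # observed agreement
    n = len(gold)
    pred_counts = Counter(pred)
    gold_counts = Counter(gold)
    # Chance agreement: sum over categories of the product of the two
    # raters' marginal proportions (categories absent from gold add 0).
    p_e = sum(pred_counts[c] * gold_counts[c] for c in gold_counts) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category for every case.
        raise ZeroDivisionError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical T-factor labels for six cases (NOT data from the study).
gold_t = ["T1", "T2", "T3", "T3", "T4", "T2"]  # radiologists' gold standard
gpt_t = ["T1", "T3", "T3", "T3", "T4", "T2"]   # model output
print(f"accuracy={accuracy(gpt_t, gold_t):.3f}")  # accuracy=0.833
print(f"kappa={cohen_kappa(gpt_t, gold_t):.3f}")  # kappa=0.769
```

Because kappa discounts agreement that would occur by chance, it is always lower than accuracy whenever p_e > 0, which is consistent with each kappa value reported below being smaller than the corresponding accuracy.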

Results: Accuracy was 0.73, 0.91, and 0.93 for the T, N, and M factors, respectively. The corresponding kappa coefficients were 0.45 for T, 0.79 for N, and 0.83 for M.

Conclusion: Although GPT-4 is familiar with the TNM classification for pancreatic cancer, its performance in classifying actual cases in this experiment may not be adequate.

Keywords: GPT-4; Large-scale language model; Natural language processing; Pancreatic cancer; TNM classification.