Artificial Intelligence can Facilitate Application of Risk Stratification Algorithms to Bladder Cancer Patient Case Scenarios

Clin Med Insights Oncol. 2024 Nov 17:18:11795549241296781. doi: 10.1177/11795549241296781. eCollection 2024.

Abstract

Background: Chat Generative Pre-Trained Transformer (ChatGPT) has previously been shown to accurately predict colon cancer screening intervals when provided with clinical data and context in the form of guidelines. The National Comprehensive Cancer Network® (NCCN®) guideline on non-muscle invasive bladder cancer (NMIBC) includes criteria for risk stratification into low-, intermediate-, and high-risk groups based on patient and disease characteristics. The aim of this study is to evaluate the ability of ChatGPT to apply the NCCN Guidelines to risk stratify theoretical patient scenarios related to NMIBC.

Methods: Thirty-six hypothetical patient scenarios related to NMIBC were created and submitted to GPT-3.5 and GPT-4 at two separate time points. First, both models were prompted to risk stratify patients without any additional context. Custom instructions containing the written text of the NMIBC NCCN® Guidelines were then provided as textual context, and risk stratification was repeated. Finally, GPT-4 was provided with an image of the NMIBC risk groups table, and risk stratification was performed again.

Results: GPT-3.5 correctly risk stratified 68% (24.5 of 36) of scenarios without context, increasing slightly to 74% (26.5 of 36) with textual context. GPT-4 achieved 83% accuracy (30 of 36) without context and 100% (36 of 36) with textual context (P = .025). With image context, GPT-4 achieved 81% accuracy (29 of 36), similar to its performance without context. ChatGPT generally performed poorly on intermediate-risk NMIBC scenarios (accuracy 33%-63%). When risk stratification was incorrect, most errors overestimated risk.
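The percentages above follow directly from the reported counts, which the short sketch below recomputes. Two assumptions are labeled here and in the code comments: the fractional counts (for example 24.5) are taken as given and are assumed, not confirmed by the abstract, to be averages over the two time points; and the abstract does not name the statistical test behind P = .025. The sketch shows only that a two-sided Fisher exact test on the GPT-4 counts (30/36 vs 36/36) lands near the reported value. The helper names `accuracy_pct` and `fisher_two_sided` are hypothetical.

```python
from math import comb

# Correct-classification counts as reported in the abstract (out of 36 scenarios).
# ASSUMPTION: fractional counts such as 24.5 are averages over the two time points;
# the abstract does not state this explicitly.
N_SCENARIOS = 36
REPORTED = {
    "GPT-3.5, no context": 24.5,
    "GPT-3.5, textual context": 26.5,
    "GPT-4, no context": 30,
    "GPT-4, textual context": 36,
    "GPT-4, image context": 29,
}


def accuracy_pct(correct, total=N_SCENARIOS):
    """Return accuracy as a percentage rounded to the nearest whole number."""
    return round(100 * correct / total)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


for label, correct in REPORTED.items():
    print(f"{label}: {accuracy_pct(correct)}%")

# GPT-4 without vs with textual context; each row is [correct, incorrect].
# ASSUMPTION: the abstract does not say which test produced P = .025.
p = fisher_two_sided(30, 6, 36, 0)
print(f"Fisher exact two-sided P = {p:.3f}")  # prints 0.025
```

Because the same 36 scenarios were used in both conditions, the data are paired, and a paired test such as McNemar's would also be a reasonable choice and could give a different P value. Agreement with Fisher's test is therefore consistent with, but does not confirm, the analysis the authors used.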

Conclusions: GPT-4 can accurately risk stratify patients with NMIBC when provided with context containing the guidelines. Overestimation of risk is more common than underestimation, and intermediate-risk NMIBC is the group most likely to be incorrectly stratified. With further validation, GPT-4 could become a tool for risk stratification of NMIBC in clinical practice.

Keywords: Artificial intelligence; guideline adherence; non-muscle invasive bladder neoplasms; risk factors; urology.