The Rapid Development of Artificial Intelligence: GPT-4's Performance on Orthopedic Surgery Board Questions

Hayden L Hofmann; Gage A Guerra; Jonathan L Le; Alexander M Wong; Grady H Hofmann; Cory K Mayfield; Frank A Petrigliano; Joseph N Liu

doi:10.3928/01477447-20230922-05

Orthopedics. 2024 Mar-Apr;47(2):e85-e89. doi: 10.3928/01477447-20230922-05. Epub 2023 Sep 27.

PMID: 37757748

Abstract

Artificial intelligence and machine learning models, such as Chat Generative Pre-trained Transformer (ChatGPT), have advanced at a remarkably fast rate. OpenAI released its newest ChatGPT model, GPT-4, in March 2023. The model has a wide range of potential medical applications and has demonstrated notable proficiency on many medical board examinations. This study assessed GPT-4's performance on the Orthopaedic In-Training Examination (OITE), which is used to prepare residents for the American Board of Orthopaedic Surgery (ABOS) Part I Examination. GPT-4's results were also compared with those of the previous iteration of ChatGPT, GPT-3.5, which was released 4 months before GPT-4. GPT-4 correctly answered 251 of the 396 attempted questions (63.4%), whereas GPT-3.5 correctly answered 46.3% of 410 attempted questions. GPT-4 was significantly more accurate than GPT-3.5 on orthopedic board-style questions (P<.00001). GPT-4's performance was most comparable to that of an average third-year orthopedic surgery resident, whereas GPT-3.5 performed below the level of an average orthopedic intern. GPT-4's overall accuracy was just below the approximate threshold that indicates a likely pass on the ABOS Part I Examination. These results demonstrate significant improvements in OpenAI's newest model, GPT-4. Future studies should assess potential clinical applications as AI models continue to be trained on larger data sets and offer more capabilities.
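
The accuracy comparison reported above can be checked approximately with a short calculation. The abstract does not name the statistical test that was used, so the sketch below assumes a pooled two-proportion z-test, which is only one reasonable choice. The number of correct GPT-3.5 answers is not reported either; the value 190 is inferred here by rounding 46.3% of 410 and should be treated as an assumption. The function name `two_proportion_z_test` is illustrative.

```python
import math


def two_proportion_z_test(correct_a, total_a, correct_b, total_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = correct_a / total_a
    p_b = correct_b / total_b
    # Pooled proportion under the null hypothesis of equal accuracy.
    pooled = (correct_a + correct_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts reported in the abstract for GPT-4.
gpt4_correct, gpt4_total = 251, 396

# Assumption: the GPT-3.5 correct count is inferred from 46.3% of 410.
gpt35_total = 410
gpt35_correct = round(0.463 * gpt35_total)

z, p_value = two_proportion_z_test(
    gpt4_correct, gpt4_total, gpt35_correct, gpt35_total
)
print(f"GPT-4 accuracy:   {gpt4_correct / gpt4_total:.1%}")
print(f"GPT-3.5 accuracy: {gpt35_correct / gpt35_total:.1%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.2e}")
```

Under these assumptions the two-sided p-value falls below .00001, in line with the reported significance level, although the exact value may differ from the one the authors obtained with their own method.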

MeSH terms

Artificial Intelligence
Clinical Competence
Educational Measurement
Humans
Internship and Residency*
Orthopedic Procedures*
Orthopedics* / education