[The spring of artificial intelligence: AI vs. expert for internal medicine cases]

Rev Med Interne. 2024 Jul;45(7):409-414. doi: 10.1016/j.revmed.2024.01.012. Epub 2024 Feb 7.
[Article in French]

Abstract

Introduction: The "Printemps de la Médecine Interne" are training days for Francophone internists, during which deliberately complex clinical cases are presented. This study aims to evaluate the diagnostic capabilities of two general-purpose (non-medical) artificial intelligence language models, ChatGPT-4 and Bard, by confronting them with the diagnostic puzzles of the "Printemps de la Médecine Interne".

Method: The clinical cases from the 2021 and 2022 editions of the "Printemps de la Médecine Interne" were submitted to two language models, ChatGPT-4 and Bard. When a model gave a wrong answer, it was allowed a second attempt. The models' responses were then compared with those of the human internist experts.
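The abstract does not state how the cases were submitted (the public web interfaces seem most likely). Purely as an illustrative sketch, the protocol of one question, at most two attempts, with humans judging correctness, could be reproduced programmatically roughly as follows. The OpenAI Python SDK calls are real, but the model name, prompt wording, and the submit_case helper are assumptions, and Bard had no comparable public API at the time.

    # Illustrative sketch only; not the authors' actual procedure.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def submit_case(case_text: str, max_attempts: int = 2) -> list[str]:
        """Submit one clinical case; allow a second attempt after a wrong answer."""
        messages = [
            {"role": "system", "content": "You are an internal medicine diagnostician."},
            {"role": "user", "content": f"What is the most likely diagnosis?\n\n{case_text}"},
        ]
        answers = []
        for _ in range(max_attempts):
            reply = client.chat.completions.create(model="gpt-4", messages=messages)
            answer = reply.choices[0].message.content
            answers.append(answer)
            # In the study, correctness was judged by humans; here a reviewer
            # accepts or rejects each attempt interactively.
            if input(f"Model answered: {answer!r}. Correct? [y/n] ").lower() == "y":
                break
            messages += [
                {"role": "assistant", "content": answer},
                {"role": "user", "content": "That diagnosis is wrong. Please propose another."},
            ]
        return answers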

Results: Of the 12 clinical cases submitted, the human internist experts correctly diagnosed nine, ChatGPT-4 three, and Bard one. One of the cases solved by ChatGPT-4 had not been solved by the human experts. Both models returned their answers within a few seconds.
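Expressed as raw diagnostic accuracy, these counts correspond to 9/12 (75%) for the human experts, 3/12 (25%) for ChatGPT-4, and 1/12 (approximately 8%) for Bard.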

Conclusions: At present, the diagnostic skills of ChatGPT-4 and Bard remain inferior to those of human experts on complex clinical cases, but they are very promising. Although only recently made available to the general public, these models already show impressive capabilities, calling into question the role of the diagnosing physician. It would be advisable to adapt the rules or the subjects of future "Printemps de la Médecine Interne" so that the cases cannot be solved by a publicly available language model.

Keywords: Artificial intelligence; Bard; Case report; ChatGPT; Diagnosis.

Publication types

  • Comparative Study
  • English Abstract

MeSH terms

  • Artificial Intelligence*
  • Clinical Competence / standards
  • France
  • Humans
  • Internal Medicine* / education
  • Internal Medicine* / methods