Generative Artificial Intelligence (AI) systems such as OpenAI's ChatGPT, capable of an unprecedented ability to generate human-like text and converse in real time, hold potential for large-scale deployment in clinical settings such as substance use treatment. Treatment for substance use disorders (SUDs) is particularly high stakes, requiring evidence-based clinical treatment, mental health expertise, and peer support. Thus, promises of AI systems addressing deficient healthcare resources and structural bias are relevant within this domain, especially in an anonymous setting. This study explores the effectiveness of generative AI in answering real-world substance use and recovery questions. We collect questions from online recovery forums, use ChatGPT and Meta's LLaMA-2 for responses, and have SUD clinicians rate these AI responses. While clinicians rated the AI-generated responses as high quality, we discovered instances of dangerous disinformation, including disregard for suicidal ideation, incorrect emergency helplines, and endorsement of home detox. Moreover, the AI systems produced inconsistent advice depending on question phrasing. These findings indicate a risky mix of seemingly high-quality, accurate responses upon initial inspection that contain inaccurate and potentially deadly medical advice. Consequently, while generative AI shows promise, its real-world application in sensitive healthcare domains necessitates further safeguards and clinical validation.
Keywords: Alcohol; Generative AI; Large language models; Marijuana; Opioids; Substance use.
Copyright © 2024. Published by Elsevier B.V.