This paper proposes a user study evaluating the impact of Class Activation Maps (CAMs) as an eXplainable AI (XAI) method in a radiological diagnostic task: the detection of thoracolumbar (TL) fractures in vertebral X-rays. In particular, we focus on two oft-neglected properties of CAMs, namely granularity and coloring: whether the maps should highlight lower-level or higher-level features, and which coloring scheme they should adopt, to best support the decision-making process, both in terms of diagnostic accuracy (effectiveness) and of user-centered dimensions such as perceived confidence and utility (satisfaction), depending on case complexity, AI accuracy, and user expertise. Our findings show that lower-level feature CAMs, which highlight more focused anatomical landmarks, are associated with higher diagnostic accuracy than higher-level feature CAMs, particularly among experienced physicians. Moreover, despite the intuitive appeal of semantic CAMs, traditionally colored CAMs consistently yielded higher diagnostic accuracy across all groups. These results challenge some prevalent assumptions in the XAI field and underscore the importance of an evidence-based, human-centered approach to designing and evaluating AI- and XAI-assisted diagnostic tools. To this end, the paper also proposes a hierarchy-of-evidence framework to help designers and practitioners choose the XAI solutions that optimize performance and satisfaction on the basis of the strongest available evidence, or to identify the gaps in the literature that must be filled to move from opinionated, eminence-based research toward research grounded in empirical evidence and end users' work and preferences.
Keywords: Activation maps; Evidence-based design; Explainable AI; Human-centered AI; Human–AI collaboration; Medical imaging.
Copyright © 2024 The Author(s). Published by Elsevier Ltd. All rights reserved.