Economic evaluation for medical artificial intelligence: accuracy vs. cost-effectiveness in a diabetic retinopathy screening case

NPJ Digit Med. 2024 Feb 21;7(1):43. doi: 10.1038/s41746-024-01032-9.

Abstract

Artificial intelligence (AI) models have shown great accuracy in health screening. However, for real-world implementation, high accuracy may not guarantee cost-effectiveness. Improving AI's sensitivity finds more high-risk patients but may raise medical costs while increasing specificity reduces unnecessary referrals but may weaken detection capability. To evaluate the trade-off between AI model performance and the long-running cost-effectiveness, we conducted a cost-effectiveness analysis in a nationwide diabetic retinopathy (DR) screening program in China, comprising 251,535 participants with diabetes over 30 years. We tested a validated AI model in 1100 different diagnostic performances (presented as sensitivity/specificity pairs) and modeled annual screening scenarios. The status quo was defined as the scenario with the most accurate AI performance. The incremental cost-effectiveness ratio (ICER) was calculated for other scenarios against the status quo as cost-effectiveness metrics. Compared to the status quo (sensitivity/specificity: 93.3%/87.7%), six scenarios were cost-saving and seven were cost-effective. To achieve cost-saving or cost-effective, the AI model should reach a minimum sensitivity of 88.2% and specificity of 80.4%. The most cost-effective AI model exhibited higher sensitivity (96.3%) and lower specificity (80.4%) than the status quo. In settings with higher DR prevalence and willingness-to-pay levels, the AI needed higher sensitivity for optimal cost-effectiveness. Urban regions and younger patient groups also required higher sensitivity in AI-based screening. In real-world DR screening, the most accurate AI model may not be the most cost-effective. Cost-effectiveness should be independently evaluated, which is most likely to be affected by the AI's sensitivity.