Background: Palliative spine radiation therapy is prone to treatment at the wrong anatomic level. We developed a fully automated deep learning-based spine-targeting quality assurance system (DL-SpiQA) for detecting treatment at the wrong anatomic level. DL-SpiQA was evaluated based on retrospective testing of spine radiation therapy treatments and prospective clinical deployment.
Methods: The DL-SpiQA workflow comprises three steps: auto-segmentation and labelling of all vertebral volumes on CT imaging using TotalSegmentator, an open-source deep learning algorithm based on nnU-Net; calculation of the radiation dose to each vertebra; and flagging and categorisation of potential treatments at the wrong anatomic level, with automated email reports sent to involved radiation therapy personnel. We developed the DL-SpiQA tool using retrospective clinical data from patients treated with palliative spine radiation therapy at sites included in the multicentre hospital network between Feb 12, 2014, and Nov 15, 2022. To test whether the tool could identify known errors, we used historic cases of patients who had a near-miss (ie, a wrong-anatomic-level error caught before the patient was treated) or had received wrong-anatomic-level treatment. We then used the tool prospectively over 15 months (April 24, 2023, to July 22, 2024) to evaluate every new spine radiation therapy treatment plan created for a patient, looking for targeting errors and for dose and volume discrepancies. An email report was circulated to all the radiation therapy personnel; any errors found were highlighted and each error was defined. The tool was internally validated. All cases flagged by DL-SpiQA in the retrospective and prospective studies were manually reviewed for dosimetric targeting, variant spine anatomy or spinal anomalies, and artificial intelligence (AI) segmentation errors. DL-SpiQA was further validated using false positive and false negative rates estimated from the retrospective results.
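The dose-calculation and flagging steps of the workflow can be illustrated with a minimal sketch. It is not the DL-SpiQA implementation: it assumes the vertebral labels (eg, from TotalSegmentator) and the planned dose have already been resampled onto the same voxel grid, and the function names, the use of mean dose, and the 80% threshold are hypothetical choices made only for illustration.

```python
import numpy as np

def mean_dose_per_vertebra(label_mask, dose, labels):
    """Mean dose (Gy) inside each labelled vertebral volume.

    label_mask and dose are assumed to share one voxel grid;
    labels maps a vertebra name to its integer label in the mask.
    """
    result = {}
    for name, idx in labels.items():
        voxels = dose[label_mask == idx]
        if voxels.size:  # skip vertebrae absent from the scan
            result[name] = float(voxels.mean())
    return result

def flag_levels(mean_dose, intended, prescription_gy, threshold=0.8):
    """Compare dosed vertebrae with the documented target levels.

    The threshold (fraction of the prescription) is an assumed,
    illustrative cut-off, not a value taken from the study.
    """
    cutoff = threshold * prescription_gy
    treated = {name for name, d in mean_dose.items() if d >= cutoff}
    return {
        "treated_not_intended": sorted(treated - set(intended)),
        "intended_not_treated": sorted(set(intended) - treated),
    }

# Toy example: four vertebrae stacked along the first axis.
labels = {"T11": 1, "T12": 2, "L1": 3, "L2": 4}
label_mask = np.zeros((40, 4, 4), dtype=int)
for i, idx in enumerate(labels.values()):
    label_mask[i * 10:(i + 1) * 10] = idx

# The dose covers T12 and L1, but the documented target is L1 and L2.
dose = np.zeros(label_mask.shape)
dose[10:30] = 20.0

mean_dose = mean_dose_per_vertebra(label_mask, dose, labels)
flags = flag_levels(mean_dose, intended=["L1", "L2"], prescription_gy=20.0)
print(flags)
```

In this toy case the plan is shifted one level up from the documented target, so both lists are non-empty; a report generated from such flags would then need manual review, as described above.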
Findings: DL-SpiQA was first tested retrospectively on 513 patients with segmentation of 10 106 vertebrae. The system raised flags for ten dose discrepancies, 49 normal anatomic variants, 49 cases with implants or other anomalies, and 20 segmentation errors (4% false positive rate). DL-SpiQA caught one historic treatment at the wrong anatomic level and three near-misses. DL-SpiQA was then prospectively deployed, reviewing 520 cases and identifying six documentation errors, which triggered detailed review by clinicians, and 43 additional cases, which confirmed clinical knowledge of variant anatomy. In all detected cases (ie, 49 of 520 cases in total), the appropriate personnel were alerted. A false negative rate of 0·03% was estimated from the 4% AI segmentation error rate and the frequency of reported spine radiation therapy errors.
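The headline proportions can be checked from the reported counts. The short sketch below assumes that the 4% false positive rate refers to the 20 segmentation errors among the 513 retrospective patients; the variable names are illustrative. The 0·03% false negative estimate is not reproduced because the error frequency it depends on is not stated here.

```python
# Counts as reported in the Findings.
retro_patients = 513
retro_vertebrae = 10106
segmentation_errors = 20

prospective_cases = 520
documentation_errors = 6
variant_anatomy_cases = 43

# Assumed definition: segmentation errors per retrospective patient.
false_positive_rate = segmentation_errors / retro_patients

# Prospective cases that triggered an alert.
flagged_prospective = documentation_errors + variant_anatomy_cases
flagged_fraction = flagged_prospective / prospective_cases

vertebrae_per_patient = retro_vertebrae / retro_patients

print(f"False positive rate: {false_positive_rate:.1%}")
print(f"Prospective cases flagged: {flagged_prospective} ({flagged_fraction:.1%})")
print(f"Mean vertebrae segmented per patient: {vertebrae_per_patient:.1f}")
```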
Interpretation: The low false positive rate, the low false negative rate, and the high accuracy in flagging errors show that DL-SpiQA is an effective, AI-driven, automated quality assurance tool that could be used to identify anatomic spine variants and treatments targeted at the wrong anatomic level. The tool could therefore help improve the safety of spine radiation therapy. Further external validation and tailoring are needed.
Funding: None.
Copyright © 2025 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY-NC 4.0 license. All rights reserved.