Purpose: To present a machine learning pipeline for automatically relabeling anatomical structure sets in the Digital Imaging and Communications in Medicine (DICOM) format to a standard nomenclature, enabling data abstraction for research and quality improvement.
Methods: DICOM structure sets from approximately 1200 lung and prostate cancer patients across 40 treatment centers were used to build predictive models that automate the relabeling of clinically specified structure labels to standardized labels as defined by the American Association of Physicists in Medicine's (AAPM) Task Group 263 (TG-263). Volumetric bitmaps were created from the delineated volumes and combined with associated bony anatomy data to build feature vectors. Feature reduction was performed with singular value decomposition, and the resulting vectors were used to predict the label of each structure using five different classifier algorithms on the Apache Spark platform with 5-fold cross-validation. Undersampling methods were used to address the underlying class imbalance that hindered classifier performance. Experiments were performed on both a curated version of the data, which included only annotated structures, and the non-curated data, which included all structures from the original treatment plans.
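As an illustration of the class-balancing step, the following is a minimal sketch of random undersampling in plain Python. It is not the authors' Spark implementation: the function name `random_undersample`, the toy labels, and the choice to reduce every class to the size of the rarest class are assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class matches the rarest class in size.

    Hypothetical helper for illustration only; the balancing target used in
    the original pipeline is not stated in the abstract.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Every class is reduced to the size of the smallest class.
    target = min(len(group) for group in by_class.values())

    balanced = []
    for label, group in by_class.items():
        for sample in rng.sample(group, target):
            balanced.append((sample, label))

    # Shuffle so the classes are not stored in contiguous runs.
    rng.shuffle(balanced)
    out_samples = [sample for sample, _ in balanced]
    out_labels = [label for _, label in balanced]
    return out_samples, out_labels


# Toy data (assumed): integers stand in for feature vectors.
labels = ["Lung_L"] * 6 + ["Heart"] * 3 + ["SpinalCord"] * 2
samples = list(range(len(labels)))

balanced_samples, balanced_labels = random_undersample(samples, labels)
class_counts = Counter(balanced_labels)
print(class_counts)
```

In this toy run each of the three classes is reduced to two samples. A clustering-based variant would instead keep representatives chosen from k-means clusters of the majority class rather than a uniform random subset.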
Results: Random Forest provided the highest accuracies, with F1 scores of 98.77 for lung and 95.06 for prostate on the curated data sets. Scores were lower on the non-curated data sets, at 95.67 for lung and 90.22 for prostate, highlighting some of the challenges of classifying real clinical data. Including bony anatomy data and pooling information from all structures for the same patient both increased accuracies. Undersampling with k-means clustering for class balancing improved classifier accuracy in some cases, and in all experiments it significantly reduced run time compared with random undersampling.
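For readers unfamiliar with the metric, the sketch below shows one common way to compute per-class and macro-averaged F1 scores from true and predicted labels. The abstract does not state which averaging scheme was used, so macro averaging is an assumption here, as are the function names and toy labels. The scores above appear to be expressed as percentages, whereas this sketch returns values between 0 and 1.

```python
def f1_per_class(y_true, y_pred):
    """Return a dict mapping each class to its F1 score (0.0 to 1.0)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = {}
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy example (assumed labels): one Heart structure is mislabeled as Lung_L.
y_true = ["Heart", "Heart", "Lung_L", "Lung_L"]
y_pred = ["Heart", "Lung_L", "Lung_L", "Lung_L"]

per_class = f1_per_class(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
print(per_class, overall)
```

Macro averaging weights every structure type equally, which makes it sensitive to rare classes; a support-weighted average would instead be dominated by the most frequent structures.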
Conclusion: This work shows that structure sets can be relabeled using our approach with accuracies over 95% for many structure types when presented with curated data. Although accuracies dropped when using the full non-curated data sets, some structure types were still correctly labeled over 90% of the time. With similar results obtained on an external test data set, we can infer that the proposed models are likely to work on other clinical data sets.
Keywords: Class imbalance; DICOM; Machine Learning; Radiation Oncology; Random Forest; TG-263.
Copyright © 2020 Elsevier Inc. All rights reserved.