Objectives: This study aimed to evaluate the performance of artificial intelligence (AI) software in bone age (BA) assessment according to the Greulich and Pyle (G&P) method in a German pediatric cohort.
Materials and methods: Hand radiographs of 306 pediatric patients aged 1-18 years (153 boys and 153 girls; 18 patients per year of life), including a subgroup of 243 patients within the age range covered by the software's declared intended use, were analyzed retrospectively. Two pediatric radiologists and one endocrinologist performed independent, blinded BA readings. The AI software then estimated BA from the same images. Agreement, accuracy, and interchangeability between the AI and the expert readers were assessed.
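As a minimal sketch of the agreement and accuracy measures, assuming the reference BA for each patient is the mean of the three expert reads and that differences are taken as AI minus reference (the sign convention is an assumption), the mean difference and the mean absolute difference (MAD) can be computed as follows. The function names and the toy values are hypothetical and are not study data.

```python
from statistics import mean


def reference_ba(reads: list[list[float]]) -> list[float]:
    """Per-patient reference bone age: the mean of the expert reads (months)."""
    return [mean(rs) for rs in reads]


def agreement(ai: list[float], reads: list[list[float]]) -> tuple[float, float]:
    """Return (mean difference, mean absolute difference) of AI vs. reference.

    Differences are AI minus reference (assumed sign convention), so a
    positive mean difference means the AI estimates older bone ages on average.
    """
    if len(ai) != len(reads):
        raise ValueError("need exactly one AI estimate per patient")
    diffs = [a - r for a, r in zip(ai, reference_ba(reads))]
    return mean(diffs), mean(abs(d) for d in diffs)


# Toy values (months), invented for illustration only.
reads = [[120, 126, 123], [60, 54, 57], [150, 156, 147]]
ai = [129, 55, 150]
print(agreement(ai, reads))  # prints (1, 3)
```

Averaging the expert reads before taking differences yields a single reference value per patient, so the mean difference captures systematic bias while the MAD captures the typical size of the disagreement regardless of direction.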
Results: The mean difference between the average of the three expert readers and the AI software was 0.39 months, with a mean absolute difference (MAD) of 6.8 months (intended-use subgroup: mean difference 1.73 months, MAD 6.0 months). Performance was slightly worse in boys than in girls (MAD 6.3 vs. 5.6 months). Regression analysis showed a constant, not proportional, bias (slope 1.01; 95% CI 0.99 to 1.02). The estimated equivalence index for interchangeability was -14.3 (95% CI -27.6 to -1.1).
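The abstract does not define the equivalence index. The sketch below assumes an Obuchowski-style individual equivalence index: the mean squared difference between the AI and each expert reader minus the mean squared difference between pairs of expert readers, with a 95% CI from a percentile bootstrap that resamples whole patients. Under this assumed definition, a negative value means the AI disagrees with the experts less than the experts disagree with each other. The function names, bootstrap settings, and toy values are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def equivalence_index(ai: list[float], reads: list[list[float]]) -> float:
    """Assumed individual equivalence index (units: months squared).

    Mean squared AI-vs-reader difference minus mean squared
    reader-vs-reader difference, pooled over all patients.
    """
    if len(ai) != len(reads):
        raise ValueError("need exactly one AI estimate per patient")
    if any(len(rs) < 2 for rs in reads):
        raise ValueError("each patient needs at least two expert reads")
    ai_vs_readers = mean((a - r) ** 2 for a, rs in zip(ai, reads) for r in rs)
    between_readers = mean(
        (x - y) ** 2 for rs in reads for x, y in combinations(rs, 2)
    )
    return ai_vs_readers - between_readers


def bootstrap_ci(ai, reads, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the index, resampling whole patients."""
    rng = random.Random(seed)
    n = len(ai)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(equivalence_index([ai[i] for i in idx], [reads[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy values (months), invented for illustration only.
reads = [[120, 126, 123], [60, 54, 57], [150, 156, 147]]
ai = [129, 55, 150]
print(equivalence_index(ai, reads))
print(bootstrap_ci(ai, reads))
```

With these toy values the AI-vs-reader term is 201/9 and the reader-vs-reader term is 234/9, so the index is about -3.67 months squared. Resampling patients rather than individual reads keeps each patient's AI estimate and expert reads together, which preserves the within-patient correlation.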
Conclusion: For BA assessment with the G&P method, the new AI software was interchangeable with expert readers.
Clinical relevance statement: AI software enables every physician to achieve expert-reader quality in bone age assessment.
Key points:
• A novel artificial intelligence-based software for bone age estimation has not yet been clinically validated.
• Artificial intelligence showed good agreement and high accuracy compared with expert readers performing bone age assessment.
• Artificial intelligence proved interchangeable with expert readers.
Keywords: Artificial intelligence; Bone age measurements; Growth; Hand; X-rays.
© 2023. The Author(s).