Background: Various deep-learning systems have been proposed for automated sleep staging. However, the impact of underrepresenting specific age groups in training data, and the resulting errors in clinically used sleep metrics, remain unknown.
Methods: We adopted XSleepNet2, a deep neural network for automated sleep staging, to train and test models using polysomnograms (PSG) of 1232 children (7.0 ± 1.4 years), 3757 adults (56.9 ± 19.4 years), and 2788 older adults (80.7 ± 4.2 years). We developed four separate sleep stage classifiers, trained exclusively on pediatric (P), adult (A), or older adult (O) PSG, or on PSG from the mixed cohort of all three groups (PAO). Results were compared against an alternative sleep stager (DeepSleepNet) for validation purposes.
Results: When pediatric PSG was classified by XSleepNet2 trained exclusively on pediatric PSG, the overall accuracy was 88.9%, dropping to 78.9% when classified by a system trained exclusively on adult PSG. Errors made by the system when staging PSG of older adults were comparatively lower. However, all systems produced significant errors in clinical markers at the level of individual PSG. Results obtained with DeepSleepNet showed similar patterns.
Conclusion: Underrepresentation of age groups, in particular children, can significantly lower the performance of automated deep-learning sleep stagers. In general, automated sleep stagers may behave unexpectedly, limiting clinical use. Future evaluation of automated systems must consider PSG-level performance in addition to overall accuracy.
Keywords: Sleep staging; machine learning; polysomnography.
Copyright © 2023 The Authors. Published by Elsevier B.V. All rights reserved.