Competitive swimming lacks broadly applicable predictive models for identifying talent across age groups of adolescent swimmers. This study aimed to construct a predictive model of athletic talent from anthropometric and physiological data using machine learning methods. Baseline data were collected from 5444 participants aged 10-18 in Shanghai, China, between 2015 and 2018, of whom 4969 completed a 3-year follow-up. Talented swimmers were identified from their performance over the follow-up period, and comparison with non-talented swimmers revealed age- and sex-dependent developmental differences. After controlling for the confounding variables age and sex, nine machine learning algorithms were evaluated; Random Forest achieved the highest performance and was selected as the final model. The model demonstrated excellent predictive performance on both the test dataset and an independent validation dataset from Shandong (n = 118), indicating strong generalizability. Furthermore, interpreting the model with the SHapley Additive exPlanations (SHAP) method identified abdominal skinfold, lung capacity, chest circumference, shoulder width, and triceps skinfold as the five most important indicators for talent identification.
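To illustrate the idea behind the SHAP-based ranking of indicators, the following is a minimal sketch of exact Shapley value attribution, the quantity that SHAP approximates. It is not the study's pipeline: `toy_model`, its coefficients, the single all-zero baseline, and the example swimmer are all hypothetical, and only the three feature names are borrowed from the abstract. In practice SHAP averages over a background dataset and would be applied to the fitted Random Forest rather than to a hand-written formula.

```python
from itertools import combinations
from math import factorial

# Feature names borrowed from the abstract; everything else here is hypothetical.
FEATURES = ["abdominal_skinfold", "lung_capacity", "chest_circumference"]


def toy_model(x):
    """Hypothetical talent score on standardized inputs (not the study's model)."""
    return (
        -0.5 * x["abdominal_skinfold"]
        + 0.3 * x["lung_capacity"]
        + 0.2 * x["chest_circumference"]
        + 0.1 * x["lung_capacity"] * x["chest_circumference"]  # interaction term
    )


def value(subset, instance, baseline):
    """Model output when features in `subset` use the instance's values
    and all other features are held at the baseline."""
    x = {f: (instance[f] if f in subset else baseline[f]) for f in FEATURES}
    return toy_model(x)


def shapley_values(instance, baseline):
    """Exact Shapley values: weighted average of each feature's marginal
    contribution over every subset of the remaining features."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (
                    value(s | {f}, instance, baseline) - value(s, instance, baseline)
                )
        phi[f] = total
    return phi


# Assumed baseline: the cohort mean after standardization (all zeros).
baseline = {f: 0.0 for f in FEATURES}
# Hypothetical swimmer, expressed in standard deviations from the mean.
instance = {
    "abdominal_skinfold": -1.0,
    "lung_capacity": 1.5,
    "chest_circumference": 1.0,
}

phi = shapley_values(instance, baseline)
# Rank features by the magnitude of their contribution to this prediction.
ranking = sorted(phi, key=lambda f: abs(phi[f]), reverse=True)
print(phi)
print(ranking)
```

The attributions sum to the difference between the model's output for the swimmer and for the baseline, which is the property that makes Shapley values usable as an additive explanation. A global ranking such as the one reported above would typically average the absolute attributions over many individuals rather than rely on a single instance.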
Supplementary information: The online version contains supplementary material available at 10.1007/s43657-024-00176-8.
Keywords: Anthropometry and physiology; Machine learning algorithms; Prospective adolescent swimming study; Talent identification.
© International Human Phenome Institutes (Shanghai) 2024.