Improved prediction of swimming talent through random forest analysis of anthropometric and physiological phenotypes

Phenomics. 2024 Nov 20;4(5):465-472. doi: 10.1007/s43657-024-00176-8. eCollection 2024 Oct.

Abstract

The field of competitive swimming lacks broadly applicable predictive models for talent identification across the age groups of adolescent swimmers. This study aimed to construct a predictive model for athletic talent using machine learning methods based on anthropometric and physiological data. Baseline data were collected from 5444 participants aged 10-18 years in Shanghai, China, between 2015 and 2018, of whom 4969 completed a 3-year follow-up. Talented swimmers were identified based on their performance over the follow-up period, which revealed age- and sex-dependent developmental differences between swimmers classified as talented and those classified as non-talented. After controlling for the confounding variables age and sex, nine machine learning algorithms were compared; Random Forest achieved the highest performance and was selected as the final model. The model demonstrated excellent predictive performance on both the test dataset and an independent validation dataset from Shandong (n = 118), indicating strong generalizability. Furthermore, when the SHapley Additive exPlanations (SHAP) method was used to interpret the model, abdominal skinfold, lung capacity, chest circumference, shoulder width, and triceps skinfold were identified as the five most critical indicators for talent identification.
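To illustrate the idea behind the SHAP-based indicator ranking, the following is a minimal sketch of exact Shapley attribution computed by brute-force enumeration of feature coalitions. It is not the study's pipeline: the scoring function `toy_score`, its weights, the feature subset, and the background and sample values are all hypothetical stand-ins for the fitted Random Forest and the real measurements, chosen only so that the arithmetic is easy to check by hand.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley attributions for a single sample x.

    Features outside a coalition are filled in from each background row,
    and the resulting predictions are averaged (a common way to define
    the coalition value in SHAP-style explanations).
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [x[i] if i in coalition else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical feature subset and toy linear scorer (NOT the study's model).
FEATURES = ["abdominal_skinfold", "lung_capacity", "chest_circumference"]


def toy_score(row):
    return -0.5 * row[0] + 2.0 * row[1] + 0.25 * row[2]


# Made-up background rows and one made-up swimmer to explain.
background = [[12.0, 2.5, 70.0], [18.0, 3.0, 76.0], [15.0, 3.5, 82.0]]
sample = [10.0, 4.0, 80.0]

phi = shapley_values(toy_score, sample, background)

# Rank features by absolute contribution, as in a SHAP importance ordering.
ranking = sorted(zip(FEATURES, phi), key=lambda pair: abs(pair[1]), reverse=True)
for name, contribution in ranking:
    print(f"{name}: {contribution:+.2f}")
```

The attributions sum to the difference between the sample's score and the average score over the background rows, which is the additivity property that lets per-feature contributions be compared and ranked. Brute-force enumeration scales exponentially with the number of features, so it is practical only for small illustrations like this one.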

Supplementary information: The online version contains supplementary material available at 10.1007/s43657-024-00176-8.

Keywords: Anthropometry and physiology; Machine learning algorithms; Prospective adolescent swimming study; Talent identification.