A Machine Learning Prediction Model for Total Shoulder Arthroplasty Procedure Duration - An Evaluation of Surgeon, Patient, And Shoulder-Specific Factors

J Shoulder Elbow Surg. 2024 Dec 21:S1058-2746(24)00947-9. doi: 10.1016/j.jse.2024.10.028. Online ahead of print.

Abstract

Background: Operating room (OR) efficiency is critical for scheduling, for cost efficiency, and for sustaining the high operating volume required to meet the growing demand for arthroplasty. The purpose of this study was to develop a machine learning predictive model for total shoulder arthroplasty (TSA) procedure duration and to identify factors that predict a prolonged procedure.

Methods: A retrospective review was undertaken of all TSA procedures performed between 2013 and 2021 at a large academic institution. Patient, surgeon, anesthetic and shoulder-specific factors were assessed. The duration of time in the OR was recorded and compared with the procedure durations predicted by the human scheduler and by the electronic health record (EHR). Two gradient-boosted decision tree regression models were created using training and validation datasets. The mean squared logarithmic error (MSLE) was chosen as the loss function. The first model (M1) considered patient, surgeon, and anesthetic factors, while the second model (M2) additionally considered factors specific to shoulder anatomy and pathology.
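
The Methods name MSLE as the loss function but do not give its formula. The sketch below uses the common definition, the mean of (log(1 + actual) - log(1 + predicted))^2, which is an assumption about the form the authors used. The function name `msle` and the example durations are hypothetical and are not study data.

```python
import math


def msle(actual_minutes, predicted_minutes):
    """Mean squared logarithmic error, using the common log(1 + x) form."""
    if len(actual_minutes) != len(predicted_minutes):
        raise ValueError("inputs must have the same length")
    if not actual_minutes:
        raise ValueError("inputs must not be empty")
    # Squared difference of log-transformed durations for each case.
    squared_log_errors = [
        (math.log1p(a) - math.log1p(p)) ** 2
        for a, p in zip(actual_minutes, predicted_minutes)
    ]
    return sum(squared_log_errors) / len(squared_log_errors)


# Hypothetical procedure durations in minutes (illustration only).
actual = [120.0, 150.0, 200.0]
under = [100.0, 130.0, 180.0]  # each case predicted 20 minutes too short
over = [140.0, 170.0, 220.0]   # each case predicted 20 minutes too long

print(msle(actual, under), msle(actual, over))
```

Under this definition, an under-prediction is penalized more heavily than an over-prediction of the same absolute size, because the error is measured on a logarithmic scale. The abstract does not state the authors' rationale for choosing MSLE.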

Results: Human schedulers predicted 64.1% of cases accurately, with 26.7% under-predicted and 9.2% over-predicted. M1 successfully predicted 79.7% of cases, with 6.9% under-predicted and 13.4% over-predicted. M2 successfully predicted 82.5% of cases, with 8.8% under-predicted and 8.8% over-predicted. M2 was significantly more accurate in predicting anatomic TSA (aTSA) than reverse TSA (rTSA) (90.6% vs 78.1%, p < 0.001). The feature with the greatest impact on the shoulder-specific model's prediction was the historical median procedure duration, followed by the EHR prediction, surgeon prediction, patient age and a traumatic indication. Factors associated with under-predicting procedure duration included younger age, traumatic indication, male sex, greater BMI and a B2 glenoid.

Conclusion: Machine learning predictive models outperformed traditional scheduling, with a model incorporating general and shoulder-specific data providing the most accurate prediction of TSA procedure duration. Integration of modelling has the potential to optimize theatre utilization and improve efficiency.

Keywords: Shoulder arthroplasty; artificial intelligence; machine learning; reverse shoulder arthroplasty.