Background: The diagnosis of failure to progress, the most common indication for intrapartum cesarean delivery, is based on the assessment of cervical dilation and station over time. Labor curves serve as references for expected changes in dilation and fetal descent. The labor curves of Friedman, Zhang et al, and others are based on time alone and derived from mothers with spontaneous labor onset. However, labor induction is now common, and clinicians also consider other factors when assessing labor progress. Labor curves that consider the use of labor induction and other factors that influence labor progress have the potential to be more accurate and closer to clinical decision-making.
Objective: This study aimed to compare the prediction errors of labor curves based on a single factor (time) or multiple clinically relevant factors using two modeling methods: mixed-effects regression, a standard statistical method, and Gaussian processes, a machine learning method.
Study design: This was a longitudinal cohort study of changes in dilation and station based on data from 8022 births in nulliparous women with a live, singleton, vertex-presenting fetus ≥35 weeks of gestation with a vaginal delivery. New labor curves of dilation and station were generated with 10-fold cross-validation. External validation was performed using a geographically independent group. Model variables included time from the first examination in the 20 hours before delivery; dilation, effacement, and station recorded at the previous examination; cumulative contraction counts; and use of epidural anesthesia and labor induction. To assess model accuracy, differences between each model's predicted value and its corresponding observed value were calculated. These prediction errors were summarized using mean absolute error and root mean squared error statistics.
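As an illustration of the two summary statistics named above, the following minimal sketch computes mean absolute error and root mean squared error from paired observed and predicted values. The function names and the sample values are hypothetical and are not taken from the study data or the authors' code.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average of the absolute differences between predicted and observed values."""
    errors = [abs(p - o) for o, p in zip(observed, predicted)]
    return sum(errors) / len(errors)


def root_mean_squared_error(observed, predicted):
    """Square root of the average squared difference; penalizes large errors more."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical dilation values in cm, for illustration only (not study data).
observed = [4.0, 5.0, 7.0, 9.0]
predicted = [4.5, 5.0, 6.0, 9.5]

mae = mean_absolute_error(observed, predicted)
rmse = root_mean_squared_error(observed, predicted)
print(f"MAE = {mae:.3f} cm, RMSE = {rmse:.3f} cm")
```

Because squaring weights large misses more heavily, the root mean squared error is never smaller than the mean absolute error for the same set of predictions.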
Results: Dilation curves based on multiple parameters were more accurate than those derived from time alone. The mean absolute errors of the multifactor methods were lower (better) than that of the single-factor methods: 0.826 cm (95% confidence interval, 0.820-0.832) for the multifactor machine learning method and 0.893 cm (95% confidence interval, 0.885-0.901) for the multifactor mixed-effects method vs 2.122 cm (95% confidence interval, 2.108-2.136) for the single-factor methods (P<.0001 for both comparisons). The root mean squared errors of the multifactor methods were also lower (better) than that of the single-factor methods: 1.126 cm (95% confidence interval, 1.118-1.133) for the machine learning method and 1.172 cm (95% confidence interval, 1.164-1.181) for the mixed-effects method vs 2.504 cm (95% confidence interval, 2.487-2.521) for the single-factor methods (P<.0001 for both comparisons). The multifactor machine learning dilation models showed small but statistically significant improvements in accuracy compared with the mixed-effects regression models (P<.0001). The multifactor machine learning method produced a curve of descent with a mean absolute error of 0.512 cm (95% confidence interval, 0.509-0.515) and a root mean squared error of 0.660 cm (95% confidence interval, 0.655-0.666). External validation using independent data produced similar findings.
Conclusion: Cervical dilation models based on multiple clinically relevant parameters showed improved (lower) prediction errors compared to models based on time alone. The mean prediction errors were reduced by more than 50%. A more accurate assessment of departure from expected dilation and station may help clinicians optimize intrapartum management.
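The reduction of more than 50% can be checked against the dilation point estimates reported in the Results. The sketch below is only an arithmetic check on those published values; the helper name `relative_reduction` and the dictionary labels are hypothetical.

```python
def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Point estimates (cm) for the dilation models, as reported in the Results.
SINGLE_FACTOR_MAE = 2.122
SINGLE_FACTOR_RMSE = 2.504

reductions = {
    "MAE, multifactor machine learning": relative_reduction(SINGLE_FACTOR_MAE, 0.826),
    "MAE, multifactor mixed-effects": relative_reduction(SINGLE_FACTOR_MAE, 0.893),
    "RMSE, multifactor machine learning": relative_reduction(SINGLE_FACTOR_RMSE, 1.126),
    "RMSE, multifactor mixed-effects": relative_reduction(SINGLE_FACTOR_RMSE, 1.172),
}

for label, percent in reductions.items():
    print(f"{label}: {percent:.1f}% lower than single-factor")
```

All four comparisons give a reduction above 50%, consistent with the stated conclusion.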
Keywords: Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD); artificial intelligence; cervical dilation; dystocia; epidural anesthesia; failure to progress in labor; fetal descent; labor disorders; labor progression; machine learning; mixed-effects; multifactor; multivariable; partogram; prediction error; rupture of membranes; station.
Published by Elsevier Inc.