Investigating the contributors to hit-and-run crashes using gradient boosting decision trees

PLoS One. 2025 Jan 3;20(1):e0314939. doi: 10.1371/journal.pone.0314939. eCollection 2025.

Abstract

A classification model based on a nonlinear method, the Gradient Boosting Decision Tree (GBDT), is established to investigate the factors contributing to a perpetrator's escape behavior in hit-and-run crashes. Using the U.S. Crash Report Sampling System (CRSS) dataset, the model is trained and compared with state-of-the-art methods: Classification and Regression Tree (CART), Random Forest, and Logistic Regression. The results show that GBDT outperforms the other methods, achieving the lowest negative log-likelihood (0.282), the lowest misclassification rate (0.096), and the highest AUC (0.803). GBDT also demonstrates superior computational efficiency, with a lift value of 4.087, making it a more accurate and efficient model for predicting hit-and-run crashes than the three baseline methods. In the GBDT results, crash type and relation to trafficway rank 4th and 5th in relative importance, respectively. Neither variable is mentioned in previous studies, indicating that GBDT can uncover hidden information. In addition, interactions between influencing variables can be obtained to investigate their joint effects. The results of this study have practical applications in hit-and-run incident prevention, accident safety analysis, and other engineering applications.
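The four comparison metrics named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names are hypothetical, the toy labels and probabilities are invented rather than taken from CRSS, and lift is computed as the positive rate in the top-scored fraction divided by the overall positive rate, which may differ from the definition used in the paper.

```python
import math

def negative_log_likelihood(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood (log loss) for binary labels."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def misclassification_rate(y_true, p_pred, threshold=0.5):
    """Share of cases whose thresholded prediction disagrees with the label."""
    wrong = sum(1 for y, p in zip(y_true, p_pred) if (p >= threshold) != (y == 1))
    return wrong / len(y_true)

def auc(y_true, p_pred):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def lift(y_true, p_pred, fraction=0.1):
    """Assumed definition: positive rate in the top-scored fraction / overall positive rate."""
    n_top = max(1, int(round(len(y_true) * fraction)))
    ranked = sorted(zip(p_pred, y_true), key=lambda t: t[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate

# Invented toy data: 1 = hit-and-run, 0 = driver stayed at the scene.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
p_pred = [0.9, 0.2, 0.1, 0.7, 0.4, 0.3, 0.6, 0.8, 0.05, 0.15]

nll = negative_log_likelihood(y_true, p_pred)
err = misclassification_rate(y_true, p_pred)
area = auc(y_true, p_pred)
top_lift = lift(y_true, p_pred, fraction=0.2)
print(nll, err, area, top_lift)
```

Lower is better for negative log-likelihood and misclassification rate, and higher is better for AUC and lift. In a comparison like the one described, each candidate model would supply its own predicted probabilities for the same held-out cases.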

MeSH terms

  • Accidents, Traffic* / prevention & control
  • Accidents, Traffic* / statistics & numerical data
  • Algorithms
  • Decision Trees*
  • Humans
  • Logistic Models

Grants and funding

This research was sponsored by the Basic Research Program of Science and Technology Commission Foundation of Jiangsu Province under Grant No. BK20240678 and the Philosophy and Social Science Project of Colleges and Universities in Jiangsu Province under Grant No. 2024SJYB0142. Both grants were received by Gen Li.