Real-time crash prediction is a heavily studied area given its potential applications in proactive traffic safety management, and a plethora of statistical and machine learning (ML) models have been developed to predict traffic crashes in real time. However, one of the fundamental issues relating to the application of these models is spatio-temporal transferability. The present paper attempts to address this knowledge gap by combining a Generative Adversarial Network (GAN) and transfer learning to examine the transferability of real-time crash prediction models under an extremely imbalanced data setting. Initially, a baseline model was developed using a Deep Neural Network (DNN) with crash and microscopic traffic data collected from the M1 Motorway in the UK in 2017. The dataset utilised in the baseline model is naturally imbalanced, with 257 crash cases and 16,359,163 non-crash cases. To overcome the data imbalance issue, a Wasserstein GAN (WGAN) was utilised to generate synthetic crash data. Non-crash data were randomly undersampled due to computational limitations. The calibrated model was then applied, using transfer learning, to predict traffic crashes for five other datasets obtained from the M1 (2018), M4 (2017 and 2018 separately) and M6 (2017 and 2018 separately) Motorways. Model transferability was compared with standalone models and direct transfer from the baseline model. The study revealed that direct transfer is not feasible. However, models become transferable temporally, spatially, and spatio-temporally if transfer learning is applied. The predictive performance of the transferred models exceeded that reported in existing studies, with Area Under Curve (AUC) values ranging between 0.69 and 0.95. With tuned thresholds, the best transferred model can predict nearly 95% of crashes with only a 5% false alarm rate. Furthermore, the performances of transferred models are on par with or better than the standalone model. The findings of this study prove that transfer learning can improve model transferability under extremely imbalanced settings, which can help traffic engineers develop highly transferable models in the future.
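The threshold tuning mentioned above, trading detection rate against false alarm rate, can be illustrated with a minimal sketch. It is not the study's implementation: the function names `threshold_for_false_alarm_rate` and `rate_above` are hypothetical, the scores are made-up numbers rather than data from the paper, and it assumes only that the model outputs a crash score per case and that a case is flagged when its score exceeds the threshold.

```python
import math


def threshold_for_false_alarm_rate(non_crash_scores, target_far):
    """Pick the lowest observed score usable as a threshold such that the
    share of non-crash cases scoring strictly above it is <= target_far.

    Hypothetical helper for illustration; not taken from the paper.
    """
    ranked = sorted(non_crash_scores, reverse=True)
    # Maximum number of non-crash cases we may flag as crashes.
    allowed = math.floor(target_far * len(ranked))
    if allowed >= len(ranked):
        return float("-inf")  # every case may be flagged
    # Only scores strictly greater than this value are flagged, so at most
    # `allowed` non-crash cases raise a false alarm.
    return ranked[allowed]


def rate_above(scores, threshold):
    """Fraction of cases whose score is strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


# Made-up model scores, for illustration only.
non_crash = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
crash = [0.65, 0.9, 0.5, 0.75]

threshold = threshold_for_false_alarm_rate(non_crash, target_far=0.25)
false_alarm_rate = rate_above(non_crash, threshold)
detection_rate = rate_above(crash, threshold)
print(threshold, false_alarm_rate, detection_rate)
```

Lowering the permitted false alarm rate raises the threshold and generally lowers the detection rate, which is the trade-off summarised by the AUC.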
Keywords: Generative adversarial network; Imbalanced dataset; Oversampling; Transfer learning; Transferability.
Copyright © 2021 Elsevier Ltd. All rights reserved.