ReMAHA–CatBoost: Addressing Imbalanced Data in Traffic Accident Prediction Tasks.

Authors :
Li, Guolian
Wu, Yadong
Bai, Yulong
Zhang, Weihan
Source :
Applied Sciences (2076-3417); Dec2023, Vol. 13 Issue 24, p13123, 22p
Publication Year :
2023

Abstract

Featured Application: ReMAHA–CatBoost is an advanced machine learning model designed for predicting traffic accident severity. It is constructed in two parts, ReMAHA (Relief–F-based genetic algorithm with over-sampling algorithm for weighted Mahalanobis distance) and CatBoost, to offer an innovative solution in the field of imbalanced data classification. Key Features and Highlights: (1) ReMAHA Over-sampling: ReMAHA employs the Relief–F algorithm for feature selection and combines it with an innovative over-sampling technique to enhance prediction accuracy for minority classes; (2) Feature Engineering: The model leverages feature engineering to determine the significance of different attributes, enabling it to make precise predictions regarding accident severity; and (3) CatBoost Integration: The framework incorporates CatBoost, a state-of-the-art gradient-boosting algorithm, to improve predictive performance by mitigating issues like overfitting and prediction bias. This paper explains the working principles of over-sampling algorithms in machine learning tasks based on imbalanced datasets, specifically how to resolve, at the data level, the low accuracy that stems from imbalanced data. The experimental results presented in this paper show that ReMAHA–CatBoost outperforms several other over-sampling algorithms and models, especially on the US–Accidents traffic accident dataset, which is characterized by an extreme class imbalance ratio of 91.40. This improved performance enhances the precision of traffic accident severity prediction.

Using historical traffic accident data to predict accidents has long been an area of active exploration by researchers in the field of transportation. However, predicting only the occurrence of traffic accidents is insufficient for providing comprehensive information to relevant authorities. Therefore, further classification of predicted traffic accidents is necessary to better identify and prevent potential hazards and the escalation of accidents. Due to the significant disparity in the occurrence rates of different severity levels of traffic accidents, data imbalance becomes a critical issue. To address the challenge of predicting extremely imbalanced traffic accident events, this paper introduces a predictive framework named ReMAHA–CatBoost. To evaluate the effectiveness of ReMAHA–CatBoost, we conducted experiments on the US–Accidents traffic accident dataset, where the class imbalance ratio reaches 91.40. The experimental results demonstrate that the proposed model exhibits exceptional predictive performance in the domain of imbalanced traffic accident prediction. [ABSTRACT FROM AUTHOR]
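
Note on the imbalance ratio: the abstract reports a class imbalance ratio of 91.40 but does not define it. A common convention, assumed here, is the majority-class count divided by the minority-class count. The sketch below illustrates that convention and the number of synthetic samples each class would need for an over-sampler to reach balance. The function names and the label counts are hypothetical and are not taken from the paper or from the US–Accidents dataset; this is not the ReMAHA algorithm itself.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())


def samples_needed(labels):
    """Synthetic samples each class would need to match the majority class."""
    counts = Counter(labels)
    majority = max(counts.values())
    return {cls: majority - n for cls, n in counts.items()}


# Hypothetical severity labels (1 to 4); made-up counts, not real data.
labels = [2] * 900 + [3] * 60 + [4] * 30 + [1] * 10

print(imbalance_ratio(labels))  # 90.0
print(samples_needed(labels))
```

Under this convention, a ratio near 91 means the rarest severity level has roughly one example for every 91 examples of the most common level, which is why an over-sampling step precedes the classifier.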

Details

Language :
English
ISSN :
20763417
Volume :
13
Issue :
24
Database :
Complementary Index
Journal :
Applied Sciences (2076-3417)
Publication Type :
Academic Journal
Accession number :
174404228
Full Text :
https://doi.org/10.3390/app132413123