Cost-sensitive learning for semi-supervised hit-and-run analysis.

Authors :
Zhu, Siying
Wan, Jianwu
Source :
Accident Analysis & Prevention. Aug2021, Vol. 158, pN.PAG-N.PAG. 1p.
Publication Year :
2021

Abstract

• Formulate the logistic regression model with a minimum classification error function.
• Adopt the classification maximum likelihood criterion to infer label information.
• Compare the results of the proposed model with those of other models and show its effectiveness.
• Investigate the most significant contributing factors to hit-and-run crashes.
• Conduct a sensitivity analysis of the related parameters.
• Provide implications for policies and counter-measures for hit-and-run crashes.

Hit-and-run crashes not only erode public morality but also delay the medical services provided to victims. However, a class imbalance problem arises because the number of hit-and-run crashes is much smaller than that of non-hit-and-run crashes. A missing-label problem also arises in crash analysis, for reasons such as data barriers, so the information hidden in the unlabelled samples has not been used effectively. In this paper, a cost-sensitive semi-supervised logistic regression (CS3LR) model is proposed for hit-and-run analysis to tackle the class-imbalanced data distribution and the missing-label problem, based on the crash dataset of Victoria, Australia (2013–2019). By performing label estimation with logistic regression that jointly uses labelled data and pseudo-labelled unlabelled data in a well-designed cost-sensitive semi-supervised maximum likelihood framework, the proposed model can obtain an unbiased likelihood parameter for hit-and-run prediction and analysis. Experimental comparison of the CS3LR model with two logistic regression models and seven machine learning methods demonstrates its better performance. The most significant contributing factors to hit-and-run crashes extracted by CS3LR with only 10% labelled data show a high degree of consistency with the true contributing factors obtained by supervised cost-sensitive logistic regression with complete hit-and-run labels. The effects of the class-weighted ratio and the hyper-parameter λ on the performance of the hit-and-run crash prediction model have also been analysed. The results can further inform policies and counter-measures for preventing hit-and-run collisions and crimes. The methodology proposed in this paper can also be employed to analyse crash data with other types of missing labels, such as crash severity. [ABSTRACT FROM AUTHOR]
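
The abstract names the ingredients of CS3LR (class-weighted logistic regression, pseudo labels for unlabelled samples, a classification maximum likelihood criterion) but not its exact objective or update rules. The Python sketch below is therefore not the authors' model. It is a generic cost-sensitive self-training loop, written under stated assumptions, that shows how the three ingredients can fit together. The names `pos_cost` and `unlabelled_weight`, the 0.5 decision threshold, and the synthetic two-feature data are all hypothetical. In particular, `unlabelled_weight` is not claimed to play the role of the paper's λ.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w[0] is the intercept; w[1:] are the feature coefficients.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_weighted_lr(X, y, sample_w, w=None, lr=0.1, epochs=200):
    # Batch gradient descent on the sample-weighted negative log-likelihood.
    d = len(X[0])
    w = [0.0] * (d + 1) if w is None else list(w)
    total = sum(sample_w)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi, si in zip(X, y, sample_w):
            err = si * (predict_proba(w, xi) - yi)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / total for wj, gj in zip(w, grad)]
    return w


def cost_sensitive_self_training(X_lab, y_lab, X_unl,
                                 pos_cost=5.0, unlabelled_weight=0.5, rounds=5):
    # Assumption: the rare class (label 1) gets a higher misclassification cost.
    def cost(label):
        return pos_cost if label == 1 else 1.0

    lab_w = [cost(yi) for yi in y_lab]
    # Start from the labelled samples only.
    w = fit_weighted_lr(X_lab, y_lab, lab_w)
    for _ in range(rounds):
        # Classification step: assign hard pseudo labels to unlabelled samples.
        pseudo = [1 if predict_proba(w, x) >= 0.5 else 0 for x in X_unl]
        # Maximisation step: refit on labelled plus pseudo-labelled samples,
        # down-weighting the pseudo-labelled ones.
        X_all = X_lab + X_unl
        y_all = y_lab + pseudo
        s_all = lab_w + [unlabelled_weight * cost(p) for p in pseudo]
        w = fit_weighted_lr(X_all, y_all, s_all, w=w)
    return w


def make_points(n_pos, n_neg, rng):
    # Synthetic data: positives centred at (2, 2), negatives at (0, 0).
    X, y = [], []
    for label, n in ((1, n_pos), (0, n_neg)):
        centre = 2.0 if label == 1 else 0.0
        for _ in range(n):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y


rng = random.Random(0)
X_lab, y_lab = make_points(4, 36, rng)    # 10% of samples keep their labels
X_unl, _ = make_points(36, 324, rng)      # labels discarded for the rest
w = cost_sensitive_self_training(X_lab, y_lab, X_unl)
print("fitted coefficients:", w)
```

Hard pseudo labels are used here because the abstract refers to a classification maximum likelihood criterion. A soft-label (expectation-maximisation) variant would instead weight each unlabelled sample by its predicted class probabilities.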

Details

Language :
English
ISSN :
00014575
Volume :
158
Database :
Academic Search Index
Journal :
Accident Analysis & Prevention
Publication Type :
Academic Journal
Accession number :
150771140
Full Text :
https://doi.org/10.1016/j.aap.2021.106199