Back to Search Start Over

Approach to Predict Software Vulnerability Based on Multiple-Level N-gram Feature Extraction and Heterogeneous Ensemble Learning.

Authors :
Zhang, Bing
Gao, Yuan
Wu, Jingyi
Wang, Ning
Wang, Qian
Ren, Jiadong
Source :
International Journal of Software Engineering & Knowledge Engineering; Oct2022, Vol. 32 Issue 10, p1559-1582, 24p
Publication Year :
2022

Abstract

Software vulnerabilities are one of the roots of computer security problems. The traditional static analysis and dynamic analysis methods based on software source code mainly have some deficiencies, such as high false positive rate, high false negative rate and insufficient semantic information captured. Nevertheless, the application of machine learning, Natural Language Processing and other technologies in software vulnerability prediction can effectively mitigate such issues. This paper proposed a vulnerability prediction method based on multiple-level N-gram feature extraction and heterogeneous ensemble learning. First, by code intermediate representation and constructing a multiple-level N-gram feature generation model, two kinds of N-gram semantic features with different window size and different granularity at word and char level were extracted to retain the semantic and structural information of code. Second, TF–IDF was used to construct the vector space model as the input of prediction model. As a single classifier was prone to overfitting and poor generalization, this paper conducted benchmark testing on five classical machine learning algorithms (NB, SVM, DT, LR, RF), and then combined four (SVM, DT, LR, RF) among them, which had better performance as the base classifiers to form the stacking heterogeneous ensemble method to build the vulnerability prediction model. Finally, the proposed method was verified on buffer overflow vulnerability and resource management vulnerability datasets, with a lowest false positive rate and false negative rate which can reach 1.58% and 4.06%, respectively. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
02181940
Volume :
32
Issue :
10
Database :
Complementary Index
Journal :
International Journal of Software Engineering & Knowledge Engineering
Publication Type :
Academic Journal
Accession number :
160374865
Full Text :
https://doi.org/10.1142/S0218194022500620