
Feature Extraction Methods for Binary Code Similarity Detection Using Neural Machine Translation Models

Authors :
Norimitsu Ito
Masaki Hashimoto
Akira Otsuka
Source :
IEEE Access, Vol 11, Pp 102796-102805 (2023)
Publication Year :
2023
Publisher :
IEEE, 2023.

Abstract

Binary code similarity detection is an effective analysis technique for detecting vulnerabilities, bugs, and plagiarism in software whose source code cannot be obtained. The recent proliferation of IoT devices has also increased the demand for similarity detection across different architectures. However, few studies have applied feature extraction based on neural machine translation (NMT) models to similarity detection at the basic block level across different architectures. In this research, we propose new methods that extract features faster and detect cross-architecture similarities more accurately than existing NMT-based methods for basic block feature extraction. We assume that the intermediate representation of an NMT model trained to translate basic blocks between architectures captures the semantics of the instructions in each basic block, and we therefore adopt that intermediate representation as the basic block's features. We then apply the linear transformation used in bilingual word embedding to align the embedding spaces of basic blocks from different architectures. This enables basic-block-level similarity detection across architectures with higher accuracy than the distance learning method that existing research uses to align the embedding spaces. In the evaluation experiment, we compare Precision at k (P@k) against existing methods on the same dataset, and our method achieves the highest accuracy, 92%. We also compare the time required for feature extraction using GPUs and find that our method is up to 16 times faster.
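
The evaluation idea in the abstract (map basic block embeddings from one architecture into the other's embedding space with a linear transformation, then score retrieval with P@k) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the toy 2-dimensional embeddings, the hand-picked matrix W (the paper learns its transformation; here it is simply given), and the use of cosine similarity for ranking.

```python
import math


def apply_linear_map(W, v):
    """Return the matrix-vector product W @ v (W is a list of rows)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def precision_at_k(src, tgt, W, k):
    """Fraction of source blocks whose true counterpart ranks in the top k.

    Assumption: src[i] and tgt[i] are embeddings of the same basic block
    compiled for two different architectures.
    """
    hits = 0
    for i, vec in enumerate(src):
        mapped = apply_linear_map(W, vec)
        # Rank all target blocks by similarity to the mapped source block.
        ranked = sorted(range(len(tgt)),
                        key=lambda j: cosine(mapped, tgt[j]),
                        reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(src)


# Toy data (hypothetical): the target space is the source space rotated 90 degrees.
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt = [[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]]
W = [[0.0, -1.0], [1.0, 0.0]]          # hand-picked rotation, not a learned map
identity = [[1.0, 0.0], [0.0, 1.0]]    # baseline: no alignment at all

p1_aligned = precision_at_k(src, tgt, W, k=1)
p1_unaligned = precision_at_k(src, tgt, identity, k=1)
```

On this toy data the rotation retrieves every counterpart at k = 1, while the unaligned baseline does not, which is the effect that aligning the two embedding spaces is meant to achieve.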

Details

Language :
English
ISSN :
2169-3536
Volume :
11
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.81cc979a59f84dc0ad7b90c4f61fef7f
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2023.3316215