FUSION: Measuring Binary Function Similarity with Code-Specific Embedding and Order-Sensitive GNN

Authors :
Hao Gao
Tong Zhang
Songqiang Chen
Lina Wang
Fajiang Yu
Source :
Symmetry, Vol 14, Iss 12, p 2549 (2022)
Publication Year :
2022
Publisher :
MDPI AG, 2022.

Abstract

With the recent development of deep learning-based models, binary code similarity measurement has become a popular research area in binary analysis. Current state-of-the-art methods often use a pre-trained language model (PTLM) to embed the instructions of each basic block, and they use these embeddings as the node representations of a control flow graph (CFG). They then use a graph neural network (GNN) to embed the whole CFG and measure binary similarity between the resulting code embeddings. However, these methods treat assembly code almost directly as natural-language text and ignore its code-specific features when training the PTLM. Moreover, they rarely take the direction of CFG edges into account, or they do so inefficiently. These weaknesses may limit the performance of existing methods. In this paper, we propose a novel method called FUSION (function similarity using code-specific PTLMs and an order-sensitive GNN). Because binary code similarity can be treated as either a symmetric or an asymmetric problem, our research was guided by the ideas of symmetry and asymmetry. FUSION measures binary function similarity using two code-specific PTLM training strategies and an order-sensitive GNN, which address the first and second weaknesses, respectively. FUSION outperforms state-of-the-art binary similarity methods, with accuracy gains of up to 5.4%.
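The general pipeline the abstract describes (basic-block embeddings as CFG node features, a GNN over the directed CFG, then a similarity score between function embeddings) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not FUSION's implementation. The toy block vectors stand in for PTLM outputs. The names `DirectedGNN`, `embed`, and `cosine_similarity` are illustrative assumptions. Direction is handled by a generic scheme, separate weights for predecessor and successor messages, which is not necessarily the paper's order-sensitive GNN.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Edge = Tuple[int, int]  # (source block, target block) in a directed CFG


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a square matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def vsum(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise sum of vectors; the zero vector if the list is empty."""
    out = [0.0] * dim
    for v in vectors:
        out = [a + b for a, b in zip(out, v)]
    return out


class DirectedGNN:
    """Hypothetical toy GNN with separate weights for messages from predecessors
    and from successors, so reversing edges changes the embedding
    (an undirected sum of neighbours would not notice)."""

    def __init__(self, dim: int, num_layers: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)  # fixed seed: untrained, reproducible weights

        def rand_matrix() -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

        # One (W_self, W_in, W_out) triple per message-passing layer.
        self.layers = [(rand_matrix(), rand_matrix(), rand_matrix()) for _ in range(num_layers)]
        self.dim = dim

    def embed(self, blocks: Dict[int, Vector], edges: List[Edge]) -> Vector:
        """Run message passing over the CFG, then mean-pool into one function vector."""
        h = dict(blocks)
        for w_self, w_in, w_out in self.layers:
            updated: Dict[int, Vector] = {}
            for node, vec in h.items():
                preds = [h[src] for src, dst in edges if dst == node]
                succs = [h[dst] for src, dst in edges if src == node]
                pre = vsum([
                    matvec(w_self, vec),
                    matvec(w_in, vsum(preds, self.dim)),   # incoming-edge messages
                    matvec(w_out, vsum(succs, self.dim)),  # outgoing-edge messages
                ], self.dim)
                updated[node] = [math.tanh(x) for x in pre]
            h = updated
        n = len(h)
        return [x / n for x in vsum(list(h.values()), self.dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two function embeddings (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Stand-in basic-block embeddings; in the described pipeline a PTLM would produce these.
    blocks = {0: [1.0, 0.0, 0.5, 0.0], 1: [0.0, 1.0, 0.0, 0.5], 2: [0.5, 0.5, 1.0, 0.0]}
    cfg = [(0, 1), (1, 2), (0, 2)]
    reversed_cfg = [(dst, src) for src, dst in cfg]

    gnn = DirectedGNN(dim=4)
    f1 = gnn.embed(blocks, cfg)
    f2 = gnn.embed(blocks, reversed_cfg)
    print(cosine_similarity(f1, f1))  # ~1.0: a function compared with itself
    print(cosine_similarity(f1, f2))  # typically below 1.0: edge direction matters
```

Keeping separate predecessor and successor weights is one simple way to make message passing direction-aware. In a real system the weights would be trained, for example on pairs of functions compiled from the same source, rather than drawn at random as here.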

Details

Language :
English
ISSN :
2073-8994
Volume :
14
Issue :
12
Database :
Directory of Open Access Journals
Journal :
Symmetry
Publication Type :
Academic Journal
Accession number :
edsdoj.1b170ed6126437b9607337a03b21396
Document Type :
article
Full Text :
https://doi.org/10.3390/sym14122549