Function-Level Code Obfuscation Detection Through Self-Attention-Guided Multi-Representation Fusion.

Authors :
Tian, Zhenzhou
He, Ruikang
Zhao, Hongliang
Chen, Lingwei
Source :
International Journal of Software Engineering & Knowledge Engineering; Apr2024, Vol. 34 Issue 4, p651-673, 23p
Publication Year :
2024

Abstract

Malware developers often employ code obfuscation techniques to conceal their malicious functionality, making it challenging to detect and analyze such software. While various de-obfuscation techniques exist, the majority of them require prior knowledge of the obfuscation tools and techniques in use. Identifying the specific obfuscation tools or algorithms applied to the obfuscated code is thus of vital importance, which, however, typically demands in-depth expert knowledge and substantial effort. Therefore, this paper presents DeObA, a deep learning (DL) driven approach for the precise and efficient detection of obfuscation algorithms on fine-grained function-level code snippets. To comprehensively capture unique patterns or features of different obfuscation algorithms from code, DeObA works on multiple distinct code views, encompassing token sequences, abstract syntax trees (AST) and program dependency graphs (PDG), which reflect the code's lexical morphology, syntactic and structural aspects. After individually collecting obfuscation-indicative features with a well-matched DL encoder from each code view, a self-attention-based fusion strategy is performed on these features to produce an integrated, dense, yet feature-rich vector. This vector is then fed into a softmax classification layer for prediction. Due to the lack of a moderately sized dataset, a large obfuscation corpus is curated with 7 different obfuscation tools and a total of 12 obfuscation algorithms on 39,070 C/C++ functions. The experimental evaluations conducted on the dataset exhibit a distinguished detection performance of DeObA, which achieves accuracy rates of 99.90% and 99.19% on the obfuscation tool detection and obfuscation algorithm detection tasks, respectively. The ablation study also confirms the active role of considering multiple distinct code views and the effectiveness of the designed self-attention-based fusion strategy. [ABSTRACT FROM AUTHOR]
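
The fusion step described in the abstract (per-view feature vectors combined by self-attention, then a softmax classifier) can be illustrated with a minimal sketch. This is not DeObA's implementation: the abstract does not give the encoder architectures, dimensions, or attention parameters, so the sketch assumes three already-encoded view vectors of equal size, uses them directly as queries, keys, and values (no learned projections), mean-pools the attended vectors, and applies a hypothetical linear layer. All names and numbers below are illustrative assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention_fuse(views):
    """Fuse equally sized view vectors with scaled dot-product self-attention.

    Assumption: queries, keys, and values are the raw view vectors;
    a trained model would normally apply learned projections first.
    """
    d = len(views[0])
    scale = math.sqrt(d)
    attended = []
    for q in views:
        # Attention weights of this view over all views (including itself).
        weights = softmax([dot(q, k) / scale for k in views])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, views)) for i in range(d)]
        )
    # Mean-pool the attended views into one fused vector (an assumption).
    return [sum(a[i] for a in attended) / len(attended) for i in range(d)]


def classify(fused, class_weights, class_biases):
    """Hypothetical linear layer followed by softmax over classes."""
    logits = [dot(w, fused) + b for w, b in zip(class_weights, class_biases)]
    return softmax(logits)


# Toy stand-ins for the encoded token-sequence, AST, and PDG views.
token_vec = [0.2, 0.1, 0.7, 0.0]
ast_vec = [0.5, 0.3, 0.1, 0.1]
pdg_vec = [0.0, 0.9, 0.05, 0.05]

fused = self_attention_fuse([token_vec, ast_vec, pdg_vec])

# Three made-up classes with made-up weights, purely for illustration.
weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
probs = classify(fused, weights, [0.0, 0.0, 0.0])
```

Because each attended vector is a convex combination of the view vectors, the fused vector stays within the per-dimension range of its inputs; in a trained model, the learned projections and classifier weights would determine which view dominates for a given obfuscation algorithm.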

Details

Language :
English
ISSN :
02181940
Volume :
34
Issue :
4
Database :
Complementary Index
Journal :
International Journal of Software Engineering & Knowledge Engineering
Publication Type :
Academic Journal
Accession number :
177469304
Full Text :
https://doi.org/10.1142/S0218194023500663