Decoupling coding habits from functionality for effective binary authorship attribution.

Authors :
Alrabaee, Saed
Shirani, Paria
Wang, Lingyu
Debbabi, Mourad
Hanna, Aiman
Source :
Journal of Computer Security; 2019, Vol. 27 Issue 6, p613-648, 36p
Publication Year :
2019

Abstract

Binary authorship attribution refers to the process of identifying the author of a given anonymous binary file based on stylistic characteristics. It aims to automate the laborious and error-prone reverse engineering task of discovering information related to the author(s) of a binary code. Existing works typically employ machine learning methods to extract features that are unique for each author and subsequently match them against a given binary to identify the author. However, most existing works share a common critical limitation, i.e., they cannot distinguish between features representing program functionality and those representing authorship (e.g., authors' coding habits). Such distinction is crucial for effective authorship attribution because what is unique in a particular binary may be attributed to the author, the compiler, or the function. In this study, we present BinAuthor, a system capable of decoupling program functionality from authors' coding habits in binary code. To capture coding habits, BinAuthor leverages a set of features that are based on collections of functionality-independent choices made by authors during coding. Our evaluation demonstrates that BinAuthor outperforms existing methods in several aspects. First, it successfully attributes a larger number of authors with a significantly higher accuracy (around 90%) based on the large datasets extracted from selected open-source C++ projects on GitHub, Google Code Jam events, Planet Source Code contests, and several programming projects. Second, BinAuthor is more robust than previous methods; there is no significant drop in accuracy when the code is refactored, subjected to simple obfuscation, or processed with different compilers. Finally, decoupling authorship from functionality allows us to apply BinAuthor to real malware binaries (Citadel, Zeus, Stuxnet, Flame, Bunny, and Babar) to automatically generate evidence on similar coding habits. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0926-227X
Volume :
27
Issue :
6
Database :
Complementary Index
Journal :
Journal of Computer Security
Publication Type :
Academic Journal
Accession number :
139099710
Full Text :
https://doi.org/10.3233/JCS-191292