
Binary level toolchain provenance identification with graph neural networks

Authors :
Tristan Benoit
Jean-Yves Marion
Sébastien Bardin
Carbone (CARBONE)
Department of Formal Methods (LORIA - FM)
Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
CEA-Saclay (CEA)
Commissariat à l'énergie atomique et aux énergies alternatives (CEA)
This work is supported by (i) a public grant overseen by the French National Research Agency (ANR) as part of the 'Investissements d'Avenir' French PIA project 'Lorraine Université d'Excellence', reference ANR-15-IDEX-04-LUE, and (ii) has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 830927 (Concordia). Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr).
GRID5000
IMPACT-DIGITRUST
ANR-15-IDEX-0004,LUE,Isite LUE(2015)
European Project: 830927,CONCORDIA(2019)
Source :
SANER 2021 - 28th IEEE International Conference on Software Analysis, Evolution and Reengineering, Mar 2021, Honolulu / Virtual, United States. pp. 131-141, ⟨10.1109/SANER50967.2021.00021⟩
Publication Year :
2021
Publisher :
HAL CCSD, 2021.

Abstract

We consider the problem of recovering the compilation toolchain used to generate a given stripped binary. We present a Graph Neural Network framework that works at the binary level and solves this problem by taking into account the shallow semantics provided by the binary's structured control flow graph (CFG). We introduce a Graph Neural Network dedicated to this problem, called the Site Neural Network (SNN). To scale at the binary level, feature extraction is simplified: almost everything in a CFG is discarded except control-transfer instructions, and a parametric graph reduction is then applied. Our experiments show that our method recovers the compiler family with a very high F1-score of 0.9950, while the optimization level is recovered with a moderately high F1-score of 0.7517. On the compiler version prediction task, the F1-score is about 0.8167, excluding the clang family. A comparison with previous work demonstrates the accuracy and performance of this framework.
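The abstract only summarizes the feature extraction step. As a rough illustration of the general idea (keep only control-transfer instructions in each basic block, then shrink the CFG under a tunable parameter), here is a minimal sketch. It is not the authors' implementation. Assumptions: basic blocks are given as lists of x86 instructions in text form; "control transfer" is approximated as any mnemonic starting with "j" plus call, ret and loop; and the chain-contraction rule with its max_label_len parameter is a hypothetical stand-in for the paper's parametric graph reduction, whose exact definition is not given here. The names abstract_blocks, reduce_chains and max_label_len are likewise hypothetical.

```python
from typing import Dict, List, Set, Tuple

Label = Tuple[str, ...]
Edge = Tuple[int, int]


def is_control_transfer(mnemonic: str) -> bool:
    # Assumption: x86 mnemonics; jmp and all conditional jumps start with "j".
    return mnemonic.startswith("j") or mnemonic in {"call", "ret", "loop"}


def abstract_blocks(blocks: Dict[int, List[str]]) -> Dict[int, Label]:
    """Forget every instruction of each block except control-transfer mnemonics."""
    labels: Dict[int, Label] = {}
    for node, instrs in blocks.items():
        mnemonics = [ins.split()[0] for ins in instrs if ins.strip()]
        labels[node] = tuple(m for m in mnemonics if is_control_transfer(m))
    return labels


def reduce_chains(labels: Dict[int, Label], edges: Set[Edge],
                  max_label_len: int) -> Tuple[Dict[int, Label], Set[Edge]]:
    """Hypothetical parametric reduction: contract v into u when u's only
    successor is v, v's only predecessor is u, and the concatenated label
    has at most max_label_len mnemonics."""
    labels = dict(labels)
    succ: Dict[int, Set[int]] = {n: set() for n in labels}
    pred: Dict[int, Set[int]] = {n: set() for n in labels}
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
    changed = True
    while changed:
        changed = False
        for u in list(labels):
            if len(succ[u]) != 1:
                continue
            (v,) = succ[u]
            if v == u or pred[v] != {u}:
                continue
            if len(labels[u]) + len(labels[v]) > max_label_len:
                continue
            # u absorbs v's label and v's outgoing edges; v disappears.
            labels[u] = labels[u] + labels[v]
            succ[u] = succ.pop(v)
            for w in succ[u]:
                pred[w].discard(v)
                pred[w].add(u)
            del labels[v], pred[v]
            changed = True
            break  # restart the scan on the modified graph
    reduced_edges = {(u, w) for u in succ for w in succ[u]}
    return labels, reduced_edges


# Toy CFG: block 0 branches to 1 or 3; 1 -> 2 is a straight-line chain.
blocks = {
    0: ["cmp edi, 0", "je 0x20"],
    1: ["mov eax, 1", "call foo"],
    2: ["add eax, 2", "jmp 0x30"],
    3: ["xor eax, eax"],
    4: ["pop rbp", "ret"],
}
edges = {(0, 1), (0, 3), (1, 2), (2, 4), (3, 4)}
labels = abstract_blocks(blocks)
reduced_labels, reduced_edges = reduce_chains(labels, edges, max_label_len=4)
print(reduced_labels)
print(sorted(reduced_edges))
```

With max_label_len=4, blocks 1 and 2 are merged into one node labeled ("call", "jmp"); with max_label_len=1 no contraction is allowed and the abstracted graph is returned unchanged. The parameter thus trades graph size against how much sequential detail each node keeps.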

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....adc23a0507b81b948cfd6c017c37634a