Binary level toolchain provenance identification with graph neural networks
- Source :
- SANER 2021-28th IEEE International Conference on Software Analysis, Evolution and Reengineering, Mar 2021, Honolulu / Virtual, United States. pp.131-141, ⟨10.1109/SANER50967.2021.00021⟩
- Publication Year :
- 2021
- Publisher :
- HAL CCSD, 2021.
- Abstract :
- We consider the problem of recovering the compilation chain used to generate a given stripped binary. We present a Graph Neural Network framework at the binary level to solve this problem, with the idea of taking into account the shallow semantics provided by the binary code's structured control flow graph (CFG). We introduce a Graph Neural Network, called Site Neural Network (SNN), dedicated to this problem. To attain scalability at the binary level, feature extraction is simplified by discarding almost everything in a CFG except control-transfer instructions and performing a parametric graph reduction. Our experiments show that our method recovers the compiler family with a very high F1-score of 0.9950, while the optimization level is recovered with a moderately high F1-score of 0.7517. On the compiler version prediction task, the F1-score is about 0.8167 excluding the clang family. A comparison with previous work demonstrates the accuracy and performance of this framework.
- Subjects :
- Theoretical computer science
Artificial neural network
Computer science
Binary number
graph neural networks
020207 software engineering
02 engineering and technology
computer.software_genre
toolchain provenance
Toolchain
[INFO.INFO-CR]Computer Science [cs]/Cryptography and Security [cs.CR]
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
Scalability
0202 electrical engineering, electronic engineering, information engineering
Graph reduction
ACM: D.: Software/D.2: SOFTWARE ENGINEERING/D.2.7: Distribution, Maintenance, and Enhancement/D.2.7.5: Restructuring, reverse engineering, and reengineering
ACM: D.: Software/D.2: SOFTWARE ENGINEERING/D.2.5: Testing and Debugging/D.2.5.2: Diagnostics
Control flow graph
binary code analysis
020201 artificial intelligence & image processing
Binary code
Compiler
computer
ACM: I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE/I.2.6: Learning
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- SANER 2021-28th IEEE International Conference on Software Analysis, Evolution and Reengineering, Mar 2021, Honolulu / Virtual, United States. pp.131-141, ⟨10.1109/SANER50967.2021.00021⟩
- Accession number :
- edsair.doi.dedup.....adc23a0507b81b948cfd6c017c37634a