DIComP: Lightweight Data-Driven Inference of Binary Compiler Provenance with High Accuracy

2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)(2022)

引用 2|浏览25
暂无评分
摘要
Binary analysis is pervasively utilized to assess software security and test vulnerabilities without accessing source codes. The analysis validity is heavily influenced by the inferring ability of information related to the code compilation. Among the compilation information, compiler type and optimization level, as the key factors determining how binaries look like, are still difficult to be inferred efficiently with existing tools. In this paper, we conduct a thorough empirical study on the binary's appearance under various compilation settings and propose a lightweight binary analysis tool based on the simplest machine learning method, called DIComP to infer the compiler and optimization level via most relevant features according to the observation. Our comprehensive evaluations demonstrate that DIComP can fully recognize the compiler provenance, and it is effective in inferring the optimization levels with up to 90% accuracy. Also, it is efficient to infer thousands of binaries at a millisecond level with our lightweight machine learning model (1MB).
更多
查看译文
关键词
Binary Analysis,Compilation Options
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要