BinCC: Scalable Function Similarity Detection in Multiple Cross-Architectural Binaries.

IEEE Access (2022)

Abstract
With the undeniable rise in popularity of open source software, the availability and reuse of source code have also increased. While code clone detection helps track reuse and evolution in source code, little prior work is applicable to binary code, where the task is complicated by the transformations applied during compilation. In this paper, we present a control flow graph (CFG) refinement that finds function-level clones in a fast and scalable way by comparing the high-level structure of multiple disassembled binaries together. We can determine whether functions belonging to other programs have been copied or reused, even when the processor architecture differs. Specifically, our algorithm extracts the flow of each function and reconstructs a higher-level structure, leveraging architectural differences and allowing efficient comparison in linear time through structural hashing. We implemented our idea in a tool called BinCC and analyzed 24 million functions spanning different architectures and optimization levels. Results show that our approach achieves precision between 91% and 99% within the same architecture and 75% when detecting clones across different architectures, and that it can also detect the presence of specific library functions inside an executable. Our approach reaches precision comparable to that of current state-of-the-art learning approaches while being three orders of magnitude faster.
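To make the idea of structural hashing concrete, the following is a minimal Python sketch and not BinCC's actual algorithm, which the abstract does not specify. It assumes each function is given as a hypothetical adjacency map from basic-block identifiers to successor lists, with successors listed in a consistent order across binaries. The names `structural_hash` and `group_clones` are illustrative. Blocks are renumbered by traversal order so that addresses and instruction contents, which vary with the architecture, do not influence the hash.

```python
import hashlib
from collections import defaultdict


def structural_hash(cfg, entry):
    """Hash the shape of a CFG, ignoring block addresses and instructions.

    cfg   -- dict mapping a block id to the list of its successor block ids
    entry -- id of the function's entry block
    """
    # Renumber blocks in depth-first discovery order starting from the entry.
    order = {}
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in order:
            continue
        order[node] = len(order)
        # Push in reverse so the first listed successor is visited first.
        for succ in reversed(cfg.get(node, [])):
            if succ not in order:
                stack.append(succ)

    # Describe each block only by the new numbers of its successors.
    # Dicts preserve insertion order, so this walks blocks as discovered.
    shape = [sorted(order[s] for s in cfg.get(node, [])) for node in order]
    return hashlib.sha256(repr(shape).encode()).hexdigest()


def group_clones(functions):
    """Bucket functions by structural hash; return groups with 2+ members.

    functions -- dict mapping a function name to a (cfg, entry) pair
    """
    buckets = defaultdict(list)
    for name, (cfg, entry) in functions.items():
        buckets[structural_hash(cfg, entry)].append(name)
    return [sorted(names) for names in buckets.values() if len(names) > 1]


# Two "diamond" CFGs with different block ids (e.g. from different
# architectures) and one loop-shaped CFG. All data here is made up.
x86_fn = ({0x1000: [0x1010, 0x1020], 0x1010: [0x1030],
           0x1020: [0x1030], 0x1030: []}, 0x1000)
arm_fn = ({"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}, "a")
loop_fn = ({0: [1], 1: [0, 2], 2: []}, 0)

groups = group_clones({"x86:f": x86_fn, "arm:f": arm_fn, "x86:g": loop_fn})
```

Because each function is hashed once and then placed in a bucket, candidate clones are found without pairwise comparison, which is the property that makes hash-based matching scale to many binaries. A hash this coarse would also match unrelated functions that happen to share a shape, so a practical tool needs a richer structural description than this sketch uses.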
Keywords
Training, Source coding, Cloning, Computer architecture, Binary codes, Libraries, Distance measurement, Code clones, static code analysis, reverse engineering, compilers