BinGo: Identifying Security Patches in Binary Code with Graph Representation Learning
CoRR (2023)
Abstract
Timely software updates are vital to combat the growing number of security
vulnerabilities. However, some software vendors may secretly patch their
vulnerabilities without creating CVE entries or even describing the security
issue in their change log. Thus, it is critical to identify these hidden
security patches and defeat potential N-day attacks. Researchers have employed
various machine learning techniques to identify security patches in open-source
software, leveraging the syntax and semantic features of the software changes
and commit messages. However, none of these solutions can be directly applied to
binary code, whose instructions and program flow may vary dramatically across
different compilation configurations. In this paper, we propose BinGo, a new
security patch detection system for binary code. The main idea is to represent
the binary code as code property graphs to enable a comprehensive understanding
of program flow and to apply a language model to each basic block of binary
code to capture the instruction semantics. BinGo consists of four phases, namely,
patch data pre-processing, graph extraction, embedding generation, and graph
representation learning. Due to the lack of an existing binary security patch
dataset, we construct such a dataset by compiling the pre-patch and post-patch
source code of the Linux kernel. Our experimental results show that BinGo can
achieve up to 80.77% accuracy in identifying security patches between two
neighboring versions of binary code. Moreover, BinGo can effectively reduce the
false positives and false negatives caused by different compilers and
optimization levels.
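
To make the core idea concrete, the following is a minimal toy sketch, not BinGo's actual implementation. It assumes a simplified graph in which each node is a basic block and each edge is a control-flow edge, and it swaps in a hashed bag of tokens for the language model and a single round of neighbor averaging for graph representation learning. All names (`BasicBlock`, `BlockGraph`, `embed_block`, `embed_graph`, `patch_features`, `DIM`) and the sample instructions are hypothetical.

```python
from dataclasses import dataclass, field
import hashlib

DIM = 8  # toy embedding width (assumption, chosen only for illustration)


@dataclass
class BasicBlock:
    instructions: list[str]


@dataclass
class BlockGraph:
    blocks: dict[str, BasicBlock]
    # Directed control-flow edges as (source block, destination block).
    edges: set[tuple[str, str]] = field(default_factory=set)


def embed_block(block: BasicBlock) -> list[float]:
    """Stand-in for a language model: hashed bag of instruction tokens."""
    vec = [0.0] * DIM
    for ins in block.instructions:
        for tok in ins.replace(",", " ").split():
            h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
            vec[h % DIM] += 1.0
    return vec


def embed_graph(g: BlockGraph) -> list[float]:
    """Average each block with its predecessors, then mean-pool all blocks."""
    if not g.blocks:
        return [0.0] * DIM
    node = {name: embed_block(b) for name, b in g.blocks.items()}
    mixed = []
    for name, vec in node.items():
        group = [vec] + [node[s] for s, d in g.edges if d == name]
        mixed.append([sum(col) / len(group) for col in zip(*group)])
    return [sum(col) / len(mixed) for col in zip(*mixed)]


def patch_features(pre: BlockGraph, post: BlockGraph) -> list[float]:
    """Difference between post-patch and pre-patch graph embeddings."""
    return [b - a for a, b in zip(embed_graph(pre), embed_graph(post))]


# Hypothetical pre-patch function: a single block that reads through a pointer.
pre = BlockGraph({"body": BasicBlock(["mov eax, [rdi]", "ret"])})

# Hypothetical post-patch function: a bounds check guards the read.
post = BlockGraph(
    {
        "entry": BasicBlock(["cmp rsi, 16", "ja fail"]),
        "body": BasicBlock(["mov eax, [rdi]", "ret"]),
        "fail": BasicBlock(["mov eax, -1", "ret"]),
    },
    {("entry", "body"), ("entry", "fail")},
)

features = patch_features(pre, post)
```

In a real system, a vector like `features` would be passed to a trained classifier that labels the change as a security patch or not. Here it only illustrates how instruction-level and flow-level information can be combined into one representation of a patch.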