Efficiently Identifying Binary Similarity Based on Deep Hashing and Contrastive Learning

2023 8th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA) (2023)

Abstract
Binary similarity analysis identifies the semantic similarity of two or more binary code snippets. In recent years, deep learning-based methods have shown promising results. They formalize code similarity as a nearest neighbor retrieval task, and the overall workflow can be divided into two stages: 1) feeding the code snippets into an embedding model to obtain the corresponding high-dimensional vectors as fingerprints (i.e., constructing the codebase); and 2) using the codebase for nearest neighbor retrieval to obtain the top-k results. Most existing studies focus only on the first stage (more specifically, the embedding model) and ignore the overhead of the retrieval stage. In real-world scenarios, the codebase can be very large and contain a massive number of embeddings, which makes exact nearest neighbor retrieval prohibitively expensive. To mitigate this issue, this paper proposes a novel approach, dubbed BinCH, which can efficiently perform code search without sacrificing accuracy.
Keywords
Big Data, Reverse Engineering, Deep Learning, Deep Hashing, Binary Diffing
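
The two-stage workflow described in the abstract, and the reason compact hash codes can reduce retrieval cost, can be illustrated with a minimal sketch. This is not the BinCH method, whose details are not given here. It assumes the embeddings already exist as plain float vectors, and it uses simple sign-bit binarization as a stand-in for a learned deep hash. All function names (`top_k_exact`, `to_hash_code`, `hamming`, `top_k_hamming`) are hypothetical.

```python
import random


def top_k_exact(query, codebase, k):
    """Baseline retrieval: brute-force cosine similarity over float vectors."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = sum(x * x for x in a) ** 0.5
        norm_b = sum(y * y for y in b) ** 0.5
        return dot / (norm_a * norm_b)

    ranked = sorted(range(len(codebase)),
                    key=lambda i: cosine(query, codebase[i]),
                    reverse=True)
    return ranked[:k]


def to_hash_code(vec):
    """Pack the sign bit of each component into one integer.

    Assumption: sign binarization stands in for a learned hash function.
    """
    code = 0
    for x in vec:
        code = (code << 1) | (1 if x > 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two non-negative integer codes."""
    return bin(a ^ b).count("1")


def top_k_hamming(query_code, codes, k):
    """Hash-based retrieval: rank stored codes by Hamming distance."""
    ranked = sorted(range(len(codes)),
                    key=lambda i: hamming(query_code, codes[i]))
    return ranked[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    dim, size = 64, 1000
    # Stage 1 (assumed done elsewhere): embeddings of the code snippets.
    codebase = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(size)]
    codes = [to_hash_code(v) for v in codebase]

    # Query: a slightly perturbed copy of one stored embedding.
    query = [x + rng.gauss(0, 0.01) for x in codebase[7]]

    # Stage 2: compare exact search with Hamming search on hash codes.
    print("exact top-3:  ", top_k_exact(query, codebase, 3))
    print("hamming top-3:", top_k_hamming(to_hash_code(query), codes, 3))
```

In this sketch each 64-dimensional float vector collapses to a single 64-bit integer, so one comparison becomes an XOR and a bit count instead of 64 multiplications. Both searches here are still linear scans; the sketch only shows the per-comparison saving, not any indexing scheme.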