Efficiently Identifying Binary Similarity Based on Deep Hashing and Contrastive Learning
2023 8th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), 2023
Abstract
Binary similarity analysis identifies semantic similarities between two or more binary code snippets. In recent years, deep learning-based methods have shown promising results. They formalize code similarity as a nearest neighbor retrieval task, and the overall workflow can be divided into two stages: (1) feeding the code snippets into an embedding model to obtain the corresponding high-dimensional vectors as fingerprints (i.e., constructing the codebase), and (2) querying the codebase with nearest neighbor retrieval to obtain the top-k results. Most existing studies focus only on the first stage (more specifically, the embedding model) and ignore the overhead of the retrieval stage. In real-world scenarios, the codebase can be very large and contain a massive number of embeddings, which makes exact nearest neighbor retrieval prohibitively expensive. To mitigate this issue, this paper proposes a novel approach, dubbed BinCH, which can efficiently perform code search without sacrificing accuracy.
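To make the retrieval-stage cost concrete, the following is a minimal sketch of the two-stage workflow on toy data. It is not BinCH's method, which the abstract does not detail. The random vectors stand in for embedding-model output, and the sign-based `sign_hash` is an assumed, untrained substitute for a learned hash function. All function names are hypothetical. The sketch contrasts exact top-k search over float embeddings with top-k search over compact binary codes compared by Hamming distance.

```python
import math
import random


def cosine_top_k(query, codebase, k):
    """Exact retrieval: score every stored embedding by cosine similarity."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    ranked = sorted(range(len(codebase)),
                    key=lambda i: cosine(query, codebase[i]),
                    reverse=True)
    return ranked[:k]


def sign_hash(vec):
    """Assumed stand-in for a learned hash: one bit per dimension (its sign)."""
    code = 0
    for x in vec:
        code = (code << 1) | (1 if x > 0 else 0)
    return code


def hamming_top_k(query_code, codes, k):
    """Retrieval over binary codes: XOR plus popcount replaces float arithmetic."""
    ranked = sorted(range(len(codes)),
                    key=lambda i: bin(query_code ^ codes[i]).count("1"))
    return ranked[:k]


# Stage 1 (assumed): random vectors play the role of snippet embeddings.
rng = random.Random(0)
codebase = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(200)]
codes = [sign_hash(v) for v in codebase]

# Stage 2: query with a slightly perturbed copy of entry 17.
query = [x + rng.gauss(0, 0.05) for x in codebase[17]]
exact = cosine_top_k(query, codebase, k=5)
approx = hamming_top_k(sign_hash(query), codes, k=5)
```

Each 64-dimensional embedding shrinks to a single 64-bit code, and each comparison becomes one XOR and a bit count. Both searches here are still linear scans; the saving is in per-comparison cost and memory, which is the kind of retrieval overhead the abstract says is usually ignored.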
Keywords
Big Data, Reverse Engineering, Deep Learning, Deep Hashing, Binary Diffing