One-to-One or One-to-many? What function inlining brings to binary2source similarity analysis

CoRR(2021)

Abstract
Binary2source code matching is critical to many code-reuse-related tasks, including code clone detection, software license violation detection, and reverse engineering assistance. Existing binary2source works always apply a "1-to-1" (one-to-one) mechanism, i.e., one function in a binary file is matched against one function in a source file. However, we assume that, because of function inlining, such a mapping is usually a more complex "1-to-n" (one-to-many) problem. To the best of our knowledge, few existing works have systematically studied the effect of function inlining on binary2source matching tasks. This paper addresses that gap. To support our study, we first construct two datasets containing 61,179 binaries and 19,976,067 functions. We also propose an automated approach to label the datasets with line-level and function-level mappings. Based on the labeled datasets, we then investigate the extent of function inlining, the factors affecting function inlining, and the impact of function inlining on existing binary2source similarity methods. Finally, we discuss the interesting findings and give suggestions for designing more effective methodologies.
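To make the "1-to-n" idea concrete, the following is a minimal sketch of how a line-level mapping could be aggregated into a function-level mapping. It is an illustration under stated assumptions, not the labeling approach proposed in the paper: the function names, address ranges, and the toy line map (which in practice might be derived from debug information) are all hypothetical.

```python
from collections import defaultdict


def source_function_at(src_funcs, file, line):
    """Return the source function whose line range covers (file, line), or None."""
    for name, (f, start, end) in src_funcs.items():
        if f == file and start <= line <= end:
            return name
    return None


def function_level_mapping(bin_funcs, src_funcs, line_map):
    """Aggregate an address -> (file, line) map into binary function -> set of source functions."""
    mapping = defaultdict(set)
    for addr, (file, line) in line_map.items():
        src = source_function_at(src_funcs, file, line)
        if src is None:
            continue  # line does not belong to any known source function
        for bname, (lo, hi) in bin_funcs.items():
            if lo <= addr < hi:  # address falls inside this binary function
                mapping[bname].add(src)
    return dict(mapping)


# Hypothetical toy data: "add" has been inlined into the binary function "main".
bin_funcs = {"main": (0x1000, 0x1040), "helper": (0x1040, 0x1060)}
src_funcs = {
    "add": ("a.c", 1, 4),
    "helper": ("a.c", 5, 8),
    "main": ("a.c", 10, 20),
}
line_map = {
    0x1000: ("a.c", 11),  # code from main
    0x1008: ("a.c", 2),   # code from add, located inside binary main
    0x1010: ("a.c", 12),  # code from main
    0x1040: ("a.c", 6),   # code from helper
}

mapping = function_level_mapping(bin_funcs, src_funcs, line_map)
# Binary functions that map to more than one source function are the 1-to-n cases.
one_to_many = {b for b, srcs in mapping.items() if len(srcs) > 1}
```

In this toy setup, the binary function `main` maps to both `main` and `add`, so a strict one-to-one matcher would compare it against only part of the source code it was compiled from.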
Key words
binary2source similarity analysis, one-to-one, one-to-many