Using the Strings Metadata to Detect the Source Language of the Binary

Ashish Adhikari, Prasad A. Kulkarni

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INNOVATIONS IN COMPUTING RESEARCH (ICR'22) (2022)

Abstract
We explore the question of determining the source language of program binaries. Programs compiled from different source languages are susceptible to different classes of software attacks. Knowing the source language can therefore help end users assess the security risk of binary software, and it allows human experts and automated binary analysis tools to focus their efforts on protecting the software from the appropriate classes of attacks. Previous work in the related area of program provenance uses complex analysis of the binary code to determine the compiler and flags used to generate the binary, but does not attempt to identify the source language. In this work we develop a simple approach that uses only the strings exposed by the binary to reliably determine the source language, without analyzing the underlying binary code and even when the binary is stripped. Our technique applies several machine-learning-based classifiers to this simple program metadata to determine the source language across a large real-world benchmark set covering 6 programming languages. We find that this approach achieves an accuracy of over 98% on that benchmark set in all stripped and unstripped binary configurations.
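The general idea of classifying a binary from its exposed strings can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features or classifiers used, so the string-extraction rule (runs of at least 4 printable ASCII bytes), the tokenizer, the small naive Bayes classifier, the language labels, and the toy byte blobs below are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: a "string" is a run of >= 4 printable ASCII bytes,
# similar in spirit to the Unix `strings` utility.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# Assumption: identifier-like tokens of length >= 3 serve as features.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{2,}")


def extract_strings(blob: bytes) -> list:
    """Return printable ASCII runs found in raw bytes (no code analysis)."""
    return [m.decode("ascii") for m in PRINTABLE_RUN.findall(blob)]


def tokenize(strings) -> list:
    """Split extracted strings into lowercase identifier-like tokens."""
    tokens = []
    for s in strings:
        tokens.extend(t.lower() for t in TOKEN.findall(s))
    return tokens


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (hypothetical stand-in
    for whichever classifiers the paper actually evaluates)."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.doc_counts = Counter()               # label -> number of binaries
        self.vocab = set()

    def fit(self, samples):
        for blob, label in samples:
            tokens = tokenize(extract_strings(blob))
            self.token_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def predict(self, blob: bytes):
        tokens = tokenize(extract_strings(blob))
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.token_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data: hand-written byte blobs, NOT the paper's benchmark set.
TRAIN = [
    (b"\x7fELF\x00\x01libc.so.6\x00printf\x00malloc\x00__libc_start_main\x00", "C"),
    (b"\x7fELF\x00libstdc++.so.6\x00_ZNSt8ios_base4InitC1Ev\x00__cxa_atexit\x00", "C++"),
    (b"\x7fELF\x00runtime.gopanic\x00go.buildid\x00runtime.main\x00", "Go"),
    (b"\x7fELF\x00rust_begin_unwind\x00core::panicking\x00rust_panic\x00", "Rust"),
]

model = NaiveBayes()
model.fit(TRAIN)
print(model.predict(b"\x00\x02runtime.main\x00runtime.gopanic\x00"))  # prints Go
```

The sketch reads only raw bytes and never disassembles or parses the binary, which mirrors the constraint stated in the abstract. A realistic experiment would need many labeled binaries per language and a held-out test set to measure accuracy.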
Keywords
Binary, Classification, Source language, Software attack, Stripped