SUSPEND

Expert Systems with Applications: An International Journal (2017)

Abstract
Software entropy is traditionally used for packer detection. Here, software entropy is represented as a non-stationary time series. Features are extracted using wavelets, change point models, and detrended fluctuation analysis. These features improve large-scale discrimination between malicious and clean files.
Commercial anti-virus software traditionally memorizes specific byte sequences (known as "signatures") in the file contents of previously encountered malware. However, malware authors can evade signature-based detection in many ways: by using obfuscation techniques such as "packing" (encryption or compression) to hide snippets of malicious code, by writing metamorphic malware, or by tampering with existing malware. We hypothesize that certain evasion techniques leave traces in the file's entropy signal, revealing either similarities to known malware or the presence of tampering per se. To this end, we present SUSPEND (SUSPicious ENtropy signal Detector), an expert system which evaluates the suspiciousness of an executable file's entropy signal in order to support malware classification. Traditionally, entropy analysis has been used for packer detection, and entropy-based features have therefore often comprised merely the mean entropy or the entropy of a few file subcomponents. SUSPEND instead applies non-stationary time series modeling to aid in malware detection. In particular, SUSPEND (a) quantifies the "amount of structure" in the entropy signal (through detrended fluctuation analysis), (b) finds the location and size of sudden jumps in entropy (through mean change point modeling), and (c) computes the distribution of entropic variation across multiple spatial scales (through wavelet decomposition). In addition, SUSPEND (d) summarizes the entropy signal's empirical probability distribution. Because SUSPEND's run time can be made to scale linearly in file size, it is well-suited for large-scale malware analysis.
We apply SUSPEND to a large-scale malware detection task with 500,000 heterogeneous real-world samples and over 1 million features. We find that SUSPEND boosts the predictive performance of traditional entropy analysis (as found in packer detectors) from 77.02% to 96.62%. Moreover, SUSPEND's focus on entropy signals makes it a natural candidate for combining with other types of features; for instance, combining SUSPEND with a strings-based feature set boosts predictive accuracy from 97.18% to 98.62%. Thus, whereas entropy analysis has traditionally focused on detecting that a file is packed, SUSPEND's more comprehensive representation of the entropy signal helps to determine that a file is malicious. We illustrate the application of SUSPEND by studying 18 samples of VirRansom, a family of viral ransomware which could cost large organizations millions. SUSPEND detects 100% of the studied files with over 99% confidence, whereas a more traditional strings-based model was very close to undecided and represented the entire family with a single string.
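
The feature families named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 256-byte non-overlapping window, the Haar wavelet, the single-split change point search, the first-order detrending, and the DFA scales are all assumptions made for illustration, and every function name below is hypothetical. The sketch uses only the Python standard library and assumes a non-constant signal of at least 64 windows for the DFA step.

```python
import math
from collections import Counter

def entropy_signal(data, window=256):
    """Shannon entropy (bits per byte) of consecutive non-overlapping windows."""
    signal = []
    for start in range(0, len(data) - window + 1, window):
        counts = Counter(data[start:start + window])
        signal.append(sum(-(c / window) * math.log2(c / window)
                          for c in counts.values()))
    return signal

def haar_energy_by_scale(signal):
    """Energy of Haar detail coefficients per dyadic scale, finest scale first."""
    n = 1 << (len(signal).bit_length() - 1)  # largest power of two <= len(signal)
    approx = list(signal[:n])
    energies = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        energies.append(sum((a - b) ** 2 / 2 for a, b in pairs))
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return energies

def mean_change_point(signal):
    """Best single split k (1 <= k < n) by squared error; returns (k, jump in mean)."""
    n, total = len(signal), sum(signal)
    best_k, best_gain, best_left, left = 1, -math.inf, 0.0, 0.0
    for k in range(1, n):  # one pass over running sums, linear in len(signal)
        left += signal[k - 1]
        gain = left ** 2 / k + (total - left) ** 2 / (n - k)
        if gain > best_gain:
            best_k, best_gain, best_left = k, gain, left
    return best_k, (total - best_left) / (n - best_k) - best_left / best_k

def _slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def dfa_exponent(signal, scales=(8, 16, 32, 64)):
    """Scaling exponent of the linearly detrended fluctuation function."""
    mean = sum(signal) / len(signal)
    profile, acc = [], 0.0
    for v in signal:  # cumulative sum of the mean-centered signal
        acc += v - mean
        profile.append(acc)
    log_s, log_f = [], []
    for s in scales:
        t = list(range(s))
        sq_residuals = []
        for i in range(0, len(profile) - s + 1, s):
            seg = profile[i:i + s]
            b = _slope(t, seg)
            a = sum(seg) / s - b * (s - 1) / 2
            sq_residuals.append(sum((y - a - b * x) ** 2 for x, y in zip(t, seg)) / s)
        log_s.append(math.log(s))
        log_f.append(0.5 * math.log(sum(sq_residuals) / len(sq_residuals)))
    return _slope(log_s, log_f)

# Toy input: 1 KiB of zero bytes followed by 1 KiB cycling through all byte values.
toy = bytes(1024) + bytes(range(256)) * 4
sig = entropy_signal(toy)          # four windows at 0 bits, then four at 8 bits
print(mean_change_point(sig))      # (4, 8.0)
print(haar_energy_by_scale(sig))   # all energy sits at the coarsest scale
```

On the toy input the entropy jump is located at window 4 with a size of 8 bits, and the wavelet energy concentrates at the coarsest scale, which is the kind of structure a mean-entropy feature alone would not capture.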
Keywords
Malware, Machine learning, Time series, Wavelet, Change points, Detrended fluctuation analysis