Automated Vulnerable Codes Mutation through Deep Learning for Variability Detection

IEEE International Joint Conference on Neural Networks (IJCNN), 2022

Many studies address the automatic generation or mutation of general-purpose code datasets, but the large-scale generation of vulnerable code datasets has received little public attention. Deliberately generating more vulnerable code may seem counterproductive, yet it is very important for vulnerability detection technology. Such code can be used to expose the blind spots of vulnerability detection tools through fuzz testing. In particular, vulnerability detection based on machine learning and deep learning suffers from severe dataset imbalance because vulnerable code samples are scarce, which seriously degrades the performance of detection models. In this paper, we propose a new vulnerable code mutation technique called Vuls-Mutation. Based on a generative Sequence-to-Sequence model, our system automatically and continuously mutates existing vulnerable code to generate new vulnerable code by changing the control flow or data flow of potentially tainted data. Experiments show that about 71% of the mutated code is grammatically correct, and that the true positive rate among the grammatically correct mutated code is about 93%. We add this new set of mutated programs to the training data of deep learning vulnerability detection models, and the results show that all indicators improve over the baseline method.
Deep Learning, Vulnerability Code, Code Mutation, Program Slicing, Vulnerability Detection
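The two rates reported in the abstract can be combined into a rough end-to-end yield. The sketch below is only a back-of-the-envelope illustration, not part of the paper's method. It assumes that the 93% true positive rate means the share of grammatically correct mutants that are still vulnerable, and that the two rates simply multiply. The function names and the target of 1000 samples are hypothetical.

```python
import math


def usable_yield(syntax_rate: float, true_positive_rate: float) -> float:
    """Estimated fraction of all generated mutants that are both
    grammatically correct and still vulnerable.

    Assumption: true_positive_rate is measured only on the
    grammatically correct mutants, so the two rates multiply.
    """
    return syntax_rate * true_positive_rate


def mutants_needed(target_vulnerable: int,
                   syntax_rate: float,
                   true_positive_rate: float) -> int:
    """Estimated number of raw mutants to generate in order to keep
    `target_vulnerable` usable vulnerable samples (hypothetical helper)."""
    return math.ceil(target_vulnerable / usable_yield(syntax_rate, true_positive_rate))


if __name__ == "__main__":
    # Approximate rates taken from the abstract: 71% and 93%.
    overall = usable_yield(0.71, 0.93)
    print(f"estimated usable yield: {overall:.2%}")
    # The target of 1000 extra vulnerable samples is an arbitrary example value.
    print("raw mutants for 1000 usable samples:", mutants_needed(1000, 0.71, 0.93))
```

Under these assumptions, roughly two out of every three generated mutants would be usable as additional vulnerable training samples, which gives a sense of how much generation is needed to offset a given class imbalance.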