MGAP3: Malware Group Attribution Based on PerceiverIO and Polytype Pre-Training

Yuxia Sun, Shiqi Chen, Song Lin, Aoxiang Sun, Saiqin Long, Zhetao Li

IEEE Transactions on Dependable and Secure Computing (2024)

The escalating prevalence of Advanced Persistent Threat (APT) malware demands more effective methods for accurately attributing malware to specific APT groups. Traditional manual attribution is labor-intensive and error-prone, while existing automated methods are hampered by small datasets, inadequate representation learning, and poor noise reduction during preprocessing. To address these challenges, we introduce the AMG25 dataset, which expands the pool of malware samples labeled with APT group affiliations. We also propose the MGAP3 model (Malware Group Attribution based on PerceiverIO and Polytype Pre-training), which improves attribution performance by combining hierarchical pre-training on disassembled code with multi-view statistical features, all within a unified PerceiverIO architecture. Through a series of novel polytype pre-training tasks, the model captures complex program structures and the interactions across multiple code granularities. Additionally, we have developed a noise filtering technique that focuses on user-defined function code, which substantially reduces overfitting and boosts performance. Furthermore, a streamlined variant, MGAP3-Lite, accelerates training while preserving robust performance. Extensive experiments validate the effectiveness of our models and underscore the importance of the proposed pre-training technique.
Keywords: APT malware, group attribution, PerceiverIO, pre-training task, static analysis
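The abstract does not describe MGAP3's internals. As a rough illustration of the generic PerceiverIO pattern it builds on, the sketch below shows a small latent array reading a long sequence of disassembled-code token embeddings plus one appended statistical-feature vector, followed by an output query that decodes group logits. Everything here is an assumption: the function names (`perceiver_io_classify`, `init_params`, `cross_attention`), the dimensions, the single-head attention without layer norms or MLPs, the fusion of the statistical view as one extra input token, and the random untrained weights. It is not the authors' model, and it leaves out the pre-training tasks and noise filtering entirely.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, inputs, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: `queries` attend to `inputs`."""
    q = queries @ w_q                          # (num_queries, d)
    k = inputs @ w_k                           # (num_inputs, d)
    v = inputs @ w_v                           # (num_inputs, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (num_queries, num_inputs)
    return softmax(scores, axis=-1) @ v        # (num_queries, d)

def init_params(rng, d=32, num_latents=8, stat_dim=10, num_groups=5):
    # Hypothetical sizes; num_groups=5 is arbitrary, not the AMG25 label count.
    def proj():
        return tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    return {
        "w_stat": rng.normal(scale=stat_dim ** -0.5, size=(stat_dim, d)),
        "latents": rng.normal(size=(num_latents, d)),
        "enc": proj(), "proc": proj(), "dec": proj(),
        "out_query": rng.normal(size=(1, d)),
        "w_cls": rng.normal(scale=d ** -0.5, size=(d, num_groups)),
    }

def perceiver_io_classify(code_tokens, stat_features, params):
    """Hypothetical PerceiverIO-style forward pass with untrained weights."""
    # Project the statistical feature vector to the token width and append it
    # as one extra element of the input array (an assumed multi-view fusion).
    stat_token = (stat_features @ params["w_stat"])[None, :]     # (1, d)
    inputs = np.concatenate([code_tokens, stat_token], axis=0)    # (N + 1, d)

    # Encode: the fixed-size latent array (M << N) cross-attends to the inputs.
    latents = params["latents"] + cross_attention(params["latents"], inputs, *params["enc"])

    # Process: self-attention among the M latents only.
    latents = latents + cross_attention(latents, latents, *params["proc"])

    # Decode: a single output query reads the latents and yields group logits.
    out = cross_attention(params["out_query"], latents, *params["dec"])  # (1, d)
    return (out @ params["w_cls"])[0]                                     # (num_groups,)

rng = np.random.default_rng(0)
params = init_params(rng)
code_tokens = rng.normal(size=(500, 32))   # stand-in embeddings for 500 code tokens
stat_features = rng.normal(size=10)        # stand-in statistical feature vector
logits = perceiver_io_classify(code_tokens, stat_features, params)
probs = softmax(logits)
print(logits.shape)  # (5,)
```

Because the latent array has a fixed size M, the encode step costs O(M·N) for N input tokens and the processing step depends only on M, so input length no longer drives a quadratic self-attention cost. This is the general property of the PerceiverIO design; how MGAP3 sizes its latents or combines its views is not stated in the abstract.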