Context-Augmented Key Phrase Extraction from Short Texts for Cyber Threat Intelligence Tasks

Avishek Bose, Huichen Yang, Marissa Shivers, Ahat Orazgeldiyev, William H. Hsu

2023 IEEE International Conference on Intelligence and Security Informatics (ISI), 2023

In this paper, we address the contextual limitations of current deep learning-based and heuristic key phrase extraction tools as applied to the cybersecurity domain. We develop a hybrid system that augments state-of-the-art (SOTA) transformers for key phrase sequence labeling, using a novel set of part-of-speech (POS) and role-aware tagging rules to generate fine-grained tag sequences from short text corpora. We then fine-tune multiple SOTA deep learning (DL) language model (LM) architectures on these transformed sequences and compare the outputs of the respective LMs to select the best-performing underlying transformers for extracting cybersecurity key phrases. The resulting ensemble achieves substantial predictive gains over SOTA baselines on general cybersecurity corpora: its F1 scores are at least 25% higher than those of hybrid SOTA transformers fine-tuned using baseline tagging rules on the generic corpus, at the cost of a much smaller tradeoff (less than 5% in F1) on a vulnerability-specific corpus.
Tagging rules, sequence labeling, BERT, BiLSTM, ROUGE score, context, key phrase, cyber threat
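
The abstract frames key phrase extraction as sequence labeling evaluated with F1. The sketch below illustrates that framing only, under stated assumptions: it uses a generic BIO scheme (`B-KP`, `I-KP`, `O`) and exact span matching, not the POS and role-aware tagging rules of the paper, which the abstract does not specify. The function names and the example sentence are hypothetical.

```python
from typing import List, Tuple


def bio_tags(tokens: List[str], phrases: List[List[str]]) -> List[str]:
    """Label each token B-KP, I-KP, or O by exact matching of known key phrases.

    Assumption: a plain BIO scheme, used here only for illustration.
    """
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    # Longer phrases first so they take precedence over their sub-phrases.
    for phrase in sorted(phrases, key=len, reverse=True):
        target = [w.lower() for w in phrase]
        n = len(target)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            window_free = all(t == "O" for t in tags[i:i + n])
            if lowered[i:i + n] == target and window_free:
                tags[i] = "B-KP"
                for j in range(i + 1, i + n):
                    tags[j] = "I-KP"
    return tags


def spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) key phrase spans, end exclusive."""
    out: List[Tuple[int, int]] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B-KP":
            if start is not None:
                out.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(tags)))
    return out


def span_f1(gold: List[str], pred: List[str]) -> float:
    """Span-level F1: a predicted span counts only if it matches a gold span exactly."""
    g, p = set(spans(gold)), set(spans(pred))
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision = tp / len(p)
    recall = tp / len(g)
    return 2 * precision * recall / (precision + recall)


tokens = ["Attackers", "exploit", "a", "buffer", "overflow", "in", "OpenSSL"]
gold = bio_tags(tokens, [["buffer", "overflow"], ["OpenSSL"]])
pred = ["O", "O", "O", "B-KP", "I-KP", "O", "O"]  # a model that misses "OpenSSL"
score = span_f1(gold, pred)
```

In this example the prediction recovers one of the two gold spans, so precision is 1, recall is 0.5, and the span-level F1 is 2/3. A fine-tuned transformer would replace the hand-written `pred` list with per-token predictions over the same tag set.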