Context-Augmented Key Phrase Extraction from Short Texts for Cyber Threat Intelligence Tasks
2023 IEEE International Conference on Intelligence and Security Informatics (ISI), 2023
Abstract
In this paper, we address contextual limitations of current deep learning-based and heuristic key phrase extraction tools as applied to the domain of cybersecurity. To overcome these limitations, we develop a hybrid system that augments state-of-the-art (SOTA) transformers for the task of key phrase sequence labeling, using a novel set of part-of-speech (POS) and role-aware tagging rules to generate fine-grained tag sequences from short text corpora. Next, we fine-tune multiple SOTA deep learning (DL) language model (LM) architectures on these transformed sequences. We then evaluate the architectures by measuring the outcomes of the respective LMs and select the best-performing underlying transformers for extracting cybersecurity key phrases. The resulting ensemble achieves substantial predictive gains over SOTA baselines on general cybersecurity corpora: on the generic corpus, its F1 scores are at least 25% higher than those of hybrid SOTA transformers fine-tuned using baseline tagging rules, with a much smaller tradeoff (less than 5% in F1) on a vulnerability-specific corpus.
Keywords
Tagging rules, sequence labeling, BERT, BiLSTM, ROUGE score, context, key phrase, cyber threat
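
The abstract describes turning short texts into tag sequences with POS-based rules before fine-tuning a sequence labeler. The abstract does not state the paper's actual rules, so the sketch below only illustrates the general idea under a stated assumption. The rule (a key phrase is a run of adjectives and nouns that ends on a noun), the tag names `B-KP`, `I-KP`, and `O`, the function name `tag_key_phrases`, and the use of Universal POS labels are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Assumed POS labels (Universal POS names); not taken from the paper.
PHRASE_POS = {"ADJ", "NOUN", "PROPN"}  # tokens allowed inside a candidate phrase
HEAD_POS = {"NOUN", "PROPN"}           # a candidate phrase must end on one of these


def tag_key_phrases(tokens: List[Tuple[str, str]]) -> List[str]:
    """Assign BIO-style key phrase tags to (word, POS) pairs.

    Hypothetical rule: a key phrase is a maximal run of adjectives and
    nouns, trimmed so that it ends on a noun.
    """
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        if tokens[i][1] not in PHRASE_POS:
            i += 1
            continue
        # Find the end of the maximal run of phrase-eligible tokens.
        j = i
        while j < len(tokens) and tokens[j][1] in PHRASE_POS:
            j += 1
        # Trim trailing adjectives so the phrase ends on a noun head.
        end = j
        while end > i and tokens[end - 1][1] not in HEAD_POS:
            end -= 1
        if end > i:
            tags[i] = "B-KP"
            for k in range(i + 1, end):
                tags[k] = "I-KP"
        i = j
    return tags


if __name__ == "__main__":
    sentence = [
        ("Attackers", "NOUN"), ("exploit", "VERB"), ("a", "DET"),
        ("remote", "ADJ"), ("code", "NOUN"), ("execution", "NOUN"),
        ("vulnerability", "NOUN"),
    ]
    for (word, _), tag in zip(sentence, tag_key_phrases(sentence)):
        print(f"{word}\t{tag}")
```

Tag sequences produced this way could serve as training labels for a token classification model. A role-aware rule set like the one the abstract mentions would presumably use more tag types than this single `KP` class.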