Texception: A Character/Word-Level Deep Learning Model for Phishing URL Detection

2020 IEEE International Conference on Acoustics, Speech, and Signal Processing (2020)

Abstract
Phishing is the starting point for many cyberattacks that threaten the confidentiality, availability and integrity of enterprises' and consumers' data. The URL of a web page that hosts the attack provides a rich source of information to determine the maliciousness of the web server. In this work, we propose a novel deep learning architecture, Texception, that takes a URL as input and predicts whether it belongs to a phishing attack. Architecturally, Texception uses both character-level and word-level information from the incoming URL and does not depend on manually crafted features or feature engineering. This makes it different from classical approaches. In addition, Texception benefits from multiple parallel convolutional layers and can grow deeper or wider. We show that this flexibility enables Texception to generalize better for new URLs. Our results on production data show that Texception is able to significantly outperform a traditional text classification method by increasing the true positive rate by 126.7% at an extremely low false positive rate (0.01%), which is crucial for our model's healthy operation at internet scale.
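
To make the idea of using both character-level and word-level information concrete, the following is a minimal sketch of how a URL could be turned into two parallel token sequences before being passed to embedding and convolutional layers. The abstract does not specify Texception's tokenization, vocabulary, or sequence lengths, so the function names (`char_tokens`, `word_tokens`, `build_vocab`, `encode`), the splitting rule, the `max_len` value, and the example URL are all illustrative assumptions, not the authors' implementation.

```python
import re

# Hypothetical helpers for illustration only; not taken from the paper.

def char_tokens(url, max_len=200):
    """Character-level view: one token per character, truncated to max_len."""
    return list(url[:max_len])

def word_tokens(url):
    """Word-level view: split on any run of non-alphanumeric characters."""
    return [w for w in re.split(r"[^A-Za-z0-9]+", url) if w]

def build_vocab(token_lists, first_id=2):
    """Assign ids in order of first appearance (0 = padding, 1 = unknown)."""
    vocab = {}
    for tokens in token_lists:
        for t in tokens:
            if t not in vocab:
                vocab[t] = len(vocab) + first_id
    return vocab

def encode(tokens, vocab, unk=1):
    """Map tokens to integer ids; unseen tokens share the unknown id."""
    return [vocab.get(t, unk) for t in tokens]

# Assumed example URL (example.com is a reserved documentation domain).
url = "http://login.example.com/secure/update?id=42"
chars = char_tokens(url)
words = word_tokens(url)

char_ids = encode(chars, build_vocab([chars]))
word_ids = encode(words, build_vocab([words]))
```

In a model of this kind, `char_ids` and `word_ids` would typically each feed their own embedding layer, so that the character branch can pick up on obfuscated or misspelled strings while the word branch captures whole tokens such as brand names or path keywords.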
Keywords
Phishing, Detection, Character-Level, Deep Learning, Word Embedding