Protecting Web Contents Against Persistent Distributed Crawlers

2017 IEEE International Conference on Communications (ICC), 2017

Abstract
Web crawlers have been misused for malicious purposes, such as downloading server data without permission from the website administrator. In this paper, based on the observation that normal users and malicious crawlers exhibit different short-term and long-term download behaviors, we develop a new anti-crawler mechanism called PathMarker to detect and constrain persistent distributed crawlers. By augmenting each URL with a marker that records both the parent page through which the URL is reached and the identity of the user who accesses it, we can not only perform more accurate heuristic detection and Support Vector Machine (SVM) based machine-learning detection to catch malicious crawlers at an early stage, but also dramatically suppress the efficiency of crawlers before they are detected. We deploy our approach on a forum website, and the evaluation results show that PathMarker can quickly capture all six open-source and in-house crawlers.
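The marking mechanism described above can be illustrated with a minimal sketch, assuming an HMAC-signed marker carried as a query parameter; the paper's actual marker format and cryptography are not specified in this abstract, and all names here (SECRET_KEY, make_marker, mark_url, verify_marker) are hypothetical illustrations rather than the authors' API.

```python
# A minimal sketch of PathMarker-style URL marking, assuming an HMAC-signed
# marker appended as a query parameter. Hypothetical names throughout; the
# paper's actual marker design may differ.
import base64
import hashlib
import hmac
from urllib.parse import urlencode, urlparse

SECRET_KEY = b"server-side-secret"  # assumption: known only to the web server

def make_marker(parent_url: str, user_id: str) -> str:
    """Bind (parent page, user identity) into an unforgeable marker.

    Assumes '|' does not occur in the URL or the user id.
    """
    payload = f"{parent_url}|{user_id}".encode()
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest().encode()
    return base64.urlsafe_b64encode(payload + b"|" + tag).decode()

def mark_url(url: str, parent_url: str, user_id: str) -> str:
    """Append the marker to a link before it is served in a page."""
    sep = "&" if urlparse(url).query else "?"
    return url + sep + urlencode({"pm": make_marker(parent_url, user_id)})

def verify_marker(marker: str, user_id: str):
    """Return the recorded parent URL if the marker is valid for this user,
    otherwise None (e.g. a marker shared among distributed crawler machines)."""
    try:
        raw = base64.urlsafe_b64decode(marker.encode())
        payload, tag = raw.rsplit(b"|", 1)  # the hex tag contains no '|'
        expected = hmac.new(SECRET_KEY, payload,
                            hashlib.sha256).hexdigest().encode()
        if not hmac.compare_digest(expected, tag):
            return None  # forged or tampered marker
        parent_url, uid = payload.decode().split("|", 1)
        return parent_url if uid == user_id else None
    except Exception:
        return None
```

On each request the server would verify the marker against the visiting user and log the (URL, parent URL) pair, so that per-user access-path patterns can feed the heuristic and SVM detectors, while markers shared across machines become useless to other nodes of a distributed crawler.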
Keywords
Web content protection, persistent distributed crawlers, Web crawlers, PathMarker anti-crawler mechanism, heuristic detection, support vector machine, SVM-based machine learning detection, malicious crawler detection, website, open-source crawlers, in-house crawlers