Guidelines for Online Network Crawling: A Study of Data Collection Approaches and Network Properties

WebSci '18: 10th ACM Conference on Web Science, Amsterdam, Netherlands, May 2018

Abstract
Over the past two decades, online social networks have attracted a great deal of attention from researchers. However, before one can gain insight into the properties or structure of a network, one must first collect appropriate data. Data collection poses several challenges, such as API or bandwidth limits, which require the data collector to carefully consider which queries to make. Many online network crawling methods have been proposed, but it is not always clear which method should be used for a given network. In this paper, we perform a detailed, hypothesis-driven analysis of several online crawling algorithms, ranging from classical crawling methods to modern, state-of-the-art algorithms, with respect to the task of collecting as much data (nodes or edges) as possible given a fixed query budget. We show that the performance of these methods depends strongly on the network structure. We identify three relevant network characteristics: community separation, average community size, and average node degree. We present experiments on both real and synthetic networks, and provide guidelines to researchers regarding selection of an appropriate sampling method.
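To make the budgeted crawling task concrete, here is a minimal Python sketch of two classical crawlers, breadth-first search (BFS) and a simple random walk. It is not the authors' implementation. It assumes that one query returns the complete neighbor list of one node, that stepping onto an already queried node is free because its neighbors are cached, and that success is measured as the number of distinct nodes observed. The function names (`bfs_crawl`, `random_walk_crawl`, `two_cliques`) and the two-clique test graph are hypothetical.

```python
import random
from collections import deque


def bfs_crawl(graph, seed, budget):
    """Breadth-first crawl: each query reveals one node's full neighbor list.

    Returns (queried, observed): the nodes queried and every node seen so far.
    """
    queried = set()
    observed = {seed}
    frontier = deque([seed])  # observed but not yet queried, in discovery order
    while frontier and len(queried) < budget:
        node = frontier.popleft()
        queried.add(node)  # spends one query
        for nbr in graph[node]:
            if nbr not in observed:
                observed.add(nbr)
                frontier.append(nbr)
    return queried, observed


def random_walk_crawl(graph, seed, budget, rng, max_steps=None):
    """Simple random-walk crawl under the same query model as bfs_crawl.

    Assumption: stepping onto an already queried node costs no query,
    because its neighbor list is cached.
    """
    if max_steps is None:
        max_steps = 100 * budget  # guard against walks trapped in small components
    queried = set()
    observed = {seed}
    node = seed
    steps = 0
    while len(queried) < budget and steps < max_steps:
        if node not in queried:
            queried.add(node)  # spends one query
            observed.update(graph[node])
        if not graph[node]:
            break  # isolated node: the walk cannot continue
        node = rng.choice(graph[node])
        steps += 1
    return queried, observed


def two_cliques(k):
    """Hypothetical test graph: two k-cliques joined by a single bridge edge."""
    graph = {i: [] for i in range(2 * k)}
    for base in (0, k):
        for i in range(base, base + k):
            for j in range(i + 1, base + k):
                graph[i].append(j)
                graph[j].append(i)
    graph[k - 1].append(k)  # bridge between the two communities
    graph[k].append(k - 1)
    return graph


graph = two_cliques(5)
queried, observed = bfs_crawl(graph, seed=0, budget=3)
print(len(queried), len(observed))  # → 3 5
walk_queried, walk_observed = random_walk_crawl(graph, 0, 6, random.Random(42))
```

On this toy graph, BFS from node 0 with a budget of 3 queries only sees its own community (5 nodes); with a budget of 6 it crosses the bridge and observes all 10 nodes. This small case hints at why community separation can affect how much a crawler collects for a fixed budget.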
Keywords
Experiments, Online Sampling Algorithm, Network Crawling, Network Sampling, Complex Networks