To Link or Not to Link: Ranking Hyperlinks in Wikipedia using Collective Attention

Jaroslav Cechak, Philip Thruesen, Blandine Seznec, Roel Castano, Nattiya Kanhabua

BigData (2016)

Abstract
Wikipedia is one of the fastest growing websites and a primary source of knowledge on the Internet. Being a wiki, its content is crowd-sourced by its users. This has many benefits and is one of the main reasons the English version has grown to more than 5 million articles. Nevertheless, it also raises issues, such as the overlinking of articles, which are difficult for editors to deal with. In this paper, we tackle overlinking in Wikipedia as a ranking problem. We apply Learning to Rank algorithms to evaluate the click frequency of links in an effort to distinguish the links most useful to users. To accomplish this, we develop a ground truth, which serves as a baseline for our algorithm, and compare hyperlink features in order to select the most advantageous ones. The results show 86.2% accuracy with the top-6 most useful features and 87.7% accuracy with the complete feature set. Considering these results, we outline a solution to the overlinking problem. We suggest that removing the most inadequate links could improve the readability of Wikipedia articles while preserving most of their useful links.
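The ranking-then-pruning idea described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the link names, scores, and click counts are invented, the function names `pairwise_accuracy` and `prune_links` are hypothetical, and the pairwise ordering accuracy used here is a stand-in, since the abstract does not say how the reported accuracy is computed. It is not the authors' implementation.

```python
from itertools import combinations


def pairwise_accuracy(scores, clicks):
    """Fraction of link pairs whose predicted order matches the click-based order.

    Assumed metric for illustration; pairs with equal click counts are skipped
    because the ground truth gives them no ordering.
    """
    correct = total = 0
    for i, j in combinations(range(len(scores)), 2):
        if clicks[i] == clicks[j]:
            continue
        total += 1
        # Same sign means the model and the click data agree on the order.
        if (scores[i] - scores[j]) * (clicks[i] - clicks[j]) > 0:
            correct += 1
    return correct / total if total else 0.0


def prune_links(links, scores, keep_fraction=0.8):
    """Drop the lowest-scoring links, keeping the rest in article order."""
    n_keep = max(1, round(len(links) * keep_fraction))
    ranked = sorted(range(len(links)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:n_keep])
    return [link for i, link in enumerate(links) if i in kept]


# Invented example data: one article's outgoing links, hypothetical model
# scores, and hypothetical observed click counts.
links = ["Denmark", "2016", "Copenhagen", "Country", "Europe"]
scores = [0.9, 0.1, 0.8, 0.3, 0.6]
clicks = [120, 2, 95, 40, 10]

accuracy = pairwise_accuracy(scores, clicks)
remaining = prune_links(links, scores, keep_fraction=0.8)
print(accuracy, remaining)
```

In this toy data the only misordered pair is "Country" versus "Europe", and pruning removes the single lowest-scoring link. The `keep_fraction` threshold is an arbitrary choice; the abstract does not state how many links would be removed.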
Keywords
Wikipedia,Useful link,Accuracy