Candidate document retrieval for cross-lingual plagiarism detection using two-level proximity information.

Inf. Process. Manage.(2016)

Abstract
Proposing a candidate retrieval model for cross-lingual plagiarism detection.
The method relies on two levels of proximity information.
Proposing a topic-based text segmentation method.
Comparing the method with other cross-lingual plagiarism detection approaches.
Showing improvements from text segmentation and positional language models.

The rapid growth of documents in different languages, the increased accessibility of electronic documents, and the availability of translation tools have drawn increasing attention to cross-lingual plagiarism detection research in recent years. The task of cross-language plagiarism detection entails two main steps: candidate retrieval and pairwise document similarity assessment. In this paper we examine candidate retrieval, where the goal is to find potential source documents of a suspicious text. Our proposed method for cross-language plagiarism detection is a keyword-focused approach. Since plagiarism usually affects only parts of a text, the texts must be segmented into fragments so that local similarity can be detected. We therefore propose a topic-based segmentation algorithm that converts the suspicious document into a set of related passages. We then use a proximity-based model to retrieve the documents with the best-matching passages. Experiments show promising results for this important phase of cross-language plagiarism detection.
Keywords
Candidate document retrieval, Cross-language plagiarism detection, Text segmentation, Proximity-based retrieval
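
The two stages named in the abstract, segmenting the suspicious document into passages and then retrieving documents by keyword proximity, can be illustrated with a minimal sketch. This is not the paper's algorithm: the word-overlap segmentation rule, the Gaussian proximity kernel, and every name and parameter below (segment, proximity_score, rank, threshold, sigma) are assumptions made for illustration, and the cross-lingual step is skipped by assuming the keywords are already in the language of the candidate documents.

```python
import math
import re


def tokenize(text):
    """Lowercase word tokens; a stand-in for real keyword extraction."""
    return re.findall(r"[a-z]+", text.lower())


def segment(sentences, threshold=0.1):
    """Start a new passage when adjacent sentences share too few words.

    Assumption: Jaccard overlap of adjacent sentences approximates a topic
    boundary; the paper's topic-based segmentation may work differently.
    """
    passages = []
    current = []
    for sentence in sentences:
        if current:
            prev = set(tokenize(current[-1]))
            cur = set(tokenize(sentence))
            union = prev | cur
            overlap = len(prev & cur) / len(union) if union else 0.0
            if overlap < threshold:
                passages.append(current)
                current = []
        current.append(sentence)
    if current:
        passages.append(current)
    return passages


def proximity_score(keywords, doc_tokens, sigma=5.0):
    """Best Gaussian-weighted keyword density over all document positions.

    Assumption: a Gaussian kernel of width sigma stands in for the
    proximity model; keyword hits that lie close together score higher.
    """
    wanted = set(keywords)
    hits = [i for i, tok in enumerate(doc_tokens) if tok in wanted]
    best = 0.0
    for center in range(len(doc_tokens)):
        density = sum(
            math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in hits
        )
        best = max(best, density)
    return best


def rank(passage_keywords, docs):
    """Return document ids ordered by proximity score, best first."""
    scores = {
        doc_id: proximity_score(passage_keywords, tokenize(text))
        for doc_id, text in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions, a document in which the passage keywords occur close together outranks one in which the same keywords are scattered, which is the intuition behind retrieving documents with the best-matching passages.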