Machine Learning Models for Paraphrase Identification and its Applications on Plagiarism Detection

Ethan Hunt, Binay Dahal, Justin Zhan, Laxmi Gewali, Paul Y. Oh, Ritvik Janamsetty, Chanana Kinares, Chanel Koh, Alexis Sanchez, Felix Zhan, Murat Özdemir, Shabnam Waseem, Osman Yolcu

2019 IEEE International Conference on Big Knowledge (ICBK), 2019

Abstract
Paraphrase Identification, also called Natural Language Sentence Matching (NLSM), is an important and challenging task in Natural Language Processing: given a pair of sentences, decide whether one is a paraphrase of the other. A paraphrase conveys the same meaning as the original sentence, but its structure and word order differ. The task is challenging because a short sentence provides little context from which to infer its meaning, and defining a similarity metric over the inferred context of two sentences is not straightforward. Its applications, however, are numerous. This work explores several machine learning algorithms for modeling the task and applies different input encoding schemes. Specifically, we built models using Logistic Regression, Support Vector Machines, and several Neural Network architectures. Among the compared models, as expected, the Recurrent Neural Network (RNN) is best suited to our paraphrase identification task. We also propose plagiarism detection as one area where Paraphrase Identification can be applied effectively.
Keywords
Paraphrase Identification, Machine learning, Long Short Term Memory Networks, NLP
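
To illustrate the simplest model family named in the abstract, here is a minimal sketch of a logistic regression classifier for sentence pairs. The abstract does not specify the paper's input encodings or training setup, so the two features (word-set Jaccard overlap and relative length difference), the toy sentence pairs, and all function names are assumptions made for illustration only.

```python
import math

def pair_features(s1, s2):
    """Encode a sentence pair as two hand-picked features (illustrative assumption)."""
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    union = t1 | t2
    # Share of distinct words that the two sentences have in common.
    jaccard = len(t1 & t2) / len(union) if union else 0.0
    # Relative difference in the number of distinct words.
    len_diff = abs(len(t1) - len(t2)) / max(len(t1), len(t2), 1)
    return [jaccard, len_diff]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias with plain stochastic gradient descent on log loss."""
    w, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + bias)
            err = p - yi  # gradient of the log loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            bias -= lr * err
    return w, bias

def predict(w, bias, x):
    """Return 1 (paraphrase) if the predicted probability is at least 0.5, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + bias) >= 0.5)

# Hypothetical toy data: (sentence 1, sentence 2, label), label 1 = paraphrase.
pairs = [
    ("the cat sat on the mat", "on the mat the cat sat", 1),
    ("he bought a new car", "he purchased a new car", 1),
    ("the meeting starts at noon", "at noon the meeting begins", 1),
    ("the cat sat on the mat", "stocks fell sharply on monday", 0),
    ("he bought a new car", "it rained all day", 0),
    ("the meeting starts at noon", "she likes green tea", 0),
]
X = [pair_features(s1, s2) for s1, s2, _ in pairs]
y = [label for _, _, label in pairs]

w, bias = train_logreg(X, y)
predictions = [predict(w, bias, x) for x in X]
print(predictions)
```

Word-overlap features like these cannot tell a paraphrase from a sentence that reuses the same words with a different meaning, which is one reason sequence models such as the RNN mentioned in the abstract are a natural next step.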