Enhancing Urdu Intrinsic Plagiarism Detection Through Stylometry Features and Machine Learning

Muhammad Faraz Manzoor, Muhammad Shoaib Farooq, Muhammad Haseeb, Uzma Farooq, Adnan Abid, Ahmer Saeed

2023 25th International Multitopic Conference (INMIC) (2023)

Abstract
The creation of digital content and the easy accessibility of information have led to a surge in academic and textual plagiarism. Plagiarism detection in multiple languages is essential to maintain the integrity of academic and literary works. In the context of the Urdu language, there is a growing need for effective plagiarism detection methods that are tailored to its unique linguistic characteristics. Existing Urdu plagiarism detection tools often rely on external sources or lack robustness in handling intrinsic forms of plagiarism, where the copied content is slightly modified or paraphrased. This research aims to bridge this gap by developing an intrinsic plagiarism detection system for the Urdu language, using a combination of machine learning, ensemble learning, and a Multi-Layer Perceptron (MLP). To train and evaluate our plagiarism detection models, we manually curate a corpus of 1807 Urdu documents. This corpus forms the foundation of our research, enabling us to develop and fine-tune our detection algorithms to identify instances of intrinsic plagiarism in Urdu text. To assess the stylistic fingerprints of documents, we employ a diverse set of word-based stylometry features, which improves the robustness of plagiarism detection. This research contributes to the ongoing efforts to combat plagiarism and uphold the integrity of written content, particularly in the context of the Urdu language, while also showcasing the effectiveness of different word-based stylometry features in addressing this issue.
Keywords
Machine Learning, Authorship Attribution, Plagiarism Detection, Unique Features, Multilayer Perceptron, Multiple Languages, Combination Of Machine Learning, Logistic Regression, Neural Network, Model Performance, Convolutional Neural Network, Artificial Neural Network, Valuable Insights, Machine Learning Models, Machine Learning Techniques, Patterns In Data, K-nearest Neighbor, Recurrent Neural Network, Precision And Recall, Use Of Features, Ensemble Learning Techniques, Multilayer Perceptron Neural Network, Complex Patterns In Data, Ensemble Technique, Ensemble Learning Method, Low Precision, Term Frequency, Training Set, Similar Themes, Training Data
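
The abstract refers to word-based stylometry features without listing them. As a rough illustration only, the Python sketch below computes a few word-level measures that are commonly used in stylometry (word count, average word length, type-token ratio, hapax legomena ratio). The function name, the choice of features, and the whitespace tokenization are all assumptions made for this example; they are not taken from the paper.

```python
from collections import Counter


def word_stylometry_features(text: str) -> dict[str, float]:
    """Compute a small, illustrative set of word-level stylometric features.

    The feature set here is an assumption for demonstration; the paper's
    abstract does not specify which word-based features were used.
    """
    # Whitespace tokenization is a simplifying assumption; real Urdu text
    # typically needs a dedicated tokenizer to handle word boundaries.
    words = text.split()
    if not words:
        return {
            "word_count": 0.0,
            "avg_word_length": 0.0,
            "type_token_ratio": 0.0,
            "hapax_ratio": 0.0,
        }

    counts = Counter(words)
    n = len(words)
    # Hapax legomena: word types that occur exactly once in the text.
    hapax = sum(1 for c in counts.values() if c == 1)

    return {
        "word_count": float(n),
        "avg_word_length": sum(len(w) for w in words) / n,
        "type_token_ratio": len(counts) / n,
        "hapax_ratio": hapax / n,
    }
```

Feature vectors of this kind, computed per document or per segment, are the sort of input a classifier such as logistic regression or an MLP could be trained on; how the authors actually build and combine their features is not stated in the abstract.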