LFWE: Linguistic Feature Based Word Embedding for Hindi Fake News Detection

ACM Transactions on Asian and Low-Resource Language Information Processing (2023)

Abstract
It is essential for research communities to investigate ways to authenticate news. Linguistic feature based analysis for automatically detecting false news is gaining popularity in the scientific community. However, such techniques have been developed exclusively for English, leaving low-resource languages like Hindi behind. To address this issue, we constructed a novel annotated Hindi Fake News (HinFakeNews) dataset of roughly 33,300 articles that can be used to develop automated fake news detection systems. This work provides a two-stage benchmark model for identifying fake news in Hindi using machine learning. The proposed model, LFWE (Linguistic Feature Based Word Embedding), generates word embeddings over linguistic features. This article focuses on 23 key linguistic features (15 extracted and 8 derived) for detecting Hindi fake news. These features are grouped as lexical, semantic, syntactic, psycho-linguistic, readability, and quantity features. The model works in two phases. In the first phase, the dataset is preprocessed and linguistic features are extracted. In the second phase, feature sets are generated as word embeddings, and ensemble voting classification is carried out on the feature sets. In our experiments, the LFWE model classifies Hindi news as fake or real with an accuracy of 98.49%.
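The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the four surface features below are illustrative stand-ins for the paper's 23 features, the word-embedding step is omitted, and the names `extract_features`, `majority_vote`, and `toy_classifiers`, along with the threshold rules, are hypothetical placeholders for trained models.

```python
from collections import Counter
from typing import Callable, Dict, List

Features = Dict[str, float]


def extract_features(text: str) -> Features:
    """Phase 1 (illustrative): compute a few quantity/lexical features.

    These are assumptions for demonstration, not the paper's feature set.
    """
    # Hindi sentences typically end with the danda "।"; treat ? and ! the same way.
    normalized = text.replace("?", "।").replace("!", "।")
    sentences = [s for s in normalized.split("।") if s.strip()]
    words = normalized.replace("।", " ").split()
    n_words = len(words)
    return {
        "word_count": float(n_words),
        "sentence_count": float(len(sentences)),
        # len() counts Unicode code points, so vowel signs (matras) count too.
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
    }


def majority_vote(features: Features,
                  classifiers: List[Callable[[Features], str]]) -> str:
    """Phase 2 (illustrative): hard voting, the most common label wins."""
    votes = Counter(clf(features) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical rule-based voters with arbitrary thresholds; a real system
# would use classifiers trained on the extracted feature sets.
toy_classifiers: List[Callable[[Features], str]] = [
    lambda f: "fake" if f["word_count"] < 5 else "real",
    lambda f: "fake" if f["type_token_ratio"] < 0.5 else "real",
    lambda f: "fake" if f["sentence_count"] < 2 else "real",
]

if __name__ == "__main__":
    sample = "यह खबर सच है। यह खबर झूठ है!"
    feats = extract_features(sample)
    print(feats)
    print(majority_vote(feats, toy_classifiers))
```

Hard voting is used here only because it is the simplest form of ensemble voting; the abstract does not state which voting scheme or base classifiers the LFWE model uses.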