Detecting Deception Using Natural Language Processing and Machine Learning in Datasets on COVID-19 and Climate Change

SSRN Electronic Journal (2022)

Abstract. Deception in computer-mediated communication represents a threat, and there is a growing need to develop efficient methods of detecting it. Machine learning models have, through natural language processing, proven to be extremely successful in detecting lexical patterns related to deception. In this study, four selected machine learning models are trained and tested on data collected through a crowdsourcing platform on the topics of COVID-19 and climate change. The performance of the models is tested by analyzing n-grams (from unigrams to trigrams) and by using psycho-linguistic analysis. Important features are selected, and the selection is examined further by testing the models on different subsets of the obtained features. The developed models are tested on both their own and alternative data in order to examine their applicability. The performance of models trained on combined data is also examined, to gain insight into the possibility of generalization and the models' applicability to different datasets. The study concludes that the domain of the collected data, more precisely the subjectivity of the data topic, greatly affects the performance of machine learning models in detecting hidden linguistic features of deception. Psycho-linguistic analysis, alone and in combination with n-grams, achieves better classification results than n-gram analysis, both when the models are tested on their own data and when the possibility of generalization is examined, especially for trigrams, where the combined approach achieves notably higher accuracy. N-gram analysis proved to be the more robust method when testing the mutual applicability of the models, while psycho-linguistic analysis remained the most inflexible.
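To illustrate what n-gram features "from unigrams to trigrams" look like, the following is a minimal sketch in Python. It is not the authors' pipeline: the abstract does not describe tokenization or feature weighting, so the whitespace tokenizer, the lowercasing step, and the function name `extract_ngrams` are all assumptions made for illustration.

```python
from collections import Counter


def extract_ngrams(text, n_min=1, n_max=3):
    """Count word n-grams of length n_min..n_max in a text.

    Hypothetical helper: lowercasing and whitespace tokenization are
    assumptions, not details taken from the paper.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens across the text.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Example statement (invented for illustration, not from the dataset).
features = extract_ngrams("climate change is not real")
```

Counts of this kind could then serve as input features for a classifier, optionally alongside psycho-linguistic features, which is the combination the abstract compares.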