SSL-GAN-RoBERTa: A robust semi-supervised model for detecting Anti-Asian COVID-19 hate speech on social media

NATURAL LANGUAGE ENGINEERING(2023)

引用 0|浏览4
暂无评分
摘要
Anti-Asian speech during the COVID-19 pandemic has been a serious problem with severe consequences. A hate speech wave swept social media platforms. The timely detection of Anti-Asian COVID-19-related hate speech is of utmost importance, not only to allow the application of preventive mechanisms but also to anticipate and possibly prevent other similar discriminatory situations. In this paper, we address the problem of detecting Anti-Asian COVID-19-related hate speech from social media data. Previous approaches that tackled this problem used a transformer-based model, BERT/RoBERTa, trained on the homologous annotated dataset and achieved good performance on this task. However, this requires extensive and annotated datasets with a strong connection to the topic. Both goals are difficult to meet without employing reliable, vast, and costly resources. In this paper, we propose a robust semi-supervised model, SSL-GAN-RoBERTa, that learns from a limited heterogeneous dataset and whose performance is further enhanced by using vast amounts of unlabeled data from another related domain. Compared with the RoBERTa baseline model, the experimental results show that the model has substantial performance gains in terms of Accuracy and Macro-F1 score in different scenarios that use data from different domains. Our proposed model achieves state-of-the-art performance results while efficiently using unlabeled data, showing promising applicability to other complex classification tasks where large amounts of labeled examples are difficult to obtain.
更多
查看译文
关键词
Hate speech detection,Deep learning,Semi-supervised learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要