UHated: hate speech detection in Urdu language using transfer learning

LANGUAGE RESOURCES AND EVALUATION (2023)

Abstract
Social media has become a driving force for social change in the global society. Events that take place in one part of the world can quickly reverberate across the globe due to the vast amount of data generated on these platforms. However, developers of these platforms face numerous challenges in keeping cyberspace as inclusive and healthy as possible. In recent years, there has been an increase in offensive and hate speech on social media. Manual efforts to address this issue have been inadequate due to the vast scope of the problem. Therefore, there is a need for an automated technique that can detect and remove offensive and hateful comments before they can cause harm. In this research, we use transfer learning to utilize pre-trained FastText Urdu word embeddings and multi-lingual BERT embeddings (RoBERTa) for our task. We also develop an Urdu language hate lexicon and use it to create an annotated dataset of 7800 Urdu tweets. Our results show that RoBERTa is able to achieve a macro F1-score of 0.82 on our multi-class classification task, outperforming deep learning and machine learning baseline models.
Keywords
Hate speech detection, Deep learning, Language semantics, Twitter, Social network analysis, Low-resource languages