EkoHate: Abusive Language and Hate Speech Detection for Code-switched Political Discussions on Nigerian Twitter
arXiv (2024)
Abstract
Nigerians have a notable online presence and actively discuss political and
topical matters. This was particularly evident throughout the 2023 general
election, where Twitter was used for campaigning, fact-checking and
verification, and even positive and negative discourse. However, little or no
work has been done on detecting abusive language and hate speech in Nigeria.
In this paper, we curated code-switched Twitter data directed at the three
candidates in the governorship election in Lagos State, the most populous and
economically vibrant state in Nigeria, with a view to detecting offensive speech
in political discussions. We developed EkoHate – an abusive language and hate
speech dataset for political discussions between the three candidates and their
followers, annotated with both a binary (normal vs. offensive) and a fine-grained four-label
annotation scheme. We analysed our dataset and provided an empirical evaluation
of state-of-the-art methods across both supervised and cross-lingual transfer
learning settings. In the supervised setting, our evaluation achieves 95.1 and
70.3 F1 points on the binary and four-label annotation schemes, respectively.
Furthermore, we show that our dataset transfers well to three publicly
available offensive language datasets (OLID,
HateUS2020, and FountaHate), generalizing to political discussions in other
regions like the US.