Enhancing Hate Speech Detection with Fine-Tuned Large Language Models Requires High-Quality Data

Natalia Umansky, Maël Kubli, Karsten Donnay, Fabrizio Gilardi, Dominik Hangartner, Ana Kotarcic, Laura Bronner, Selina Kurer, Philip Grech

Crossref (2024)

Abstract
Efforts to curb online hate speech depend on our ability to reliably detect it at scale. Previous studies have highlighted the strong zero-shot classification performance of large language models (LLMs), offering a potential tool to efficiently identify harmful content. Yet for complex and ambivalent tasks like hate speech detection, pre-trained LLMs can be insufficient and carry systemic biases. Domain-specific models, fine-tuned for the given task and empirical context, could help address these issues but, as we demonstrate, the quality of the data used for fine-tuning matters decisively. In this study, we fine-tuned GPT-3.5 using a unique corpus of online comments annotated by diverse groups of coders with varying annotation quality: research assistants, activists, two kinds of crowd workers, and citizen scientists. We find that only annotations from annotator groups that outperform zero-shot GPT-3.5 at recognizing hate speech improve the classification performance of the fine-tuned LLM. Specifically, fine-tuning using the two highest-quality annotator groups -- research assistants and Prolific crowd workers -- boosts classification performance by increasing the model's precision without notably sacrificing the strong recall of zero-shot GPT-3.5. In contrast, low-quality annotations do not improve, and can even decrease, the model's ability to identify hate speech.
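
The abstract does not include code, but a minimal sketch of the workflow it describes (fine-tuning GPT-3.5 on annotator-labelled comments and comparing against zero-shot classification) could look like the following, assuming the OpenAI fine-tuning API is used. The file name, system prompt wording, and label strings are illustrative placeholders, not the authors' actual setup.

# Minimal sketch (not the authors' pipeline): fine-tuning GPT-3.5 for binary
# hate-speech classification via the OpenAI API. The prompt, label strings,
# and file name below are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = "Classify the comment as 'hate speech' or 'not hate speech'."

def build_training_file(annotated_comments, path="hate_speech_train.jsonl"):
    # Write (comment, label) pairs from one annotator group to the
    # chat-format JSONL expected by the fine-tuning endpoint.
    with open(path, "w", encoding="utf-8") as f:
        for comment, label in annotated_comments:
            record = {
                "messages": [
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": comment},
                    {"role": "assistant", "content": label},
                ]
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path

def start_fine_tuning(path):
    # Upload the training file and launch a fine-tuning job on GPT-3.5.
    training_file = client.files.create(file=open(path, "rb"), purpose="fine-tune")
    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-3.5-turbo",
    )
    return job.id

def classify(comment, model="gpt-3.5-turbo"):
    # Zero-shot classification with the base model, or fine-tuned
    # classification if a fine-tuned model ID is passed instead.
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": comment},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

Precision and recall of the zero-shot and fine-tuned variants would then be computed on a held-out, expert-labelled test set, which is the comparison the abstract reports.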