Moderating New Waves of Online Hate with Chain-of-Thought Reasoning in Large Language Models
arXiv (2023)
Abstract
Online hate is an escalating problem that negatively impacts the lives of
Internet users, and is also subject to rapid changes due to evolving events,
resulting in new waves of online hate that pose a critical threat. Detecting
and mitigating these new waves presents two key challenges: determining whether
content is hateful demands complex, reasoning-based decision-making, and the
limited availability of training samples hinders updating the
detection model. To address this critical issue, we present a novel framework
called HATEGUARD for effectively moderating new waves of online hate. HATEGUARD
employs a reasoning-based approach that leverages the recently introduced
chain-of-thought (CoT) prompting technique, harnessing the capabilities of
large language models (LLMs). HATEGUARD further achieves prompt-based zero-shot
detection by automatically generating and updating detection prompts with new
derogatory terms and targets in new wave samples to effectively address new
waves of online hate. To demonstrate the effectiveness of our approach, we
compile a new dataset consisting of tweets related to three recently witnessed
new waves: the 2022 Russian invasion of Ukraine, the 2021 insurrection at the
US Capitol, and the COVID-19 pandemic. Our studies reveal crucial longitudinal
patterns in these new waves concerning the evolution of events and the pressing
need for techniques to rapidly update existing moderation tools to counteract
them. Comparative evaluations against state-of-the-art tools illustrate the
superiority of our framework, showcasing a substantial 22.22
improvement in detecting the three new waves of online hate. Our work
highlights the severe threat posed by the emergence of new waves of online hate
and represents a paradigm shift in addressing this threat practically.
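
The abstract describes prompt-based zero-shot detection in which detection prompts are regenerated as new derogatory terms and targets appear. The following is a minimal sketch of that idea only, not the authors' implementation: the function names `update_vocabulary` and `build_detection_prompt`, the prompt wording, and the placeholder terms are all assumptions made for illustration, and the call to an actual LLM is left out.

```python
from __future__ import annotations

from typing import Iterable


def update_vocabulary(known: set[str], new_items: Iterable[str]) -> set[str]:
    """Return the known set extended with normalized, non-empty new items."""
    return known | {item.strip().lower() for item in new_items if item.strip()}


def build_detection_prompt(
    post: str, terms: Iterable[str], targets: Iterable[str]
) -> str:
    """Assemble a zero-shot chain-of-thought prompt.

    The wording below is hypothetical; the paper's real prompt is not
    given in the abstract.
    """
    term_list = ", ".join(sorted(set(terms))) or "(none known yet)"
    target_list = ", ".join(sorted(set(targets))) or "(none known yet)"
    return (
        "You are a content moderator.\n"
        f"Known derogatory terms: {term_list}\n"
        f"Known targeted groups: {target_list}\n"
        f"Post: {post}\n"
        "Let's think step by step: first identify any target, then any "
        "derogatory term, then decide whether the post is hateful. "
        "End with 'Answer: yes' or 'Answer: no'."
    )


if __name__ == "__main__":
    # Placeholder vocabulary; a real system would extract these from
    # new-wave samples.
    terms = update_vocabulary(set(), ["<term-a>", " <Term-B> "])
    targets = update_vocabulary(set(), ["<group-x>"])
    print(build_detection_prompt("example post text", terms, targets))
```

Because the vocabulary lives in the prompt rather than in model weights, a new wave can be handled in this sketch by calling `update_vocabulary` and rebuilding the prompt, with no retraining step.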