GreenLLaMA: A Framework for Detoxification with Explanations
CoRR (2024)
Abstract
Prior works on detoxification are scattered in the sense that they do not
cover all aspects of detoxification needed in a real-world scenario. Notably,
prior works restrict the task of developing detoxification models to only a
seen subset of platforms, leaving the question of how the models would perform
on unseen platforms unexplored. Additionally, these works do not address
non-detoxifiability, a phenomenon whereby toxic text cannot be detoxified
without altering its meaning. We propose GreenLLaMA, the first comprehensive
end-to-end detoxification framework, which attempts to alleviate the
aforementioned limitations. We first introduce a cross-platform pseudo-parallel
corpus built by applying multi-step data processing and generation strategies
that leverage ChatGPT. We then train a suite of detoxification models on our
cross-platform corpus. We show that our detoxification models outperform the
SoTA model trained on a human-annotated parallel corpus. We further introduce
explanations to promote transparency and trustworthiness. GreenLLaMA
additionally offers a unique paraphrase detector dedicated to the
detoxification task to tackle non-detoxifiable cases. Through experimental
analysis, we demonstrate the effectiveness of our cross-platform corpus and the
robustness of GreenLLaMA against adversarial toxicity.