DeTexD: A Benchmark Dataset for Delicate Text Detection

Artem Chernodub, Serhii Yavnyi, Oleksii Sliusarenko, Jade Razzaghi, Yichen Mo, Knar Hovakimyan

conf_acl(2023)

引用 1|浏览9
暂无评分
摘要
Over the past few years, much research has been conducted to identify and regulate toxic language. However, few studies have addressed a broader range of sensitive texts that are not necessarily overtly toxic. In this paper, we introduce and define a new category of sensitive text called “delicate text.” We provide the taxonomy of delicate text and present a detailed annotation scheme. We annotate DeTexD, the first benchmark dataset for delicate text detection. The significance of the difference in the definitions is highlighted by the relative performance deltas between models trained each definitions and corpora and evaluated on the other. We make publicly available the DeTexD Benchmark dataset, annotation guidelines, and baseline model for delicate text detection.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要