Investigating the Effect of Machine-Translation on Automated Classification of Toxic Comments

James Roy, Siddhi Suresh, Mohamed Elsayed, Ronie Rocca, Ziqian Dong, Huanying Gu, N. Sertac Artan

2022 IEEE 19th International Conference on Mobile Ad Hoc and Smart Systems (MASS) (2022)

Abstract
This paper reports how machine translation affects automated toxic comment classification. We ran Google's Perspective API first on comments from non-English Wikipedia talk pages in five languages, and then on their English translations (generated with Google's Cloud Translate API). Besides providing baselines for Perspective's current performance in these five languages, this allows us to compare how machine translation alters the classification. The comments come from a Kaggle dataset, filtered to keep only monolingual comments with simple punctuation. We show that the level of disagreement between pre- and post-translation classification depends heavily on the language. More than 84% of the French, Italian, and Spanish comments received the same class before and after translation, while Portuguese and Russian performed the worst of the five languages tested, with F-scores below 0.6.
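The pre- versus post-translation comparison described in the abstract can be illustrated with a minimal sketch. It assumes toxicity scores in the range 0 to 1 have already been collected for each comment and its English translation, and it makes no API calls. The 0.5 decision threshold, the function names, and the sample numbers are illustrative assumptions; the abstract does not state the threshold or the exact metric definitions the authors used.

```python
from typing import Sequence


def to_labels(scores: Sequence[float], threshold: float = 0.5) -> list[bool]:
    """Turn toxicity scores into binary labels (True = toxic).

    The 0.5 threshold is an assumption, not a value taken from the paper.
    """
    return [s >= threshold for s in scores]


def agreement_rate(pre: Sequence[bool], post: Sequence[bool]) -> float:
    """Fraction of comments that received the same class before and after translation."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    if not pre:
        raise ValueError("at least one comment is required")
    return sum(a == b for a, b in zip(pre, post)) / len(pre)


def f1_score(truth: Sequence[bool], predicted: Sequence[bool]) -> float:
    """F1 of the toxic class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = sum(t and p for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    fn = sum(t and (not p) for t, p in zip(truth, predicted))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up scores for five comments in one language (not data from the paper).
pre_scores = [0.9, 0.2, 0.7, 0.1, 0.6]    # scored in the original language
post_scores = [0.8, 0.3, 0.4, 0.2, 0.7]   # scored after translation to English
truth = [True, False, True, False, False]  # hypothetical human labels

pre_labels = to_labels(pre_scores)
post_labels = to_labels(post_scores)

print("agreement:", agreement_rate(pre_labels, post_labels))
print("F1 before translation:", f1_score(truth, pre_labels))
print("F1 after translation:", f1_score(truth, post_labels))
```

Computing these figures separately for each language gives the kind of per-language comparison the abstract reports, with agreement measuring how often translation changes the class and the F-score measuring classification quality against reference labels.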
Keywords
Toxic Comment Detection, Machine Translation, Perspective API