Examining the Generalizability of English Cyberbullying Detection Models on Malay Informal Text Using Direct Translation.

Shu Xian Chew, Jasy Suet Yan Liew, Wan Ahmad Luqman Wan Ibrisam Fikry, Noor Farizah Ibrahim

Int. J. Asian Lang. Process. (2022)

Abstract
As cyberbullying on social media platforms becomes more rampant in Malaysia, there is a need for automatic cyberbullying detection models that can handle text in the local Malay language. Although Malay is widely used in Malaysia, it remains a low-resource language, particularly with respect to the availability of the high-quality Malay cyberbullying corpora required to train machine learning models to effectively identify cyberbullying in the local context. Our study explores the possibility of borrowing an existing cyberbullying corpus in a resource-rich language (Formspring, with 13,153 posts) to train an English cyberbullying detection model, and then evaluating the model's performance on a test set containing 3,663 Malay WhatsApp messages translated to English. By using direct translation, we reveal that model performance relies greatly on the quality and accuracy of the Malay-to-English translation, a problem that is exacerbated by the many informal Malay expressions and slang terms in WhatsApp messages.
Keywords
English cyberbullying detection models, Malay informal text