Investigating Gender Bias in Turkish Language Models
arXiv (2024)
Abstract
Language models are trained mostly on Web data, which often contains social
stereotypes and biases that the models can inherit. This has potentially
negative consequences, as models can amplify these biases in downstream tasks
or applications. However, prior research has focused primarily on the English
language, especially in the context of gender bias. In particular,
grammatically gender-neutral languages such as Turkish are underexplored,
even though they present language models with different linguistic properties
that may affect biases differently. In this paper, we fill this research gap
and investigate the significance of gender bias in Turkish language models. We
build upon existing bias evaluation frameworks and extend them to Turkish by
translating existing English tests and creating new ones designed to measure
gender bias in the context of Türkiye. We also evaluate Turkish language
models for embedded ethnic bias toward Kurdish people. Based on the
experimental results, we attribute possible biases to different model
characteristics such as model size, multilingualism, and training corpora.
We make the Turkish gender bias dataset publicly available.
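To make concrete what an embedding-based bias test measures, here is a minimal sketch of the WEAT effect size (Caliskan et al., 2017), one widely used association test. The abstract does not say which frameworks the authors extend, so treating WEAT as representative is an assumption. The Turkish words and their 2-D vectors below are hypothetical toy values, not data from the paper or from any real model.

```python
import math
import statistics
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w: Vector, A: List[Vector], B: List[Vector]) -> float:
    """s(w, A, B): mean similarity of w to attribute set A minus to set B."""
    sim_a = [cosine(w, a) for a in A]
    sim_b = [cosine(w, b) for b in B]
    return statistics.mean(sim_a) - statistics.mean(sim_b)


def weat_effect_size(X: List[Vector], Y: List[Vector],
                     A: List[Vector], B: List[Vector]) -> float:
    """Difference of mean associations of target sets X and Y,
    normalised by the sample standard deviation over X and Y combined."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (statistics.mean(s_x) - statistics.mean(s_y)) / statistics.stdev(s_x + s_y)


# Hypothetical toy embeddings (illustrative only).
male = [[1.0, 0.1], [0.9, 0.2]]      # e.g. "erkek", "adam"
female = [[0.1, 1.0], [0.2, 0.9]]    # e.g. "kadın", "kız"
career = [[0.8, 0.3], [0.7, 0.4]]    # e.g. "mühendis", "yönetici"
family = [[0.3, 0.8], [0.2, 0.7]]    # e.g. "ev", "aile"

if __name__ == "__main__":
    # Positive here: the toy career vectors sit closer to the male attribute vectors.
    print(weat_effect_size(career, family, male, female))
```

A positive effect size indicates that the first target set (here career terms) is more strongly associated with the first attribute set (male terms) than the second target set is. Whether the sample or population standard deviation is used varies between implementations; this sketch uses the sample version via `statistics.stdev`.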