Quantifying Gender Bias in Arabic Pre-Trained Language Models

IEEE Access (2024)

Abstract
The current renaissance in the development of Arabic Pre-trained Language Models (APLMs) has yielded significant advances across many fields. Nevertheless, no study has explored the dimensions of gender bias in these models. It is argued that such bias is influenced by the resources used during the models' pre-training. Thus, in this study, we conducted a comprehensive analysis to qualitatively assess the representation of different genders by tracing bias signals in the training corpus. Applying several Natural Language Processing (NLP) techniques, including Named Entity Recognition (NER), Part-of-Speech (POS) tagging, and Dependency Parsing (DP), the analysis revealed a corpus imbalanced in terms of gender nouns and exposed verb patterns associated with each gender. The second phase of this study examined the impact of these corpus-level findings on recent APLMs. Leveraging the ability of Bidirectional Encoder Representations from Transformers (BERT) to predict masked tokens as a means of quantifying gender bias, we introduce the first template-based Arabic benchmark designed to measure gender bias across various disciplines. Using this benchmark, along with the lists of gender-specific nouns and personal names extracted from the corpus, we evaluated gender skew in the context of scientific and liberal-arts disciplines across six APLMs: AraBERT, CAMeLBERT-CA, CAMeLBERT-MSA, GigaBERT, MARBERT, and ARBERT. The outcomes revealed a stronger bias skew for personal names, indicating that gender associations present in the training corpus reinforce gender bias in APLMs.
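To make the probing idea concrete, the sketch below shows how masked-token prediction can compare gendered completions of a template. It uses the Hugging Face transformers fill-mask pipeline; the model checkpoint, template sentence, and candidate names are illustrative assumptions, not the paper's released benchmark.

```python
# Minimal sketch of template-based masked-token probing for gender skew.
# The checkpoint, template, and candidate names are illustrative assumptions
# and not the benchmark released with the paper.
from transformers import pipeline

# AraBERT is one of the six APLMs evaluated; any Arabic masked LM would work.
fill_mask = pipeline("fill-mask", model="aubmindlab/bert-base-arabertv02")

# Hypothetical template: "[MASK] works in the field of engineering."
# Real Arabic templates must also control for gender agreement on the verb.
template = "[MASK] يعمل في مجال الهندسة"

# Compare model probabilities for a masculine/feminine personal-name pair
# (Mohammed vs. Fatima); targets should be single tokens in the vocabulary.
for pred in fill_mask(template, targets=["محمد", "فاطمة"]):
    print(pred["token_str"], round(pred["score"], 4))
```

A relative gap between the two scores, aggregated over many such templates and name pairs per discipline, is one simple way to turn the model's fill-in preferences into a bias score.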
Keywords
Arabic Pretrained Language Models (APLMs), BERT, gender bias, large models, quantifying bias