A Generative Adversarial Network For Data Augmentation: The Case Of Arabic Regional Dialects

Xavier A. Carrasco, Ashraf Elnagar, Mohammed Lataifeh

AI IN COMPUTATIONAL LINGUISTICS (2021)

Abstract
Text generation with Generative Adversarial Networks (GANs) has been successful in domains such as sentiment analysis, for example with the Sentimental GAN (SentiGAN) model. We adopt a similar model to generate sentences for five regional Arabic dialects (Egypt, Gulf, Maghreb, Levant, and Iraq). The objective is to overcome the scarcity of richly annotated Dialectal Arabic (DA) datasets by generating such corpora automatically. For a given dialect, the DA generation process relies on a generator to create new text and a discriminator to evaluate that text, with a dynamic update that allows the process to run automatically without supervision. Novelty and diversity are the two metrics used to verify the consistency and quality of the generated DA text before it is added to the target datasets. Experimental results confirm the reliability and value of the generated datasets when tested with different classifiers. © 2021 The Authors. Published by Elsevier B.V.
Keywords
Generative Adversarial Networks, Dialectal Arabic, Dataset Augmentation, Classification
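The abstract names novelty and diversity as quality checks but does not define them. As a rough sketch only, one formulation that appears in the text-GAN literature scores each generated sentence by one minus its highest Jaccard similarity to any training sentence (novelty) or to any other generated sentence (diversity). The function names, the token-set representation, and the choice of Jaccard similarity are assumptions made for illustration and may differ from what the authors actually used.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sequences (treated as sets)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty sentences are treated as identical
    return len(sa & sb) / len(sa | sb)


def novelty(generated, training):
    """Assumed definition: 1 - max similarity to any training sentence."""
    return [1.0 - max(jaccard(g, t) for t in training) for g in generated]


def diversity(generated):
    """Assumed definition: 1 - max similarity to any other generated sentence."""
    scores = []
    for i, g in enumerate(generated):
        others = [jaccard(g, h) for j, h in enumerate(generated) if j != i]
        # A lone sentence has nothing to be compared against.
        scores.append(1.0 - max(others) if others else 1.0)
    return scores


# Placeholder tokens, not real dialect data.
training = [["a", "b", "c"]]
generated = [["a", "b", "c"], ["x", "y"]]
novelty_scores = novelty(generated, training)
diversity_scores = diversity(generated)
```

Under these assumed definitions, a generated sentence that copies a training sentence gets a novelty of 0, and a sentence sharing no tokens with any other generated sentence gets a diversity of 1.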