GigaBERT: A Bilingual BERT for English and Arabic

arXiv (Cornell University), 2020

Abstract
Arabic is a morphologically rich language, posing many challenges for information extraction (IE) tasks, including Named Entity Recognition (NER), Part-of-Speech tagging (POS), Argument Role Labeling (ARL), and Relation Extraction (RE). A few multilingual pre-trained models have been proposed and show good performance for Arabic; however, most experimental results are reported on language understanding tasks, such as natural language inference, question answering, and sentiment analysis. Their performance on IE tasks is less well known, in particular their cross-lingual transfer capability from English to Arabic. In this work, we pre-train a Gigaword-based bilingual language model (GigaBERT) to study these two distant languages as well as zero-shot transfer learning on various IE tasks. Our GigaBERT outperforms multilingual BERT and monolingual AraBERT on these tasks, in both supervised and zero-shot learning settings. We have made our pre-trained models publicly available at https://github.com/lanwuwei/GigaBERT.
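Since the abstract points to publicly released pre-trained checkpoints, a minimal sketch of loading one with the Hugging Face transformers library may be useful. The checkpoint identifier below is an assumption inferred from the linked repository; consult https://github.com/lanwuwei/GigaBERT for the actual model names.

```python
# Minimal sketch: loading a released GigaBERT checkpoint with Hugging Face
# transformers. MODEL_NAME is an assumed Hub identifier, not confirmed by
# the abstract; see the GitHub repository for the actual checkpoint names.
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "lanwuwei/GigaBERT-v4-Arabic-and-English"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

# Encode a mixed English/Arabic sentence and obtain contextual embeddings,
# which downstream IE heads (NER, POS, ARL, RE) would be trained on top of.
inputs = tokenizer("Barack Obama visited القاهرة in 2009.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```

For a supervised IE task, one would typically replace AutoModel with a task-specific head such as AutoModelForTokenClassification and fine-tune on labeled English data, then evaluate zero-shot on Arabic.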
Keywords
bilingual GigaBERT, English