Parsing Arabic with a Semi-automatically Generated TAG: Dealing with Linguistic Phenomena.

International Conference on Computational Linguistics (2018)

Abstract
Arabic is a challenging language for grammar production and parsing. It combines complex linguistic phenomena with a rich morphology, which makes its processing particularly ambiguous. This led us to choose the Tree-Adjoining Grammar (TAG) formalism: TAG provides sufficient constraints for handling diverse linguistic phenomena and seems adequate for representing Arabic syntactic structures. In this paper, we present a semi-automatically generated TAG for Modern Standard Arabic, produced with a compiler and a metagrammatical description language called XMG (eXtensible MetaGrammar). We describe the linguistic coverage of our grammar and show how we used the properties of TAG and XMG to define different linguistic phenomena in an expressive and concise way. To check the coverage of our grammar, we set up a development environment that includes a parser and a test corpus of linguistic phenomena containing both grammatical and ungrammatical sentences.
Keywords
Arabic, linguistic phenomena, TAG, semi-automatically