Developing LMF-XML Bilingual Dictionaries for Colloquial Arabic Dialects.

LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION(2013)

引用 26|浏览18
暂无评分
摘要
The Linguistic Data Consortium and Georgetown University Press are collaborating to create updated editions of bilingual dictionaries that had originally been published in the 1960's for English-speaking learners of Moroccan, Syrian and Iraqi Arabic. In their first editions, these dictionaries used ad hoc Latin-alphabet orthography for each colloquial Arabic dialect, but adopted some properties of Arabic-based writing (collation order of Arabic headwords, clitic attachment to word forms in example phrases); despite their common features, there are notable differences among the three books that impede comparisons across the dialects, as well as comparisons of each dialect to Modern Standard Arabic. In updating these volumes, we use both Arabic script and International Phonetic Alphabet orthographies; the former provides a common basis for word recognition across dialects, while the latter provides dialect-specific pronunciations. Our goal is to preserve the full content of the original publications, supplement the Arabic headword inventory with new usages, and produce a uniform lexicon structure expressible via the Lexical Markup Framework (LMF, ISO 24613). To this end, we developed a relational database schema that applies consistently to each dialect, and HTTP-based tools for searching, editing, workflow, review and inventory management.
更多
查看译文
关键词
Arabic dialect,bilingual dictionary,lexical markup
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要