Assessing the Quality of an Italian Crowdsourced Idiom Corpus: the Dodiom Experiment.

Giuseppina Morza, Raffaele Manna, Johanna Monti

International Conference on Language Resources and Evaluation (LREC), 2022

Abstract
This paper focuses on the evaluation of linguistic data, concerning idiom examples collected and annotated through Dodiom, a Game With A Purpose (GWAP) environment, by Italian linguists. The paper provides an insight into the Dodiom project and the data collection through the contribution of the crowd, and then specifically describes the annotation criteria used by the experts to estimate the quality of the collected data. The main aim of this paper is, indeed, the evaluation of the quality of the linguistic data obtained through crowdsourcing, namely to assess whether the data provided by the players who joined the game are suitable and profitable for research and teaching purposes. This task concerns the development of a collection of idioms, a specific type of multiword expression which is usually hard to find in corpora and whose component words may also be used in their literal meanings within a sentence. This is particularly important as these data may be used both for the training and the evaluation of NLP applications. Finally, results, as well as future work, are presented.
Keywords
crowdsourcing, data quality, idioms