MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering

Transactions of the Association for Computational Linguistics (2021)

Abstract
Progress in cross-lingual modeling depends on challenging, realistic, and diverse evaluation sets. We introduce Multilingual Knowledge Questions and Answers (MKQA), an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). Answers are based on a heavily curated, language-independent data representation, making results comparable across languages and independent of language-specific passages. With 26 languages, this dataset supplies the widest range of languages to date for evaluating question answering. We benchmark a variety of state-of-the-art methods and baselines for generative and extractive question answering, trained on Natural Questions, in zero-shot and translation settings. Results indicate this dataset is challenging even in English, but especially in low-resource languages.
Keywords
linguistically diverse benchmark, multilingual open domain