MCER: A Multi-domain Dataset for Sentence-Level Chinese Ellipsis Resolution

Natural Language Processing and Chinese Computing (2022)

Abstract
Ellipsis is a cross-linguistic phenomenon that is common in Chinese. Although eliding sentence elements that can be understood from context poses no difficulty for human readers, it is a great challenge for machines in natural language understanding. To promote ellipsis-related research on Chinese, we propose an application-oriented definition of ellipsis tailored to Chinese natural language processing. We also build and release a multi-domain dataset for sentence-level Chinese ellipsis resolution that follows this definition. In addition, we define a new task, sentence-level Chinese ellipsis resolution, and model it as two subprocedures: 1) elliptic position detection; 2) ellipsis resolution. We propose several baseline methods based on pre-trained language models, as they have obtained state-of-the-art results on related tasks. To our knowledge, this is the first study that applies the extractive method for question answering to Chinese ellipsis resolution. The experimental results show that it is possible for machines to understand ellipsis under our new definition.
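To make the two subprocedures concrete, the following is a minimal sketch of what their outputs might look like once combined, not the authors' implementation or the dataset's actual schema. It assumes, purely for illustration, that elliptic position detection yields a character index and that the extractive step yields a span inside the same sentence; the names `EllipsisExample`, `position`, `span`, and `restore`, as well as the example sentence, are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class EllipsisExample:
    """Hypothetical record; the real MCER format may differ."""
    sentence: str
    position: int          # assumed output of step 1: index where content was elided
    span: Tuple[int, int]  # assumed output of step 2: [start, end) of the extracted filler


def restore(example: EllipsisExample) -> str:
    """Insert the extracted span at the detected elliptic position."""
    start, end = example.span
    filler = example.sentence[start:end]
    pos = example.position
    return example.sentence[:pos] + filler + example.sentence[pos:]


# Illustrative sentence: the subject of the second clause is elided.
example = EllipsisExample(sentence="小明买了一本书,很喜欢。", position=8, span=(0, 2))
restored = restore(example)
print(restored)
```

In this sketch the two predictions are kept separate so that each could come from a different model, which mirrors the two-step task formulation described in the abstract.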
Keywords
Definition of ellipsis, Elliptic position detection, Ellipsis resolution