GuoFeng: A Discourse-aware Evaluation Benchmark for Language Understanding, Translation and Generation

ICLR 2023 (2023)

Abstract
Modeling discourse, the linguistic phenomena that go beyond individual sentences, is a fundamental and challenging problem in natural language processing (NLP). However, existing evaluation benchmarks mainly focus on intra-sentence properties and overlook important discourse phenomena that cross sentences. To bridge this gap, we propose GuoFeng, a benchmark that evaluates inter-sentence discourse properties across a diverse set of NLP tasks covering understanding, translation, and generation. GuoFeng consists of 9 document-level test sets in the literature domain, which contain rich discourse phenomena (e.g., cohesion and coherence) in Chinese and/or English. For linguistic analysis, we also propose a diagnostic test suite that examines whether target models learn discourse knowledge. We evaluate 17 general-domain and in-domain models based on Transformer and advanced pre-training architectures, showing that fine-grained pretraining on document-level training data consistently improves the modeling of discourse information. We will release the datasets, pretrained models, and leaderboard, which we hope will significantly facilitate research in this field.
Keywords
Discourse, Evaluation Benchmark, Pre-trained Models, Natural Language Processing