Plum2Text: A French Plumitifs-Descriptions Data-to-Text Dataset for Natural Language Generation

ICAIL (2021)

Abstract
In this paper, we introduce Plum2Text, a new French data-to-text (D2T) dataset in the legal domain. It consists of pairs of plumitifs (docket files) and descriptions, derived from publicly available documents issued by Canadian criminal courts. Plum2Text is primarily intended for training statistical natural language generation algorithms, with the goal of making plumitifs easier for Canadian citizens to understand. The dataset's inputs and outputs are distinctive: on the data side, the table values contain long pieces of text, and on the text side (the reference), the description most often paraphrases the table values. We describe how we curated the plumitif-description associations, introducing an annotation tool and an annotation methodology specific to the D2T natural language generation task. In particular, we use simple yet efficient text classifiers that help the annotator leverage already-annotated examples during the annotation process. Finally, to protect privacy, we illustrate how we decontextualize the descriptions.
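The abstract does not say which text classifiers the annotation tool uses. As a rough illustration only, the following is a minimal sketch assuming a multinomial Naive Bayes classifier with add-one smoothing: each time the annotator confirms a label for a table value, the example is added, and the classifier then ranks labels for the next value. The class `NaiveBayesSuggester`, the function `tokenize`, the labels `peine` and `infraction`, and the toy French strings are all hypothetical and are not taken from Plum2Text.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphabetic tokens, including common French accented letters.
    return re.findall(r"[a-zàâçéèêëîïôûùüÿœæ]+", text.lower())


class NaiveBayesSuggester:
    """Hypothetical helper: multinomial Naive Bayes with add-one smoothing."""

    def __init__(self) -> None:
        self.label_counts = Counter()            # label -> number of examples
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.vocab = set()                        # all tokens seen so far

    def add_example(self, cell_text: str, label: str) -> None:
        # Called whenever the annotator confirms a (hypothetical) label for a table value.
        tokens = tokenize(cell_text)
        self.label_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.vocab.update(tokens)

    def rank(self, cell_text: str) -> list[tuple[str, float]]:
        # Return (label, log-posterior) pairs, most likely label first.
        tokens = tokenize(cell_text)
        total = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        scores = []
        for label, count in self.label_counts.items():
            denom = sum(self.token_counts[label].values()) + vocab_size
            score = math.log(count / total)
            for tok in tokens:
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            scores.append((label, score))
        return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy, hypothetical annotations (not real Plum2Text data).
    suggester = NaiveBayesSuggester()
    suggester.add_example("Voies de fait causant des lésions", "infraction")
    suggester.add_example("Vol de moins de 5000 $", "infraction")
    suggester.add_example("Emprisonnement de 30 jours", "peine")
    suggester.add_example("Probation de 2 ans", "peine")
    print(suggester.rank("Emprisonnement de 90 jours")[0][0])  # → peine
```

Naive Bayes is chosen here only because it is cheap to update incrementally, which suits an interactive tool whose suggestions should improve as annotations accumulate; the authors' actual classifiers may differ.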