Consistency Guided Knowledge Retrieval and Denoising in LLMs for Zero-shot Document-level Relation Triplet Extraction
WWW 2024
Abstract
Document-level Relation Triplet Extraction (DocRTE) is a fundamental task in
information systems that aims to simultaneously extract entities with semantic
relations from a document. Existing methods heavily rely on a substantial
amount of fully labeled data. However, collecting and annotating data for newly
emerging relations is time-consuming and labor-intensive. Recent advanced Large
Language Models (LLMs), such as ChatGPT and LLaMA, exhibit impressive long-text
generation capabilities, inspiring us to explore an alternative approach for
obtaining auto-labeled documents with new relations. In this paper, we propose
GenRDK, a Zero-shot Document-level Relation Triplet Extraction (ZeroDocRTE)
framework that generates labeled data by retrieving and denoising knowledge
from LLMs. Specifically, we propose a chain-of-retrieval prompt to guide
ChatGPT to generate labeled long-text data step by step. To improve the quality
of synthetic data, we propose a denoising strategy based on the consistency of
cross-document knowledge. Leveraging our denoised synthetic data, we fine-tune
LLaMA2-13B-Chat to extract document-level relation triplets. We perform
experiments on both zero-shot document-level relation extraction and zero-shot
document-level relation triplet extraction on two public datasets. The
experimental results show that our GenRDK framework outperforms strong
baselines.
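
The idea of denoising by cross-document consistency can be illustrated with a minimal sketch. This is not the GenRDK implementation: the abstract does not specify how consistency is measured, so the sketch assumes a simple frequency rule in which a triplet is kept only if it appears in at least `min_support` synthetic documents. The names `Triplet`, `consistency_counts`, `denoise`, and `min_support` are all hypothetical.

```python
from collections import Counter

# A relation triplet: (head entity, relation, tail entity).
Triplet = tuple[str, str, str]


def consistency_counts(docs: list[list[Triplet]]) -> Counter:
    """Count, for each triplet, the number of documents that contain it."""
    counts: Counter = Counter()
    for triplets in docs:
        # Use a set so a triplet repeated inside one document counts once.
        counts.update(set(triplets))
    return counts


def denoise(docs: list[list[Triplet]], min_support: int = 2) -> list[list[Triplet]]:
    """Keep only triplets supported by at least `min_support` documents.

    Assumption: plain document frequency stands in for the paper's
    consistency measure, which the abstract does not describe.
    """
    counts = consistency_counts(docs)
    return [[t for t in triplets if counts[t] >= min_support] for triplets in docs]


if __name__ == "__main__":
    synthetic_docs = [
        [("Paris", "capital of", "France"), ("Paris", "located in", "Germany")],
        [("Paris", "capital of", "France")],
    ]
    for labels in denoise(synthetic_docs):
        print(labels)
```

In this toy input, the triplet that appears in only one document is treated as noise and dropped, while the triplet confirmed by both documents is kept as a label.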