Extract, Define, Canonicalize: An LLM-based Framework for Knowledge Graph Construction
arXiv (2024)
Abstract
In this work, we are interested in automated methods for knowledge graph
creation (KGC) from input text. Progress on large language models (LLMs) has
prompted a series of recent works applying them to KGC, e.g., via zero/few-shot
prompting. Despite successes on small domain-specific datasets, these models
face difficulties scaling up to text common in many real-world applications. A
principal issue is that in prior methods, the KG schema has to be included in
the LLM prompt to generate valid triplets; larger and more complex schemas
easily exceed the LLMs' context window length. To address this problem, we
propose a three-phase framework named Extract-Define-Canonicalize (EDC): open
information extraction followed by schema definition and post-hoc
canonicalization. EDC is flexible in that it can be applied to settings where a
pre-defined target schema is available as well as to settings where it is not;
in the latter case, it constructs a schema automatically and applies
self-canonicalization. To
further improve performance, we introduce a trained component that retrieves
schema elements relevant to the input text; this improves the LLMs' extraction
performance in a retrieval-augmented generation-like manner. We demonstrate on
three KGC benchmarks that EDC is able to extract high-quality triplets without
any parameter tuning and with significantly larger schemas compared to prior
works.
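The three-phase pipeline described above can be sketched as follows. This is a minimal illustration only, assuming hypothetical function names (`extract`, `define`, `canonicalize`) and a toy canonicalization rule; in the actual EDC framework each phase is carried out by LLM prompting, and canonicalization uses schema-element retrieval rather than a dictionary lookup.

```python
# Hedged sketch of the Extract-Define-Canonicalize (EDC) pipeline.
# Function names and bodies are illustrative stand-ins, not the paper's code:
# a real system would call an LLM in each phase.

def extract(text):
    """Phase 1 (Extract): open information extraction into free-form triplets."""
    # Placeholder: a real implementation prompts an LLM for (subject, relation, object).
    return [("Alan Turing", "was born in", "London")]

def define(triplets):
    """Phase 2 (Define): write a natural-language definition for each relation,
    to support matching relations against schema elements later."""
    return {rel: f"'{rel}' links a subject entity to an object entity."
            for _, rel, _ in triplets}

def canonicalize(triplets, definitions, target_schema=None):
    """Phase 3 (Canonicalize): map free-form relations onto schema relations.

    With a pre-defined target schema the relations are mapped onto it; without
    one, the extracted relations themselves seed a schema
    (self-canonicalization, simulated here by a fallback to the raw relation).
    """
    schema = dict(target_schema) if target_schema else {}
    canonical = []
    for subj, rel, obj in triplets:
        canon_rel = schema.get(rel, rel)  # fall back to self-canonicalization
        schema.setdefault(rel, canon_rel)
        canonical.append((subj, canon_rel, obj))
    return canonical, schema

triplets = extract("Alan Turing was born in London.")
definitions = define(triplets)
canonical, schema = canonicalize(
    triplets, definitions, target_schema={"was born in": "birthPlace"}
)
print(canonical)  # [('Alan Turing', 'birthPlace', 'London')]
```

Running the same pipeline with `target_schema=None` would keep the relation `was born in` as-is while recording it as a new schema element, mirroring the schema-free setting the abstract describes.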