PAT-Questions: A Self-Updating Benchmark for Present-Anchored Temporal Question-Answering
CoRR (2024)
Abstract
Existing work on Temporal Question Answering (TQA) has predominantly focused
on questions anchored to specific timestamps or events (e.g. "Who was the US
president in 1970?"). Little work has studied questions whose temporal context
is relative to the present time (e.g. "Who was the previous US president?"). We
refer to this problem as Present-Anchored Temporal QA (PATQA). PATQA poses
unique challenges: (1) large language models (LLMs) may have outdated
knowledge, (2) complex temporal relationships (e.g. 'before', 'previous') are
hard to reason about, (3) multi-hop reasoning may be required, and (4) the gold
answers of benchmarks must be continuously updated. To address these
challenges, we introduce the PAT-Questions benchmark, which includes single-hop and
multi-hop temporal questions. The answers in PAT-Questions can be automatically
refreshed by re-running SPARQL queries on a knowledge graph, if available. We
evaluate several state-of-the-art LLMs and a SOTA temporal reasoning model
(TEMPREASON-T5) on PAT-Questions through direct prompting and
retrieval-augmented generation (RAG). The results highlight the limitations of
existing solutions in PATQA and motivate the need for new methods to improve
PATQA reasoning capabilities.
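
To illustrate why present-anchored gold answers must be refreshed, here is a minimal sketch in Python. It is not the benchmark's actual pipeline: the in-memory `TERMS` list stands in for the knowledge graph, the function `current_and_previous` stands in for a SPARQL query, and the holder names and dates are hypothetical placeholders. It also assumes office terms are contiguous and non-overlapping.

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical toy records: (holder, term_start, term_end).
# term_end=None means the term is still open. Names and dates are placeholders.
Term = Tuple[str, date, Optional[date]]

TERMS: List[Term] = [
    ("Holder A", date(2009, 1, 20), date(2017, 1, 20)),
    ("Holder B", date(2017, 1, 20), date(2021, 1, 20)),
    ("Holder C", date(2021, 1, 20), None),
]


def current_and_previous(
    terms: List[Term], today: date
) -> Tuple[Optional[str], Optional[str]]:
    """Return (current holder, previous holder) relative to `today`.

    Assumes terms are contiguous, so the most recently started term
    is the current one.
    """
    # Keep only terms that have already begun, ordered by start date.
    started = sorted((t for t in terms if t[1] <= today), key=lambda t: t[1])
    if not started:
        return None, None
    current = started[-1][0]
    previous = started[-2][0] if len(started) > 1 else None
    return current, previous


# The same question yields different gold answers on different dates.
print(current_and_previous(TERMS, date(2019, 6, 1)))
print(current_and_previous(TERMS, date(2024, 1, 1)))
```

Re-running the same lookup with a later reference date changes the answer to "who was the previous holder?", which is the behaviour a self-updating benchmark has to track.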