
KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models.

WWW '24: Proceedings of the ACM Web Conference 2024 (2024)

Abstract
Large language models (LLMs) demonstrate remarkable performance on knowledge-intensive tasks, suggesting that real-world knowledge is encoded in their model parameters. However, besides explorations on a few probing tasks in limited knowledge domains, it is not well understood how to evaluate LLMs' knowledge systematically and how well their knowledge abilities generalize across a spectrum of knowledge domains and progressively complex task formats. To this end, we propose KGQuiz, a knowledge-intensive benchmark to comprehensively investigate the knowledge generalization abilities of LLMs. KGQuiz is a scalable framework constructed from triplet-based knowledge, which covers three knowledge domains and consists of five tasks with increasing complexity: true-or-false, multiple-choice QA, blank filling, factual editing, and open-ended knowledge generation. To gain a better understanding of LLMs' knowledge abilities and their generalization, we evaluate 10 open-source and black-box LLMs on the KGQuiz benchmark across the five knowledge-intensive tasks and knowledge domains. Extensive experiments demonstrate that LLMs achieve impressive performance in straightforward knowledge QA tasks, while settings and contexts requiring more complex reasoning or employing domain-specific facts still present significant challenges. We envision KGQuiz as a testbed to analyze such nuanced variations in performance across domains and task formats, and ultimately to understand, evaluate, and improve LLMs' knowledge abilities across a wide spectrum of knowledge domains and tasks.
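To illustrate what "constructed from triplet-based knowledge" can mean in practice, the sketch below builds two of the simpler task formats (true-or-false and multiple-choice QA) from a (head, relation, tail) triplet. The templates, field names, and negative-sampling scheme here are illustrative assumptions, not the paper's actual construction pipeline.

```python
# Hypothetical sketch: turning a knowledge triplet into quiz items.
# Templates and the distractor-sampling scheme are assumptions for
# illustration, not KGQuiz's actual implementation.
import random

def true_or_false(triplet, distractor_tails, rng):
    """Emit a true-or-false item: keep the real tail or swap in a distractor."""
    head, relation, tail = triplet
    is_true = rng.random() < 0.5
    shown_tail = tail if is_true else rng.choice(distractor_tails)
    return {
        "question": f"True or false: {head} {relation} {shown_tail}.",
        "answer": is_true,
    }

def multiple_choice(triplet, distractor_tails, rng, n_options=4):
    """Emit a multiple-choice item: the true tail plus sampled distractors."""
    head, relation, tail = triplet
    options = rng.sample(distractor_tails, n_options - 1) + [tail]
    rng.shuffle(options)
    return {
        "question": f"{head} {relation} ___?",
        "options": options,
        "answer": options.index(tail),
    }

rng = random.Random(0)
triplet = ("Marie Curie", "was born in", "Warsaw")
distractors = ["Paris", "Vienna", "Prague", "Berlin"]
print(true_or_false(triplet, distractors, rng))
print(multiple_choice(triplet, distractors, rng))
```

The harder formats in the benchmark (blank filling, factual editing, open-ended generation) would build on the same triplet source but demand progressively more from the model than recognizing a corrupted tail.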