
KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models.

WWW '24: Proceedings of the ACM Web Conference 2024 (2024)

Abstract
Large language models (LLMs) demonstrate remarkable performance on knowledge-intensive tasks, suggesting that real-world knowledge is encoded in their model parameters. However, besides explorations on a few probing tasks in limited knowledge domains, it is not well understood how to evaluate LLMs' knowledge systematically and how well their knowledge abilities generalize across a spectrum of knowledge domains and progressively complex task formats. To this end, we propose KGQuiz, a knowledge-intensive benchmark to comprehensively investigate the knowledge generalization abilities of LLMs. KGQuiz is a scalable framework constructed from triplet-based knowledge, which covers three knowledge domains and consists of five tasks with increasing complexity: true-or-false, multiple-choice QA, blank filling, factual editing, and open-ended knowledge generation. To gain a better understanding of LLMs' knowledge abilities and their generalization, we evaluate 10 open-source and black-box LLMs on the KGQuiz benchmark across the five knowledge-intensive tasks and knowledge domains. Extensive experiments demonstrate that LLMs achieve impressive performance in straightforward knowledge QA tasks, while settings and contexts requiring more complex reasoning or employing domain-specific facts still present significant challenges. We envision KGQuiz as a testbed to analyze such nuanced variations in performance across domains and task formats, and ultimately to understand, evaluate, and improve LLMs' knowledge abilities across a wide spectrum of knowledge domains and tasks.
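To illustrate what "constructed from triplet-based knowledge" can mean in practice, the sketch below builds two of the simpler task formats (true-or-false and multiple-choice QA) from a (head, relation, tail) triplet. The templates, field names, and negative-sampling scheme here are illustrative assumptions, not the paper's actual construction pipeline.

```python
# Hypothetical sketch: turning a knowledge triplet into quiz items.
# Templates and the distractor-sampling scheme are assumptions for
# illustration, not KGQuiz's actual implementation.
import random

def true_or_false(triplet, distractor_tails, rng):
    """Emit a true-or-false item: keep the real tail or swap in a distractor."""
    head, relation, tail = triplet
    is_true = rng.random() < 0.5
    shown_tail = tail if is_true else rng.choice(distractor_tails)
    return {
        "question": f"True or false: {head} {relation} {shown_tail}.",
        "answer": is_true,
    }

def multiple_choice(triplet, distractor_tails, rng, n_options=4):
    """Emit a multiple-choice item: the true tail plus sampled distractors."""
    head, relation, tail = triplet
    options = rng.sample(distractor_tails, n_options - 1) + [tail]
    rng.shuffle(options)
    return {
        "question": f"{head} {relation} ___?",
        "options": options,
        "answer": options.index(tail),
    }

rng = random.Random(0)
triplet = ("Marie Curie", "was born in", "Warsaw")
distractors = ["Paris", "Vienna", "Prague", "Berlin"]
print(true_or_false(triplet, distractors, rng))
print(multiple_choice(triplet, distractors, rng))
```

The harder formats in the benchmark (blank filling, factual editing, open-ended generation) would build on the same triplet source but demand progressively more from the model than recognizing a corrupted tail.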