GPT-4 Performance on Querying Scientific Publications: Reproducibility, Accuracy, and Impact of an Instruction Sheet

Kaiming Tao, Zachary A. Osman, Philip L. Tzou, Soo-Yon Rhee, Vineet Ahluwalia, Robert W. Shafer

BMC Medical Research Methodology (2024)

Abstract
Large language models (LLMs) that can efficiently screen and identify studies meeting specific criteria would streamline literature reviews. Additionally, those capable of extracting data from publications would enhance knowledge discovery by reducing the burden on human reviewers. We created an automated pipeline utilizing OpenAI GPT-4 32K API version “2023-05-15” to evaluate the accuracy of the LLM GPT-4 responses to queries about published papers on HIV drug resistance (HIVDR) with and without an instruction sheet. The instruction sheet contained specialized knowledge designed to assist a person trying to answer questions about an HIVDR paper. We designed 60 questions pertaining to HIVDR and created markdown versions of 60 published HIVDR papers in PubMed. We presented the 60 papers to GPT-4 in four configurations: (1) all 60 questions simultaneously; (2) all 60 questions simultaneously with the instruction sheet; (3) each of the 60 questions individually; and (4) each of the 60 questions individually with the instruction sheet. GPT-4 achieved a mean accuracy of 86.9
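The four configurations described in the abstract can be pictured as two independent switches: whether the questions are sent together or one at a time, and whether the instruction sheet is included. The following is a minimal sketch under stated assumptions, not the authors' pipeline. The function names `build_prompts` and `configurations`, the prompt layout, and the placeholder strings are all hypothetical, and no API call is made.

```python
def build_prompts(paper_md, questions, instruction_sheet=None, batch=True):
    """Build the prompt strings for one paper under one configuration.

    Hypothetical layout: optional instruction sheet, then the paper in
    markdown, then the numbered question(s).
    """
    header = f"{instruction_sheet}\n\n" if instruction_sheet else ""
    indexed = list(enumerate(questions, start=1))
    # batch=True: one prompt holding every question.
    # batch=False: one prompt per question.
    groups = [indexed] if batch else [[pair] for pair in indexed]
    prompts = []
    for group in groups:
        numbered = "\n".join(f"{i}. {q}" for i, q in group)
        prompts.append(f"{header}Paper:\n{paper_md}\n\nQuestions:\n{numbered}")
    return prompts


def configurations(instruction_sheet):
    """Return (batch, sheet) pairs in the order the abstract lists them."""
    return [
        (True, None),                # (1) all questions simultaneously
        (True, instruction_sheet),   # (2) all questions + instruction sheet
        (False, None),               # (3) each question individually
        (False, instruction_sheet),  # (4) each question individually + sheet
    ]


if __name__ == "__main__":
    # Placeholder inputs, not the study's actual questions or sheet.
    questions = [f"Question {i}" for i in range(1, 61)]
    for batch, sheet in configurations("INSTRUCTION SHEET TEXT"):
        prompts = build_prompts("PAPER MARKDOWN", questions, sheet, batch)
        print(batch, sheet is not None, len(prompts))
```

Under this layout, each paper produces one prompt in the simultaneous configurations and 60 prompts in the individual ones, which is the main cost difference between the two modes.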
Key words
Large language model, HIV drug resistance, Systematic review, GPT-4, Data extraction