Progression of Large Language Models for Clinical Decision Support: An Evaluation for Rare and Frequent Diseases using GPT-3.5, GPT 4 and Naïve Google Search

Research Square (2023)

Abstract
Large Language Models (LLMs) like ChatGPT have become increasingly prevalent. Even without medical approval, people will use them to seek health advice, much as they search for diagnoses on Google. We performed a systematic analysis of GPT-3.5 and GPT-4 for suggesting diagnosis, examination steps and treatment for 110 newly processed medical case reports from different clinical disciplines. Balanced groups of rare, less frequent and frequent diseases were used as input. For the diagnosis task, a naïve Google search was performed as a benchmark comparison. Performance was assessed by two independent physicians using a 5-point Likert scale. The results showed superior performance of GPT-4 over GPT-3.5 for diagnosis and examination, and superior performance over Google for diagnosis. With the exception of treatment, better performance on frequent vs rare diseases was evident for all approaches. In conclusion, the LLMs showed growing potential for medical question answering across two successive major releases. However, several weaknesses and challenges necessitate the use of quality-controlled and regulated AI models to qualify as medical applications.
Key words
clinical decision support, large language models, decision support