A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis
CoRR (2023)
Abstract
This work conducts a systematic evaluation of GPT-4V's multimodal capability for
medical image analysis, focusing on three representative tasks: radiology
report generation, medical visual question answering, and medical visual
grounding. For each task, a set of prompts is designed to elicit the
corresponding capability of GPT-4V and induce sufficiently good outputs. Three
evaluation approaches, namely quantitative analysis, human evaluation, and case
studies, are employed to achieve an in-depth and extensive evaluation. Our
evaluation shows that GPT-4V excels at understanding medical images and can
generate high-quality radiology reports and effectively answer questions about
medical images. However, its performance on medical visual grounding needs
substantial improvement. In addition, we observe a discrepancy between the
outcomes of quantitative analysis and those of human evaluation. This
discrepancy points to the limitations of conventional metrics in assessing the
performance of large language models like GPT-4V and to the need for new
metrics for automatic quantitative analysis.
Keywords
imaging, multimodal capabilities