MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale
arXiv (2024)
Abstract
Medical Visual Question Answering (MedVQA), which offers language responses
to image-based medical inquiries, represents a challenging task and significant
advancement in healthcare. It assists medical experts in swiftly interpreting
medical images, thereby enabling faster and more accurate diagnoses. However,
the model interpretability and transparency of existing MedVQA solutions are
often limited, posing challenges in understanding their decision-making
processes. To address this issue, we devise a semi-automated annotation process
to streamline data preparation and build two new benchmark MedVQA datasets, R-RAD
and R-SLAKE. The R-RAD and R-SLAKE datasets provide intermediate medical
decision-making rationales generated by multimodal large language models and
human annotations for question-answering pairs in existing MedVQA datasets,
i.e., VQA-RAD and SLAKE. Moreover, we design a novel framework that fine-tunes
lightweight pretrained generative models by incorporating medical
decision-making rationales into the training process. The framework includes
three distinct strategies to generate decision outcomes and corresponding
rationales, thereby clearly showcasing the medical decision-making process
during reasoning. Extensive experiments demonstrate that our method can achieve
an accuracy of 83.5%, outperforming
existing state-of-the-art baselines. The dataset and code will be released.
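The abstract describes fine-tuning lightweight generative models with decision-making rationales incorporated into the supervision target, via three strategies for generating outcomes and rationales. As a minimal illustrative sketch only (the strategy names, field layout, and example data below are assumptions for exposition, not the authors' implementation), rationale-augmented training targets are commonly composed like this:

```python
# Illustrative sketch of rationale-augmented supervision targets for a
# generative MedVQA model. The strategy names are hypothetical stand-ins
# for the paper's "three distinct strategies".

def build_target(rationale: str, answer: str,
                 strategy: str = "rationale_first") -> str:
    """Compose the text target a seq2seq model is trained to generate.

    rationale_first - model explains its reasoning, then answers
    answer_first    - model answers, then justifies the answer
    answer_only     - plain VQA baseline with no rationale
    """
    if strategy == "rationale_first":
        return f"Rationale: {rationale} Answer: {answer}"
    if strategy == "answer_first":
        return f"Answer: {answer} Rationale: {rationale}"
    if strategy == "answer_only":
        return f"Answer: {answer}"
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical annotated example in the style of a VQA-RAD question.
example = {
    "question": "Is the lesion in the left or right lung?",
    "rationale": "The opacity appears in the right mid-lung zone "
                 "on the frontal view.",
    "answer": "Right",
}

print(build_target(example["rationale"], example["answer"]))
```

At inference time, generating the rationale alongside the answer is what exposes the decision-making process to the reader; the answer-only variant serves as the conventional opaque baseline.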