MINT: A wrapper to make multi-modal and multi-image AI models interactive
CoRR(2024)
Abstract
During the diagnostic process, doctors incorporate multimodal information,
including imaging and the medical history, and medical AI development has
similarly become increasingly multimodal. In this paper we tackle a more
subtle challenge: doctors take a targeted medical history to obtain only the
most pertinent pieces of information; how do we enable AI to do the same? We
develop a wrapper method named MINT (Make your model INTeractive) that
automatically determines what pieces of information are most valuable at each
step, and asks for only the most useful information. We demonstrate the efficacy
of MINT wrapping a skin disease prediction model, where multiple images and a
set of optional answers to 25 standard metadata questions (i.e., structured
medical history) are used by a multi-modal deep network to provide a
differential diagnosis. We show that MINT can identify whether metadata inputs
are needed and if so, which question to ask next. We also demonstrate that when
collecting multiple images, MINT can identify if an additional image would be
beneficial, and if so, which type of image to capture. We show that MINT
reduces the number of metadata and image inputs needed by 82
respectively, while maintaining predictive performance. Using real-world AI
dermatology system data, we show that needing fewer inputs can retain users
that may otherwise fail to complete the system submission and drop off without
a diagnosis. Qualitative examples show MINT can closely mimic the step-by-step
decision-making process of a clinical workflow and how this differs for
straightforward cases versus more difficult, ambiguous cases. Finally, we
demonstrate how MINT is robust to different underlying multi-modal classifiers
and can be easily adapted to user requirements without significant model
re-training.
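The abstract does not specify how MINT scores candidate inputs, so the following is only an illustrative sketch of the general idea of asking for the single most useful input at each step and stopping when nothing more is worth asking. It assumes, hypothetically, that usefulness is measured as the expected reduction in entropy of the wrapped model's prediction and that all answers to a question are equally likely. The names `next_question`, `toy_predict`, `OPTIONS`, and `min_gain` are invented for illustration and are not taken from the paper.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Answers = Dict[str, str]
Predictor = Callable[[Answers], List[float]]  # returns class probabilities


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def next_question(
    predict: Predictor,
    known: Answers,
    options: Dict[str, List[str]],
    min_gain: float = 0.05,
) -> Optional[str]:
    """Return the unanswered question with the largest expected entropy
    reduction, or None if no question exceeds min_gain (stop asking)."""
    base = entropy(predict(known))
    best: Tuple[float, Optional[str]] = (min_gain, None)
    for question, choices in options.items():
        if question in known:
            continue
        # ASSUMPTION: every answer is equally likely (uniform prior).
        expected = sum(
            entropy(predict({**known, question: choice})) for choice in choices
        ) / len(choices)
        gain = base - expected
        if gain > best[0]:
            best = (gain, question)
    return best[1]


def toy_predict(answers: Answers) -> List[float]:
    """Hypothetical two-class model: only the 'itch' answer is informative."""
    if answers.get("itch") == "yes":
        return [0.9, 0.1]
    if answers.get("itch") == "no":
        return [0.1, 0.9]
    return [0.5, 0.5]


# Hypothetical metadata questions and their allowed answers.
OPTIONS = {"itch": ["yes", "no"], "age_group": ["child", "adult"]}
```

With this toy model, the wrapper would first ask about `itch`, since that answer sharpens the prediction, and would then stop, since `age_group` does not change the output. The wrapped model is only called through `predict`, so in this sketch swapping the underlying classifier requires no change to the selection loop.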