MultiViz: Towards User-Centric Visualizations and Interpretations of Multimodal Models

CHI Extended Abstracts (2023)

Abstract
The nature of human and computer interaction is inherently multimodal, which has led to substantial interest in building interpretable, interactive, and reliable multimodal interfaces. However, modern multimodal models and interfaces are typically black-box neural networks, which makes it challenging to understand their internal mechanics. How can we visualize their internal workings in order to empower stakeholders to understand model behavior, perform model debugging, and promote trust in these models? Our paper proposes MultiViz, a method for analyzing the behavior of multimodal models via 4 stages: (1) unimodal importance, (2) cross-modal interactions, (3) multimodal representations, and (4) multimodal prediction. MultiViz includes modular visualization tools for each stage and combines the outputs from all stages through an interactive, human-in-the-loop API. Through user studies with 21 participants on 8 trained models across 6 real-world tasks, we show that the complementary stages in MultiViz together enable users to (1) simulate model predictions, (2) assign interpretable concepts to features, (3) perform error analysis on model misclassifications, and (4) use insights from error analysis to debug models. MultiViz is publicly available at https://github.com/pliang279/MultiViz, will be regularly updated with new visualization tools and metrics, and welcomes input from the community.
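To make stage (1), unimodal importance, concrete, here is a minimal sketch of one common way such importance scores can be computed: gradient-times-input saliency on a toy two-modality model. This is an illustration only, not the MultiViz API; the model, function names, and the choice of saliency method are all assumptions for the example.

```python
# Hypothetical sketch (NOT the MultiViz API): gradient-based unimodal
# importance for a toy two-modality late-fusion classifier.
import torch
import torch.nn as nn


class ToyFusionModel(nn.Module):
    """Late-fusion classifier over a 'vision' and a 'text' feature vector."""

    def __init__(self, dim=16, n_classes=3):
        super().__init__()
        self.vision_enc = nn.Linear(dim, 8)
        self.text_enc = nn.Linear(dim, 8)
        self.head = nn.Linear(16, n_classes)

    def forward(self, vision, text):
        fused = torch.cat([self.vision_enc(vision), self.text_enc(text)], dim=-1)
        return self.head(fused)


def unimodal_importance(model, vision, text, target_class):
    """Per-feature |gradient x input| saliency w.r.t. the target-class logit."""
    vision = vision.clone().requires_grad_(True)
    text = text.clone().requires_grad_(True)
    logit = model(vision, text)[0, target_class]
    logit.backward()
    # Larger values indicate features that contribute more to the prediction.
    return (vision.grad * vision).abs(), (text.grad * text).abs()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyFusionModel()
    v, t = torch.randn(1, 16), torch.randn(1, 16)
    pred = model(v, t).argmax(dim=-1).item()
    imp_v, imp_t = unimodal_importance(model, v, t, pred)
    print("vision importance:", imp_v.squeeze())
    print("text importance:  ", imp_t.squeeze())
```

Comparing the two importance vectors indicates which modality drives a given prediction; the later stages described in the abstract (cross-modal interactions, representations, and prediction) build on this kind of per-feature evidence.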