SVGraph: Learning Semantic Graphs from Instructional Videos

2022 IEEE Eighth International Conference on Multimedia Big Data (BigMM), 2022

Abstract
In this work, we focus on generating graphical representations of instructional videos. We propose a self-supervised, interpretable approach that does not require any annotations of graphical representations, which would be expensive and time-consuming to collect. We attempt to overcome the limitations of ‘black box’ learning by presenting Semantic Video Graph (SVGraph), a multimodal approach that uses narrations to make the learned graphs semantically interpretable. SVGraph 1) relies on agreement between multiple modalities to learn a unified graphical structure with the help of cross-modal attention, and 2) assigns semantic interpretations with the help of Semantic-Assignment, which captures semantics from the narration. We perform experiments on multiple datasets and demonstrate the interpretability of SVGraph in semantic graph learning.
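The abstract names cross-modal attention as the mechanism that lets the video and narration modalities agree on a shared structure. As a hedged illustration only, the sketch below shows generic scaled dot-product cross-attention in pure Python, with video clip features as queries and narration token features as keys and values. The function names, feature shapes, and the choice of scaled dot-product attention are assumptions for illustration, not SVGraph's published implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries: Matrix, keys: Matrix) -> Matrix:
    """Hypothetical helper: one softmax-normalised row per query (e.g. video clip),
    one column per key (e.g. narration token)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
        for q in queries
    ]


def cross_modal_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Hypothetical helper: each query becomes a weighted average of the other
    modality's value vectors, using the attention weights above."""
    dim = len(values[0])
    return [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(dim)]
        for row in attention_weights(queries, keys)
    ]
```

Because each row of the weight matrix sums to one, it can be read as a soft alignment between a video clip and the narration tokens. This is the kind of cross-modal link that an interpretable, narration-grounded method could inspect.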
Keywords
multimodal learning, deep learning, interpretability, graph learning, video understanding