SaGCN: Semantic-Aware Graph Calibration Network for Temporal Sentence Grounding

IEEE Transactions on Circuits and Systems for Video Technology (2023)

Abstract
Temporal sentence grounding is a challenging task that aims to localize the segment of an untrimmed video that semantically corresponds to a given natural-language query. Existing methods either use a cross-modal matching architecture that follows a scan-and-rank pipeline, or directly predict, for each frame, the probability of being a target boundary based on the entire video content. However, because such methods ignore the semantic concepts in the query as well as the local and global cross-modal context, they are weak when some of the critical semantic concepts in the query are relevant to multiple video segments, or when the desired video segment contains a query-irrelevant scene. In this paper, we propose a novel semantic-aware graph calibration network (SaGCN) to address these issues. Specifically, we first introduce a semantic-aware local relational graph module that captures the inherent relationships among local contextual information relevant to specific semantic concepts, enabling fine-grained cross-modal interaction. Then, a semantic-aware global relational graph module integrates global contextual information and achieves cross-modal alignment. Finally, an attention-based calibration module eliminates the irrelevant information remaining in the visual modality under the guidance of the query description. Extensive experiments on two widely used datasets (Charades-STA and TACoS) verify the effectiveness of the proposed SaGCN, which achieves significant and consistent improvements over state-of-the-art approaches.
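The abstract does not give the formulation of the attention-based calibration module, so the following is only a minimal sketch of the general idea of query-guided calibration, not the SaGCN implementation. It assumes each frame is scored against a single sentence-level query vector with a scaled dot product, and that a sigmoid gate then scales the frame's features. The function name `calibrate`, the gating form, and the toy vectors are all hypothetical.

```python
import math


def calibrate(frames, query):
    """Down-weight frame features that are weakly related to the query.

    Hypothetical illustration only; not the module described in the paper.
    frames: list of T feature vectors (lists of floats), each of length d
    query:  one sentence-level feature vector of length d
    Returns (gates, calibrated_frames).
    """
    d = len(query)
    gates = []
    calibrated = []
    for frame in frames:
        # Scaled dot product between the frame and the query.
        score = sum(f * q for f, q in zip(frame, query)) / math.sqrt(d)
        # A sigmoid maps the score to a gate in (0, 1).
        gate = 1.0 / (1.0 + math.exp(-score))
        gates.append(gate)
        # Scale every feature of the frame by its gate.
        calibrated.append([gate * f for f in frame])
    return gates, calibrated


# Toy example: three frames that align with, are orthogonal to,
# and oppose the query direction.
frames = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
query = [2.0, 0.0]
gates, calibrated = calibrate(frames, query)
```

In this toy example the frame aligned with the query receives the largest gate and the opposing frame the smallest, so query-irrelevant visual content is suppressed rather than removed outright.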
Keywords
Semantics, Grounding, Task analysis, Database languages, Calibration, TV, Visualization, Temporal sentence grounding, semantic-aware calibration, relational graph, cross modal