
SPTNET: Span-based Prompt Tuning for Video Grounding

ICME (2023)

Abstract
When a Pre-trained Language Model (PLM) is adopted for the video grounding task, it usually acts only as a text encoder, leaving its knowledge under-utilized. There is also an inconsistency between the pre-training objective and the downstream objective. To address these issues, we propose a new paradigm, named Span-based Prompt Tuning (SPTNet), which converts video grounding into a cloze task. Specifically, a query is first rewritten by a template into a cloze form containing a mask token; the video and query embeddings are then integrated through a cross-modal transformer. The start and end points of the time span matching the query are predicted from the embedding of the mask token. Experimental results on two public benchmarks, ActivityNet Captions and Charades-STA, show that SPTNet outperforms state-of-the-art methods.
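To make the cloze formulation concrete, below is a minimal PyTorch-style sketch of the idea described in the abstract: a templated query containing a mask token is encoded by a PLM, fused with video clip features through a cross-modal transformer, and the mask-token embedding is used to score clip positions as span start and end. The class and parameter names (SpanPromptGrounder, hidden_dim, the 1024-dimensional clip features, the HuggingFace-style text_encoder interface) are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SpanPromptGrounder(nn.Module):
    """Cloze-style span predictor: a hypothetical sketch, not the authors' code."""

    def __init__(self, text_encoder, hidden_dim=768, video_feat_dim=1024,
                 num_fusion_layers=2, num_heads=8):
        super().__init__()
        self.text_encoder = text_encoder  # a pre-trained language model, e.g. BERT
        fusion_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=num_heads, batch_first=True)
        self.cross_modal = nn.TransformerEncoder(fusion_layer, num_fusion_layers)
        self.video_proj = nn.Linear(video_feat_dim, hidden_dim)
        # two heads turn the mask-token embedding into start / end "queries"
        self.start_head = nn.Linear(hidden_dim, hidden_dim)
        self.end_head = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, prompt_ids, attention_mask, mask_position, video_feats):
        # prompt_ids: query wrapped in a cloze template containing a [MASK] token
        # mask_position: index of the [MASK] token in each prompt, shape (B,)
        # video_feats: pre-extracted clip features, shape (B, T, video_feat_dim)
        text_emb = self.text_encoder(
            input_ids=prompt_ids, attention_mask=attention_mask).last_hidden_state
        video_emb = self.video_proj(video_feats)                      # (B, T, H)
        fused = self.cross_modal(torch.cat([text_emb, video_emb], dim=1))
        mask_emb = fused[torch.arange(fused.size(0)), mask_position]  # (B, H)
        clip_emb = fused[:, text_emb.size(1):]                        # video positions only
        # dot products between the (projected) mask embedding and every clip position
        # give distributions over possible span start / end points
        start_logits = torch.einsum('bh,bth->bt', self.start_head(mask_emb), clip_emb)
        end_logits = torch.einsum('bh,bth->bt', self.end_head(mask_emb), clip_emb)
        return start_logits, end_logits
```

Under this sketch, training would minimize cross-entropy over the ground-truth start and end clip indices, analogous to span prediction in reading comprehension.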
Keywords
Video grounding, prompt learning, clustering, contrastive learning