Learning Spatio-temporal Representation by Channel Aliasing Video Perception

International Multimedia Conference (2021)

Abstract
In this paper, we propose a novel pretext task, namely Channel Aliasing Video Perception (CAVP), for self-supervised video representation learning. The main idea of our approach is to generate channel aliasing videos, which carry different motion cues simultaneously, by assembling distinct channels from different videos. With the generated channel aliasing videos, we propose to recognize the number of different motion flows within a channel aliasing video so as to perceive discriminative motion cues. As a plug-and-play method, the proposed pretext task can be integrated into a co-training framework with other self-supervised learning methods to further improve performance. Experimental results on publicly available action recognition benchmarks verify the effectiveness of our method for spatio-temporal representation learning.
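To make the pretext task concrete, the sketch below shows one plausible way to assemble a channel-aliased clip and its training label: each colour channel of the composite clip is drawn from a (possibly different) source video, and the label is the number of distinct source videos mixed into the clip. The tensor layout, the sampling of the number of sources, and the helper name make_channel_aliasing_clip are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of channel aliasing as described in the abstract.
import torch


def make_channel_aliasing_clip(clips: torch.Tensor) -> tuple[torch.Tensor, int]:
    """Assemble one channel-aliased clip from a pool of source clips.

    clips: tensor of shape (N, 3, T, H, W) holding N candidate videos.
    Returns the composite clip (3, T, H, W) and the pretext label, i.e.
    the number of distinct source videos it contains (1, 2, or 3).
    """
    n = clips.shape[0]
    # Choose how many distinct videos to mix, then assign one source per channel.
    num_sources = torch.randint(1, min(3, n) + 1, (1,)).item()
    chosen = torch.randperm(n)[:num_sources]
    per_channel_src = chosen[torch.randint(0, num_sources, (3,))]
    # Guarantee that every chosen video contributes at least one channel.
    per_channel_src[:num_sources] = chosen
    # Channel c of the composite comes from channel c of its assigned source.
    composite = torch.stack(
        [clips[per_channel_src[c], c] for c in range(3)], dim=0
    )
    label = len(per_channel_src.unique())
    return composite, label
```

Under these assumptions, the pretext objective reduces to a three-way classification over label values 1 to 3, which a standard video backbone plus a linear head can be trained to predict alongside other self-supervised losses in a co-training setup.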