Efficient Unsupervised Video Hashing with Contextual Modeling and Structural Controlling

IEEE Transactions on Multimedia (2024)

Abstract
The main benefit of video hashing is fast retrieval, which stems from the efficiency of operations on binary codes. Current video hashing approaches therefore focus mainly on learning compact binary codes that represent video content accurately. However, they may overlook how efficiently those codes are generated, i.e., the need for lightweight neural networks. This paper proposes an Efficient Unsupervised Video Hashing (EUVH) method that both computes compact hash codes and uses a lightweight deep model. Specifically, we present an MLP-based model in which the video tensor is split into several groups, and multiple axial contexts are explored to refine these groups separately and in parallel. The axial contexts are the dynamics aggregated at different axial scales, covering long-, middle-, and short-range dependencies. The group operation significantly reduces the computational cost of the MLP backbone. Moreover, three structural losses are used to obtain compact video hash codes. Experiments show that the three structures these losses capture are highly complementary in approximating the real data structure. We conduct extensive experiments on three benchmark datasets for unsupervised video hashing and show that EUVH achieves a better trade-off between performance and computational cost than state-of-the-art methods.
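The abstract attributes the efficiency of hashing-based retrieval to binary computation, and the efficiency of the EUVH backbone to a group operation. The sketch below illustrates both generic ideas under simplifying assumptions; it is not the paper's implementation. Hash codes are stored as Python integers and ranked by Hamming distance (XOR, then count the set bits), and the "grouped layer" is a plain per-group linear map used only to count weights, not the paper's axial-context MLP. The function names, the 8-bit toy codes, and the 768-dimensional layer size are all hypothetical.

```python
# Illustrative sketch only (hypothetical names and sizes, not the EUVH code).

def hamming_distance(a: int, b: int) -> int:
    """Count the differing bits between two binary hash codes stored as ints."""
    return bin(a ^ b).count("1")


def retrieve(query: int, database: list, k: int) -> list:
    """Return indices of the k codes closest to `query` in Hamming distance.

    Ties are broken by database index so the ranking is deterministic.
    """
    order = sorted(
        range(len(database)),
        key=lambda i: (hamming_distance(query, database[i]), i),
    )
    return order[:k]


def dense_params(d_in: int, d_out: int) -> int:
    """Weight count of a fully connected layer (bias omitted)."""
    return d_in * d_out


def grouped_params(d_in: int, d_out: int, groups: int) -> int:
    """Weight count when features are split into `groups` equal groups and
    each group is mapped by its own, smaller fully connected layer."""
    if d_in % groups or d_out % groups:
        raise ValueError("feature sizes must be divisible by the number of groups")
    return groups * (d_in // groups) * (d_out // groups)


if __name__ == "__main__":
    # Hypothetical 8-bit codes for a tiny video database.
    codes = [0b10110010, 0b10110011, 0b01001101, 0b10100010]
    query = 0b10110000
    print(retrieve(query, codes, k=2))  # → [0, 1]

    # Hypothetical 768-dimensional layer split into 4 groups.
    print(dense_params(768, 768), grouped_params(768, 768, 4))  # → 589824 147456
```

With `groups` equal groups, the grouped layer needs 1/`groups` of the dense layer's weights, which is the kind of saving a group operation targets; the actual reduction in EUVH depends on architectural details the abstract does not give.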
Keywords
Video hashing, Deep Neural Network, Data Structure, Large-scale retrieval