Semantic representation and attention alignment for Graph Information Bottleneck in video summarization.

IEEE Transactions on Image Processing (2023)

Abstract
End-to-end Long Short-Term Memory (LSTM) has been successfully applied to video summarization. However, a weakness of the LSTM model, poor generalization caused by inefficient representation learning for input nodes, limits its ability to carry out node classification efficiently on user-created videos. Given the power of Graph Neural Networks (GNNs) in representation learning, we adopt the Graph Information Bottleneck (GIB) to develop a Contextual Feature Transformation (CFT) mechanism that refines the temporal dual-feature, yielding a semantic representation with attention alignment. Furthermore, a novel Salient-Area-Size-based spatial attention model is presented to extract frame-wise visual features, based on the observation that humans tend to focus on sizable and moving objects. Lastly, the semantic representation is embedded through attention alignment within the end-to-end LSTM framework to differentiate otherwise indistinguishable images. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art (SOTA) methods.
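The abstract describes a pipeline in which frame-wise visual features are refined (here via the GIB-based CFT) and then fused with the raw frame features under attention alignment before a Bi-LSTM predicts per-frame importance. The snippet below is a minimal sketch of such a fusion-and-scoring stage in PyTorch; the module name, feature dimensions, and gating scheme are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (PyTorch), assuming pre-extracted frame features (e.g. 1024-d CNN
# features per frame) and a separately produced refined feature (e.g. from a
# GIB/CFT-style module). Names and dimensions are hypothetical.
import torch
import torch.nn as nn

class BiLSTMSummarizer(nn.Module):
    def __init__(self, feat_dim=1024, hidden=256):
        super().__init__()
        # Gate standing in for attention alignment: weighs the refined feature
        # against the raw frame feature before temporal modeling.
        self.align = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.Sigmoid())
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.score = nn.Sequential(nn.Linear(2 * hidden, 1), nn.Sigmoid())

    def forward(self, raw_feats, refined_feats):
        # raw_feats, refined_feats: (batch, n_frames, feat_dim)
        gate = self.align(torch.cat([raw_feats, refined_feats], dim=-1))
        fused = gate * refined_feats + (1.0 - gate) * raw_feats
        h, _ = self.bilstm(fused)            # (batch, n_frames, 2 * hidden)
        return self.score(h).squeeze(-1)     # per-frame importance in [0, 1]

# Usage: importance scores for a 120-frame video with 1024-d features.
model = BiLSTMSummarizer()
raw = torch.randn(1, 120, 1024)
refined = torch.randn(1, 120, 1024)
scores = model(raw, refined)                 # shape (1, 120)
```

In a summarization setting, the resulting scores would typically be thresholded or used for shot-level knapsack selection to form the final summary.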
Keywords
Graph information bottleneck, contextual feature transformation (CFT), spatial attention model, video summarization, Bi-LSTM