Graph Attention Networks Adjusted Bi-LSTM for Video Summarization

IEEE SIGNAL PROCESSING LETTERS(2021)

Cited by 22 | Views 31
Abstract
High redundancy among keyframes is a critical issue for prior summarization methods when dealing with user-created videos. To address this issue, we present a Graph Attention Networks (GAT) adjusted Bi-directional Long Short-Term Memory (Bi-LSTM) model for unsupervised video summarization. First, the GAT transforms each frame's visual features into higher-level features via a Contextual-Features-based Transformation (CFT) mechanism. Specifically, a novel Salient-Area-Size-based spatial attention model extracts frame-wise visual features, based on the observation that humans tend to focus on sizable, moving objects. Second, the higher-level visual features are integrated with semantic features processed by the Bi-LSTM to refine each frame's probability of being selected as a keyframe. Extensive experiments demonstrate that our method outperforms state-of-the-art methods.
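As a rough illustration only (not the authors' implementation), the pipeline described in the abstract — graph attention refining frame features over a fully connected frame graph, a bidirectional recurrence over the refined features, and a sigmoid head producing per-frame keyframe probabilities — can be sketched in NumPy. The Bi-LSTM is replaced here by a toy bidirectional RNN, and all shapes and random weights are made-up placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def graph_attention(X, W, a, slope=0.2):
    """Single-head graph attention over a fully connected frame graph.
    X: (n, d_in) frame features; W: (d_in, d_out); a: (2*d_out,)."""
    H = X @ W
    n = H.shape[0]
    # pairwise scores e_ij = LeakyReLU(a . [h_i || h_j])
    e = np.array([[np.concatenate([H[i], H[j]]) @ a for j in range(n)]
                  for i in range(n)])
    e = np.where(e > 0, e, slope * e)
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)       # row-wise softmax
    return np.tanh(alpha @ H)                       # context-refined features

def bidirectional_rnn(H, Wf, Wb):
    """Toy bidirectional recurrence standing in for the Bi-LSTM."""
    n, d = H.shape
    fwd, bwd = np.zeros((n, d)), np.zeros((n, d))
    h = np.zeros(d)
    for t in range(n):                              # forward pass
        h = np.tanh(H[t] + h @ Wf)
        fwd[t] = h
    h = np.zeros(d)
    for t in reversed(range(n)):                    # backward pass
        h = np.tanh(H[t] + h @ Wb)
        bwd[t] = h
    return np.concatenate([fwd, bwd], axis=1)

def keyframe_scores(X, params):
    """GAT-refined features -> bidirectional recurrence -> sigmoid score."""
    W, a, Wf, Wb, w_out = params
    H = graph_attention(X, W, a)
    S = bidirectional_rnn(H, Wf, Wb)
    return 1.0 / (1.0 + np.exp(-(S @ w_out)))       # per-frame probability

n, d_in, d_out = 6, 8, 4                            # hypothetical sizes
params = (rng.standard_normal((d_in, d_out)),
          rng.standard_normal(2 * d_out),
          rng.standard_normal((d_out, d_out)),
          rng.standard_normal((d_out, d_out)),
          rng.standard_normal(2 * d_out))
probs = keyframe_scores(rng.standard_normal((n, d_in)), params)
print(probs.shape)  # one keyframe probability per frame
```

The attention step lets each frame's representation absorb context from every other frame before the recurrent model scores it, which is the intuition behind using contextual features to suppress redundant keyframes.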
Keywords
Visualization, Feature extraction, Semantics, Transforms, Redundancy, Mathematical model, Histograms, Graph attention networks, Bi-LSTM, video summarization, unsupervised learning