Human attention based movie summarization: Dataset and baseline model.

2022 IEEE International Conference on Multimedia and Expo (ICME) (2023)

Abstract
A movie summarization model automatically edits a movie into a condensed version by selecting keyframes. Previous work has proposed movie summarizers based on traditional methods or, more recently, neural networks, and has made some progress. Despite these successes, three limitations remain: (1) previous work relies mainly on hand-crafted heuristics, and most of it is unsupervised; (2) no suitable public dataset currently exists for supervised movie summarization; (3) existing work focuses only on the movies themselves and neglects the audience, who have the most say in which parts of a movie are most attractive. To address these limitations, we establish a movie summarization dataset, Movie50, and propose a novel annotation pipeline based on human attention. Furthermore, we propose A/V-MSNet, an audiovisual neural network that uses spatio-temporal visual and auditory information to better simulate human attention and to exploit richer information. The network is designed and trained end-to-end, and is evaluated on a public dataset and on our own dataset. Extensive experiments demonstrate the superiority of the proposed method.
Keywords
Movie summarization, Human attention, Multi-modal, Audiovisual, Keyframes