Robust spatial-temporal deep model for multimedia event detection.

Neurocomputing (2016)

Cited 9 | Viewed 68
Abstract
The task of multimedia event detection (MED) is to train a set of models that can automatically detect the most event-relevant videos in large datasets. In this paper, we attempt to build a robust spatial-temporal deep neural network for large-scale video event detection. In our setting, each video follows a multiple-instance assumption, where its visual segments carry both the spatial and temporal properties of events. To exploit these properties, we implement the MED system with a two-step training procedure: unsupervised recurrent video reconstruction followed by supervised fine-tuning. We conduct extensive experiments on the challenging TRECVID MED14 dataset, which indicate that, by considering both spatial and temporal information, detection performance can be boosted beyond state-of-the-art MED models.
Keywords
Multimedia event detection, Deep learning, GRU
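The two-step procedure named in the abstract, unsupervised recurrent reconstruction of segment features followed by supervised fine-tuning, can be sketched with a minimal NumPy GRU forward pass. This is an illustrative assumption-based sketch, not the authors' implementation: the feature dimension `D`, hidden size `H`, the linear decoder `W_dec`, and the logistic event head `w_cls` are all hypothetical choices made here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

D, H = 16, 8  # hypothetical segment-feature and hidden dimensions

# Randomly initialised GRU parameters (update gate z, reset gate r, candidate h~).
Wz, Uz, bz = rng.normal(0, 0.1, (H, D)), rng.normal(0, 0.1, (H, H)), np.zeros(H)
Wr, Ur, br = rng.normal(0, 0.1, (H, D)), rng.normal(0, 0.1, (H, H)), np.zeros(H)
Wh, Uh, bh = rng.normal(0, 0.1, (H, D)), rng.normal(0, 0.1, (H, H)), np.zeros(H)

def gru_step(x, h):
    """One standard GRU recurrence over a segment feature x."""
    z = sigmoid(Wz @ x + Uz @ h + bz)              # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)  # candidate state
    return (1 - z) * h + z * h_tilde

# A video as a bag of segments (multiple-instance assumption): 5 segments, D-dim each.
segments = rng.normal(size=(5, D))

# Step 1 (unsupervised): encode the segment sequence with the GRU and
# reconstruct each segment through a linear decoder; the reconstruction
# MSE would be the pretraining objective.
W_dec = rng.normal(0, 0.1, (D, H))  # hypothetical decoder
h = np.zeros(H)
states = []
for x in segments:
    h = gru_step(x, h)
    states.append(h)
recon = np.stack([W_dec @ s for s in states])
recon_loss = np.mean((recon - segments) ** 2)

# Step 2 (supervised fine-tuning): an event-relevance score from the final
# hidden state, trained with labels in the real system.
w_cls = rng.normal(0, 0.1, H)  # hypothetical classifier head
score = sigmoid(w_cls @ h)     # probability-like event score in (0, 1)
```

In a full system both steps would be trained by backpropagation (reconstruction loss first, then the supervised loss), and the per-segment scores would be pooled under the multiple-instance assumption; the sketch only shows the forward structure.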