Exploring Inter-Feature And Inter-Class Relationships With Deep Neural Networks For Video Classification

MM 2014

Abstract
Videos contain very rich semantics and are intrinsically multimodal. In this paper, we study the challenging task of classifying videos according to their high-level semantics, such as human actions or complex events. Although extensive efforts have been devoted to this problem, most existing works combine multiple features using simple fusion strategies and neglect inter-class semantic relationships. We propose a novel unified framework that jointly learns feature relationships and exploits class relationships for improved video classification performance. Specifically, these two types of relationships are learned and utilized by rigorously imposing regularizations in a deep neural network (DNN). Such a regularized DNN can be efficiently trained using a GPU implementation at an affordable cost. By equipping the DNN with a better capability of exploring both the inter-feature and the inter-class relationships, the proposed regularized DNN is more suitable for identifying video semantics. Through extensive experimental evaluations, we demonstrate that the proposed framework outperforms several state-of-the-art approaches. On the well-known Hollywood2 and Columbia Consumer Video benchmarks, we obtain the best results reported to date: 65.7% and 70.6%, respectively, in terms of mean average precision.
Keywords
Multimodal Features, Class Relationships, Deep Neural Networks, Action and Event Recognition
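The abstract describes imposing regularizations in a DNN so that semantically related classes are treated consistently, but gives no formulas. The sketch below is a minimal, hypothetical illustration of one common way to encode class relationships as a regularizer: a softmax classifier whose loss adds a graph-Laplacian penalty tr(WᵀLW) over the class weight vectors, pulling together the weights of classes assumed to be related. All names, the 3-class toy data, the assumed class-similarity graph, and the hyperparameters are illustrative; this is not the authors' method or code.

```python
import numpy as np

def softmax(Z):
    # Row-wise softmax with the usual max-shift for numerical stability.
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def loss_and_grad(W, X, y, L, lam):
    """Cross-entropy plus lam * tr(W^T L W).

    W: (C, D) class weight vectors; X: (N, D) features; y: (N,) int labels;
    L: (C, C) Laplacian of an assumed class-similarity graph, so the penalty
    equals (lam/2) * sum_ij A_ij ||w_i - w_j||^2 -- related classes get
    similar weight vectors.
    """
    N, C = len(y), W.shape[0]
    P = softmax(X @ W.T)                               # (N, C) probabilities
    ce = -np.log(P[np.arange(N), y] + 1e-12).mean()
    reg = lam * np.trace(W.T @ L @ W)                  # class-relationship term
    Y = np.eye(C)[y]                                   # one-hot labels
    grad = (P - Y).T @ X / N + 2.0 * lam * (L @ W)     # L is symmetric
    return ce + reg, grad

rng = np.random.default_rng(0)
# Toy data: 3 classes in 5-D; classes 0 and 1 are assumed semantically related.
X = rng.normal(size=(90, 5))
y = np.repeat([0, 1, 2], 30)
X[y == 0, 0] += 2.0
X[y == 1, 0] += 1.5
X[y == 2, 1] += 2.0

A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]], dtype=float)  # assumed class-similarity graph
L = np.diag(A.sum(axis=1)) - A          # its graph Laplacian

W = rng.normal(scale=0.01, size=(3, 5))
losses = []
for _ in range(200):                    # plain gradient descent
    loss, g = loss_and_grad(W, X, y, L, lam=0.05)
    losses.append(loss)
    W -= 0.2 * g

print(losses[-1] < losses[0])           # regularized loss decreases
```

In the paper's setting the analogous penalty would sit inside a multi-layer network trained on multimodal video features; here a linear softmax model keeps the regularizer itself easy to inspect.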