Detecting human action as the spatiotemporal tube of maximum mutual information

Taiqing Wang,Shengjin Wang,Xiaoqing Ding

IEEE Transactions on Circuits and Systems for Video Technology (2014)

Abstract

Human action detection in complex scenes is a challenging problem because of its high-dimensional search space and dynamic backgrounds. To achieve efficient and accurate action detection, we represent a video sequence as a collection of feature trajectories and model human action as the spatiotemporal tube (ST-tube) of maximum mutual information. First, a random forest is built to evaluate the mutual information of feature trajectories toward the action class, and then a first-order Markov model is introduced to recursively infer the action regions at consecutive frames. By exploiting the time-continuity property of feature trajectories, the action region is efficiently inferred at large temporal intervals. Finally, we obtain an ST-tube by concatenating the consecutive action regions bounding the human bodies. Compared with the popular spatiotemporal cuboid action model, the proposed ST-tube model is not only more efficient but also more accurate in action localization. Experimental results on the KTH, CMU, and UCF Sports datasets validate the superiority of our approach over state-of-the-art methods in both localization accuracy and time efficiency. © 2014 IEEE.
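
The recursive, frame-to-frame inference described in the abstract can be illustrated with a generic dynamic-programming sketch. This is not the authors' algorithm: it assumes each frame already has a small set of candidate boxes with a precomputed score (standing in for the summed mutual information of the trajectories inside the box), and it links them with a first-order smoothness penalty on box-center displacement. The names `best_tube`, `candidates`, and `smooth` are hypothetical and chosen only for illustration.

```python
import math


def best_tube(candidates, smooth=1.0):
    """Pick one box per frame with a Viterbi-style recursion.

    candidates[t] is a list of (box, score) pairs for frame t, where
    box = (x, y, w, h) and score is an assumed per-box evidence value.
    The objective is the sum of scores minus `smooth` times the
    center displacement between boxes in consecutive frames.
    Returns (total_score, [box for each frame]).
    """
    def center(box):
        x, y, w, h = box
        return (x + w / 2.0, y + h / 2.0)

    def dist(a, b):
        (ax, ay), (bx, by) = center(a), center(b)
        return math.hypot(ax - bx, ay - by)

    # best[t][j]: best accumulated value of a path ending at candidate j of frame t
    best = [[score for _, score in candidates[0]]]
    back = []  # back[t - 1][j]: index of the best predecessor in frame t - 1
    for t in range(1, len(candidates)):
        row, ptr = [], []
        for box, score in candidates[t]:
            # First-order dependency: only the previous frame matters.
            options = [
                best[-1][i] - smooth * dist(prev_box, box)
                for i, (prev_box, _) in enumerate(candidates[t - 1])
            ]
            i_best = max(range(len(options)), key=options.__getitem__)
            row.append(options[i_best] + score)
            ptr.append(i_best)
        best.append(row)
        back.append(ptr)

    # Backtrack from the best final candidate to recover the tube.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return total, [candidates[t][j][0] for t, j in enumerate(path)]
```

Concatenating the returned boxes gives a tube whose region can move and change size from frame to frame, which is the property that distinguishes an ST-tube from a fixed ST-cuboid. The candidate generation and scoring steps are left out here because the abstract does not specify them.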

Key words

Action detection, feature trajectory, mutual information, spatiotemporal cuboid (ST-cuboid), spatiotemporal tube (ST-tube)