Recurrent Network Models for Human Dynamics

2015 IEEE International Conference on Computer Vision (ICCV), 2015

Abstract
We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture. The ERD model is a recurrent neural network that incorporates nonlinear encoder and decoder networks before and after the recurrent layers. We test instantiations of ERD architectures on the tasks of motion capture (mocap) generation, body pose labeling, and body pose forecasting in videos. Our model handles mocap training data across multiple subjects and activity domains, and synthesizes novel motions while avoiding drift over long periods of time. For human pose labeling, ERD outperforms a per-frame body part detector by resolving left-right body part confusions. For video pose forecasting, ERD predicts body joint displacements across a temporal horizon of 400 ms and outperforms a first-order motion model based on optical flow. ERDs extend previous Long Short Term Memory (LSTM) models in the literature to jointly learn representations and their dynamics. Our experiments show that such representation learning is crucial for both labeling and prediction in space-time. We find that this distinguishes the spatio-temporal visual domain from 1D text, speech, or handwriting, where straightforward hard-coded representations have shown excellent results when directly combined with recurrent units.
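The encoder, recurrent, decoder layering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `ERDSketch`, the layer sizes, and the random untrained weights are all assumptions made for illustration, and a plain tanh recurrence stands in for the LSTM layers mentioned in the abstract. Training is omitted entirely. The `rollout` method shows one way a model of this shape could generate future poses by feeding its own predictions back as input.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Random weight matrix (rows x cols); stands in for learned weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def tanh_vec(vector):
    return [math.tanh(x) for x in vector]


class ERDSketch:
    """Hypothetical encoder -> recurrent -> decoder stack (untrained)."""

    def __init__(self, pose_dim, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.w_enc = make_matrix(feat_dim, pose_dim, rng)     # encoder
        self.w_in = make_matrix(hidden_dim, feat_dim, rng)    # recurrent input
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng) # recurrent state
        self.w_dec1 = make_matrix(feat_dim, hidden_dim, rng)  # decoder layer 1
        self.w_dec2 = make_matrix(pose_dim, feat_dim, rng)    # decoder layer 2

    def step(self, pose, hidden):
        """Consume one pose vector; return (next-pose prediction, new hidden state)."""
        # Nonlinear encoder: pose -> feature representation.
        feat = tanh_vec(matvec(self.w_enc, pose))
        # Recurrent layer (simple tanh RNN here, NOT an LSTM).
        pre = [a + b for a, b in zip(matvec(self.w_in, feat),
                                     matvec(self.w_rec, hidden))]
        hidden = tanh_vec(pre)
        # Nonlinear decoder: hidden state -> predicted pose.
        pred = matvec(self.w_dec2, tanh_vec(matvec(self.w_dec1, hidden)))
        return pred, hidden

    def rollout(self, seed_poses, n_future):
        """Condition on observed poses, then generate n_future poses autoregressively."""
        if not seed_poses:
            raise ValueError("seed_poses must contain at least one pose")
        hidden = [0.0] * self.hidden_dim
        pred = None
        for pose in seed_poses:
            pred, hidden = self.step(pose, hidden)
        generated = []
        for _ in range(n_future):
            generated.append(pred)
            # Feed the prediction back in as the next input.
            pred, hidden = self.step(pred, hidden)
        return generated
```

For example, `ERDSketch(pose_dim=6, feat_dim=8, hidden_dim=10).rollout(seed, 3)` with five 6-dimensional seed poses returns three 6-dimensional predicted poses. With untrained weights the values are meaningless; the sketch only shows how the three stages connect.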
Keywords
recurrent network models, human dynamics, encoder-recurrent-decoder model, ERD model, human body pose prediction, human body pose recognition, recurrent neural network, nonlinear encoder-decoder networks, ERD architectures, motion capture generation, mocap generation, body pose labeling, body pose forecasting, mocap training data handling, motion synthesis, human pose labeling, body part detector, video pose forecasting, temporal horizon, first order motion model, optical flow, long short term memory models, LSTM models, representation learning, space-time labeling, spatiotemporal visual domain