Recognizing American Sign Language Gestures From Within Continuous Videos.

IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (2018)

Cited by 87
Abstract
In this paper, we propose a novel hybrid model, 3D recurrent convolutional neural networks (3DRCNN), to recognize American Sign Language (ASL) gestures and localize their temporal boundaries within continuous videos by fusing multi-modality features. Our proposed 3DRCNN model integrates a 3D convolutional neural network (3DCNN) and an enhanced fully connected recurrent neural network (FC-RNN), where the 3DCNN learns multi-modality features from RGB, motion, and depth channels, and the FC-RNN captures the temporal information across short video clips divided from the original video. Consecutive clips with the same semantic meaning are grouped by applying a sliding-window approach to segment clips over the entire video sequence. To evaluate our method, we collected a new ASL dataset which contains two types of videos: Sequence videos (in which a human performs a list of specific ASL words) and Sentence videos (in which a human performs ASL sentences containing multiple ASL words). The dataset is fully annotated for each semantic region (i.e., the time duration of each word that the human signer performs) and contains multiple input channels. Our proposed method achieves 69.2% accuracy on the Sequence videos for 27 ASL words, which demonstrates its effectiveness in detecting ASL gestures from continuous videos.
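The abstract describes a two-stage pipeline: a 3DCNN produces a feature vector for each short clip obtained with a sliding window, and a recurrent layer aggregates the clip features over time before per-clip classification. The sketch below is not the authors' implementation; it is a minimal illustration of that idea in PyTorch, where the backbone depth, clip length, stride, hidden size, and the 27-class output are illustrative assumptions based only on the abstract.

```python
# Minimal sketch (assumed, not the paper's code) of a 3D-CNN + RNN pipeline
# over sliding-window clips, as described in the abstract.

import torch
import torch.nn as nn


def sliding_window_clips(video, clip_len=16, stride=8):
    """Split a video tensor (C, T, H, W) into overlapping clips (N, C, clip_len, H, W)."""
    clips = []
    for start in range(0, video.shape[1] - clip_len + 1, stride):
        clips.append(video[:, start:start + clip_len])
    return torch.stack(clips)


class Simple3DRCNN(nn.Module):
    def __init__(self, in_channels=3, hidden_size=256, num_classes=27):
        super().__init__()
        # Tiny 3D-CNN backbone; the paper's network is much deeper.
        self.cnn3d = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),  # one feature vector per clip
        )
        # Recurrent layer aggregates the sequence of clip features.
        self.rnn = nn.GRU(input_size=64, hidden_size=hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, clips):
        # clips: (N, C, clip_len, H, W) -> clip features shaped (1, N, 64)
        feats = self.cnn3d(clips).flatten(1).unsqueeze(0)
        out, _ = self.rnn(feats)          # (1, N, hidden_size)
        return self.classifier(out[0])    # per-clip class scores (N, num_classes)


if __name__ == "__main__":
    video = torch.randn(3, 64, 112, 112)     # one RGB channel stream: (C, T, H, W)
    clips = sliding_window_clips(video)      # (7, 3, 16, 112, 112)
    scores = Simple3DRCNN()(clips)
    print(scores.shape)                      # torch.Size([7, 27])
```

In this toy version, consecutive clips whose predicted labels agree could then be merged into one temporal segment, mirroring the sliding-window localization step the abstract describes; fusing motion and depth channels would require additional input streams not shown here.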
Keywords
American Sign Language gestures, continuous videos, hybrid model, 3D recurrent convolutional neural networks, temporal boundaries, 3DRCNN model, 3D convolutional neural network, enhanced fully connected recurrent neural network, FC-RNN, temporal information, short video clips, consecutive clips, ASL dataset, Sequence videos, multiple ASL words, ASL gestures, video sequence, multi-modality feature fusion