MIPA-ResGCN: a multi-input part attention enhanced residual graph convolutional framework for sign language recognition

COMPUTERS & ELECTRICAL ENGINEERING (2023)

Abstract
Sign language (SL) serves as the primary mode of communication for individuals with deafness or speech disorders. However, SL creates a considerable communication barrier, as most people are not acquainted with it. To address this problem, many technological solutions using wearable devices, video, and depth cameras have been put forth. The ubiquity of cameras in contemporary devices has made sign language recognition (SLR) from video sequences a viable and unobtrusive alternative. Nonetheless, SLR methods based on visual features, commonly known as appearance-based methods, incur notable computational complexity. In response to these challenges, this study introduces an accurate and computationally efficient pose-based approach to SLR. The proposed approach comprises three key stages: pose extraction, handcrafted feature generation, and feature space mapping and recognition. First, an efficient off-the-shelf pose extraction algorithm extracts pose information for the various body parts of a subject captured in a video. Then, a multi-input stream is generated from handcrafted features, i.e., joints, bone lengths, and bone angles. Finally, an efficient and lightweight residual graph convolutional network (ResGCN) with a novel part attention mechanism is proposed to encode the body's spatial and temporal information in a compact feature space and recognize the signs performed. In addition to enabling effective learning during training and offering cutting-edge accuracy, the proposed model significantly reduces computational complexity. The method is assessed on five challenging SL datasets, WLASL-100, WLASL-300, WLASL-1000, LSA-64, and MINDS-Libras, achieving state-of-the-art (SOTA) accuracies of 83.33%, 72.90%, 64.92%, 100 ± 0%, and 96.70 ± 1.07%, respectively.
Compared to previous approaches, we achieve superior performance while incurring a lower computational cost.
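The multi-input feature generation stage described above derives three streams from the extracted pose sequence: raw joint coordinates, bone lengths, and bone angles. The sketch below illustrates one plausible way to build such streams with NumPy; the bone list `BONES` and the exact angle encoding (unit bone-direction cosines) are illustrative assumptions, not the paper's actual skeleton graph or formulas.

```python
import numpy as np

# Hypothetical skeleton: each bone is a (parent, child) joint-index pair.
# The paper's actual body-part graph for ResGCN is not specified here.
BONES = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]

def handcrafted_streams(joints):
    """Build three illustrative input streams: joint coordinates,
    bone lengths, and bone angles.

    joints: (T, V, C) array of T frames, V joints, C coordinates.
    Returns (joints, lengths, angles) with shapes
    (T, V, C), (T, B, C-normed), (T, B, C) for B bones.
    """
    # Bone vectors: child joint minus parent joint, per frame.
    bones = np.stack([joints[:, c] - joints[:, p] for p, c in BONES], axis=1)  # (T, B, C)
    lengths = np.linalg.norm(bones, axis=-1)                                   # (T, B)
    # Bone "angles" encoded as unit direction vectors (direction cosines).
    angles = bones / (lengths[..., None] + 1e-8)                               # (T, B, C)
    return joints, lengths, angles

# Usage with synthetic pose data: 10 frames, 6 joints, 2D coordinates.
rng = np.random.default_rng(0)
j, l, a = handcrafted_streams(rng.random((10, 6, 2)))
```

Each stream can then be fed to its own input branch of a multi-input network, which is the general pattern the abstract's "multi-input architecture" keyword suggests.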
Keywords
Sign language recognition, Pose sequence modeling, ResGCN, Part attention, Multi-input architecture, Visualization