Static hand gesture recognition method based on the Vision Transformer

Multimedia Tools and Applications (2023)

Abstract
Hand gesture recognition (HGR) is the most important part of human-computer interaction (HCI). Static hand gesture recognition is equivalent to classifying hand gesture images, a task currently handled mainly by Convolutional Neural Networks (CNNs). The Vision Transformer (ViT) architecture dispenses with convolutional layers entirely and instead uses the multi-head attention mechanism to learn global information. This paper therefore proposes a static hand gesture recognition method based on the Vision Transformer. A self-made dataset and two publicly available American Sign Language (ASL) datasets are used to train and evaluate the ViT architecture. To build the self-made dataset, hand gesture images are captured with a Microsoft Kinect camera, whose depth information is used to filter out the background; an eight-connected discrimination algorithm and a distance transformation algorithm then remove the redundant arm information. The paper also studies the impact of several data augmentation strategies on recognition performance. Accuracy, F1 score, recall, and precision serve as the evaluation metrics. The proposed model achieves validation accuracies of 99.44%, 99.37%, and 96.53% on the three datasets, respectively, outperforming some CNN structures.
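The abstract does not spell out the preprocessing steps in detail, so the following is only a minimal sketch of the general idea behind depth-based background filtering followed by eight-connected region analysis. The function names (depth_mask, largest_component_8), the depth thresholds, and the toy depth map are all assumptions made for illustration, not the authors' implementation, and the distance-transform step used for arm removal is not shown.

```python
from collections import deque

# The 8 neighbour offsets: horizontal, vertical, and diagonal.
NEIGHBOURS_8 = [(-1, -1), (-1, 0), (-1, 1),
                (0, -1),           (0, 1),
                (1, -1),  (1, 0),  (1, 1)]


def depth_mask(depth, near, far):
    """Mark pixels whose depth lies in [near, far] as foreground (1)."""
    return [[1 if near <= d <= far else 0 for d in row] for row in depth]


def largest_component_8(mask):
    """Keep only the largest 8-connected foreground region of a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in NEIGHBOURS_8:
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    out = [[0] * cols for _ in range(rows)]
    for y, x in best:
        out[y][x] = 1
    return out


# Toy depth map in millimetres (made-up values): a hand near 600 mm,
# background at 2000 mm, and one isolated noise pixel in the top-right corner.
depth = [
    [2000, 2000, 2000, 2000,  610],
    [2000,  600,  605, 2000, 2000],
    [2000, 2000,  598,  602, 2000],
    [2000, 2000, 2000, 2000, 2000],
]
hand = largest_component_8(depth_mask(depth, near=500, far=800))
```

In this toy example the thresholding keeps five pixels, and the connected-component step then discards the isolated noise pixel while retaining the four-pixel region, whose pixels touch only through horizontal, vertical, or diagonal adjacency.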
Key words
Hand gesture recognition, Vision Transformer, Arm removal, Data augmentation