Lightweight hybrid model based on MobileNet-v2 and Vision Transformer for human-robot interaction

Engineering Applications of Artificial Intelligence (2024)

Abstract
Within convolutional neural networks (CNNs), convolutional operations are good at extracting local features but have difficulty capturing global representations. In the Vision Transformer, multi-head self-attention can capture long-range feature dependencies but may disrupt local feature details. Motivated by this, we propose a novel lightweight model, named HybridNet, built on MobileNet-v2 and the Vision Transformer, which combines the advantages of both CNNs and the Vision Transformer. In addition, to strengthen temporal information interaction, we incorporate temporal-channel attention into HybridNet. We conducted experiments on the Kinetics-400, Jester, and EgoGesture datasets to validate the effectiveness of HybridNet. The results show that the lightweight HybridNet achieves 96.3% and 93.9% accuracy on Jester and EgoGesture, respectively, performance close to or comparable with state-of-the-art methods. Finally, we deploy HybridNet as a real-time gesture recognition model and use its predictions as commands to control robots in a simulation environment, enabling human-robot interaction. Gesture-based interaction between humans and robots improves communication, facilitates physical collaboration, enables non-verbal expression, enhances accessibility, and creates a more engaging user experience, adding intuitiveness and efficiency to human-robot interaction and making it more dynamic and interactive.
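
The architecture described in the abstract pairs convolutional blocks (local detail) with self-attention (global context) and adds a temporal-channel attention gate across video frames. The sketch below is not the authors' implementation; it is a minimal PyTorch illustration, assuming an inverted-residual block in the style of MobileNet-v2, a standard multi-head self-attention layer over flattened spatial tokens, and a squeeze-and-excitation-style gate over the time and channel axes. All module names, shapes, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of a hybrid block combining a
# MobileNet-v2-style inverted residual with multi-head self-attention,
# plus a simple temporal-channel attention gate for video clips.
import torch
import torch.nn as nn


class InvertedResidual(nn.Module):
    """MobileNet-v2-style block: expand -> depthwise conv -> project."""

    def __init__(self, channels: int, expand_ratio: int = 4):
        super().__init__()
        hidden = channels * expand_ratio
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # residual connection keeps local details


class TemporalChannelAttention(nn.Module):
    """Squeeze-and-excitation-style gate over the (time, channel) axes."""

    def __init__(self, channels: int, frames: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(frames * channels, frames * channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(frames * channels // reduction, frames * channels),
            nn.Sigmoid(),
        )

    def forward(self, x):  # x: (batch, frames, channels, H, W)
        b, t, c, h, w = x.shape
        weights = self.fc(x.mean(dim=(3, 4)).flatten(1))  # pool spatially
        return x * weights.view(b, t, c, 1, 1)            # reweight T and C


class HybridBlock(nn.Module):
    """Local features via convolution, global context via self-attention."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.conv = InvertedResidual(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):  # x: (batch, channels, H, W)
        x = self.conv(x)                                   # local details
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))   # (batch, H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)    # global dependencies
        return x + attended.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    frames, channels = 8, 32
    clip = torch.randn(2, frames, channels, 28, 28)        # (B, T, C, H, W)
    clip = TemporalChannelAttention(channels, frames)(clip)
    block = HybridBlock(channels)
    out = torch.stack([block(clip[:, t]) for t in range(frames)], dim=1)
    print(out.shape)  # torch.Size([2, 8, 32, 28, 28])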
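```

In the full model described by the paper, such blocks would presumably be stacked at multiple resolutions inside a MobileNet-v2 backbone; the small driver at the bottom only verifies that tensor shapes are preserved through the convolution, attention, and temporal-channel gating stages.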
Keywords
2-dimensional convolutional neural network, Vision Transformer, Lightweight model, Gesture recognition, Human-robot interaction