Attentive Task-Net - Self Supervised Task-Attention Network for Imitation Learning using Video Demonstration.

ICRA (2020)

Abstract
This paper proposes Attentive Task-Net (AT-Net), an end-to-end self-supervised feature representation network for video-based task imitation. AT-Net incorporates a novel multi-level spatial attention module that highlights the spatial features corresponding to the task demonstrated by the expert. Its neural connections amplify the relevant information in the demonstration and suppress the irrelevant information while learning task-specific feature embeddings. This is achieved by a weighted combination of multiple intermediate feature maps of the input image, taken at different stages of the CNN pipeline. The weights of the combination are the compatibility scores that the attention module predicts for the respective feature maps. AT-Net is trained with a metric learning loss that decreases the distance between the feature representations of concurrent frames from multiple viewpoints and increases the distance between temporally consecutive frames. The AT-Net features are then used to formulate a reinforcement learning (RL) problem for task imitation. Experiments on the publicly available Multi-view pouring dataset demonstrate that the output of the attention module highlights the task-specific objects while suppressing the background. The efficacy of the proposed method is further validated by qualitative and quantitative comparison with a state-of-the-art technique, along with extensive ablation studies. The method is applied to imitate a pouring task, where an RL agent is trained with AT-Net in the Gazebo simulator. Our findings show that AT-Net achieves a 6.5% decrease in alignment error and reduces the number of training iterations by almost 155k relative to the state of the art, while satisfactorily imitating the intended task.
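The two mechanisms named in the abstract, attention-weighted pooling of intermediate feature maps and a metric learning loss over multi-view frames, can be illustrated with a minimal sketch. The abstract does not specify the compatibility function, how the pooled vectors are combined, or the exact loss, so everything below is an assumption made for illustration: compatibility is taken as a dot product followed by a softmax over spatial locations, the per-stage vectors are concatenated, and the loss is a standard triplet margin loss. The names `attend`, `multi_level_embedding`, `triplet_loss`, and the `margin` value are hypothetical and do not come from the paper.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(local_features, global_feature):
    """Pool one feature map into a single vector.

    local_features: one feature vector per spatial location.
    global_feature: a vector of the same dimension.
    Assumption: compatibility = dot product, normalised with softmax.
    """
    scores = [dot(l, global_feature) for l in local_features]
    weights = softmax(scores)
    dim = len(local_features[0])
    pooled = [
        sum(w * l[d] for w, l in zip(weights, local_features))
        for d in range(dim)
    ]
    return pooled, weights


def multi_level_embedding(feature_maps, global_feature):
    """Concatenate attention-pooled vectors from several CNN stages.

    Assumption: concatenation is one possible way to combine the stages.
    """
    embedding = []
    for fmap in feature_maps:
        pooled, _ = attend(fmap, global_feature)
        embedding.extend(pooled)
    return embedding


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss (assumed form).

    positive: embedding of the concurrent frame from another viewpoint.
    negative: embedding of a temporally neighbouring frame.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)
```

In this sketch, a location whose features align with the global descriptor receives a larger weight and therefore dominates the pooled vector, which is the sense in which relevant regions are amplified and the rest suppressed. The loss is zero once the other-view frame is closer to the anchor than the temporal neighbour by at least `margin`.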
Key words
task-specific objects, intended task, imitation learning, video demonstration, end-to-end self-supervised feature representation network, video-based task imitation, multilevel spatial attention module, spatial features, weighted combination, multiple intermediate feature maps, respective feature maps, metric learning loss, multiple view points, AT-Net features, reinforcement learning problem, attentive task-net, self supervised task-attention network, neural connections, learning task-specific feature embeddings, temporally consecutive frames, publicly available multiview pouring dataset, RL agent, Gazebo simulator, CNN pipeline