TransNav: spatial sequential transformer network for visual navigation

Journal of Computational Design and Engineering (2022)

Abstract
The visual navigation task is to steer an embodied agent to find a given target based on its observations. An effective transformation from the agent's observations to a visual representation determines the navigation actions and promotes a more informed navigation policy. In this work, we propose a spatial sequential transformer network (SSTNet) for learning informative visual representations in deep reinforcement learning. SSTNet is composed of a spatial attention probability fused model (SAF) and a sequential transformer network (STNet). SAF fuses cross-modal state information into visual cues for reinforcement learning. It encodes semantic information about observed objects, as well as spatial information about their locations, jointly exploiting inter-image relations. STNet generates (imagines) the next observations and infers actions from the aspects most relevant to the target, decoding intra-image relations. This way, the agent learns to understand the causality between navigation actions and dynamic changes in observations. SSTNet is conditioned, via an auto-regressive model, on the desired reward, past states, actions, and a knowledge graph. The whole navigation framework considers local and global visual information, as well as temporal sequence information, allowing the agent to navigate towards the sought-after object effectively. Evaluations on the AI2THOR framework show that our method attains at least a 10% improvement in average success rate over most state-of-the-art models.
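To make the two components of the abstract more concrete, the following is a minimal PyTorch sketch of the general idea: a spatial-attention fusion of a target embedding with CNN feature maps (the SAF role), followed by an auto-regressive transformer over (return, fused state, action) tokens (the STNet role). All module names, dimensions, and the fusion scheme are assumptions for illustration only and are not the authors' released implementation.

```python
# Hypothetical sketch of the SSTNet idea described in the abstract.
# Names, dimensions, and fusion details are illustrative assumptions.
import torch
import torch.nn as nn

class SpatialAttentionFusion(nn.Module):
    """SAF (assumed form): fuse target semantics with spatial image features
    via an attention map over feature-map locations."""
    def __init__(self, feat_dim=512, target_dim=300):
        super().__init__()
        self.query = nn.Linear(target_dim, feat_dim)
        self.proj = nn.Linear(feat_dim, feat_dim)

    def forward(self, feat_map, target_emb):
        # feat_map: (B, C, H, W) CNN features; target_emb: (B, target_dim)
        B, C, H, W = feat_map.shape
        keys = feat_map.flatten(2).transpose(1, 2)                    # (B, H*W, C)
        q = self.query(target_emb).unsqueeze(1)                       # (B, 1, C)
        attn = torch.softmax((keys * q).sum(-1) / C ** 0.5, dim=-1)   # (B, H*W)
        fused = (attn.unsqueeze(-1) * keys).sum(1)                    # (B, C)
        return self.proj(fused)

class SequentialTransformer(nn.Module):
    """STNet (assumed form): causal transformer over the sequence of
    (return, fused-state, past-action) tokens; predicts next-action logits."""
    def __init__(self, feat_dim=512, n_actions=6, n_layers=2, n_heads=8):
        super().__init__()
        self.state_emb = nn.Linear(feat_dim, feat_dim)
        self.action_emb = nn.Embedding(n_actions, feat_dim)
        self.return_emb = nn.Linear(1, feat_dim)
        layer = nn.TransformerEncoderLayer(feat_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(feat_dim, n_actions)

    def forward(self, returns, states, actions):
        # returns: (B, T, 1), states: (B, T, feat_dim), actions: (B, T) int64
        tokens = (self.return_emb(returns) + self.state_emb(states)
                  + self.action_emb(actions))                         # (B, T, feat_dim)
        T = tokens.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.encoder(tokens, mask=causal)
        return self.head(h[:, -1])                                    # next-action logits

if __name__ == "__main__":
    saf, stnet = SpatialAttentionFusion(), SequentialTransformer()
    feats = torch.randn(2, 512, 7, 7)        # e.g. a ResNet feature map
    target = torch.randn(2, 300)             # e.g. a word embedding of the goal object
    fused = saf(feats, target)               # (2, 512) fused visual-semantic state
    states = fused.unsqueeze(1).repeat(1, 4, 1)                       # toy 4-step history
    logits = stnet(torch.randn(2, 4, 1), states,
                   torch.zeros(2, 4, dtype=torch.long))
    print(logits.shape)                      # torch.Size([2, 6])
```

This sketch omits the knowledge-graph conditioning and the observation-generation ("imagination") head mentioned in the abstract; it is intended only to show how a spatial fusion module can feed an auto-regressive sequence model for action inference.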
Keywords
visual navigation, knowledge graph, reinforcement learning, spatial attention, transformer network