Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning
arXiv (2024)
Abstract
The Value Iteration Network (VIN) is an end-to-end differentiable
architecture that performs value iteration on a latent MDP for planning in
reinforcement learning (RL). However, VINs struggle to scale to long-term and
large-scale planning tasks, such as navigating a 100 × 100 maze – a task
which typically requires thousands of planning steps to solve. We observe that
this deficiency is due to two issues: the representation capacity of the latent
MDP and the planning module's depth. We address these by augmenting the latent
MDP with a dynamic transition kernel, dramatically improving its
representational capacity, and, to mitigate the vanishing gradient problem,
introducing an "adaptive highway loss" that constructs skip connections to
improve gradient flow. We evaluate our method on both 2D maze navigation
environments and the ViZDoom 3D navigation benchmark. We find that our new
method, named Dynamic Transition VIN (DT-VIN), easily scales to 5000 layers and
casually solves challenging versions of the above tasks. Altogether, we believe
that DT-VIN represents a concrete step forward in performing long-term
large-scale planning in RL environments.
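
To illustrate why planning depth matters, the following is a minimal sketch of plain tabular value iteration on a small hand-written maze. It is not the DT-VIN architecture: there is no learned latent MDP, no dynamic transition kernel, and no highway loss. The maze layout, the step cost of -1, and the names `MAZE`, `sweep`, and `sweeps_until_reached` are all assumptions made for illustration. The sketch only shows that each sweep propagates value information by one cell, so the number of sweeps needed (the analogue of planning-module depth) grows with the shortest path length.

```python
NEG_INF = float("-inf")

# Hypothetical 5x5 serpentine maze: 0 = free cell, 1 = wall.
MAZE = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]


def sweep(values, grid, goal):
    """One synchronous value-iteration sweep with step cost -1 and no discounting."""
    rows, cols = len(grid), len(grid[0])
    new_values = [row[:] for row in values]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 or (r, c) == goal:
                continue  # walls hold no value; the goal is terminal with value 0
            best = NEG_INF
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                    # Bellman backup: pay one step, then take the neighbour's value.
                    best = max(best, -1.0 + values[nr][nc])
            new_values[r][c] = best
    return new_values


def sweeps_until_reached(grid, start, goal, max_sweeps=10_000):
    """Return (number of sweeps, start value) once the start cell has a finite value."""
    rows, cols = len(grid), len(grid[0])
    values = [[NEG_INF] * cols for _ in range(rows)]
    values[goal[0]][goal[1]] = 0.0
    for k in range(1, max_sweeps + 1):
        values = sweep(values, grid, goal)
        if values[start[0]][start[1]] > NEG_INF:
            return k, values[start[0]][start[1]]
    return None  # goal not reachable within max_sweeps


result = sweeps_until_reached(MAZE, start=(0, 0), goal=(4, 4))
print(result)  # (16, -16.0): the only path from start to goal is 16 steps long
```

In this toy setting the start cell receives no usable value until the 16th sweep, which matches the maze's shortest path length. By the same reasoning, a 100 × 100 maze with a winding path can require thousands of sweeps, consistent with the planning depth the abstract describes.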