Goal-Conditioned Hierarchical Reinforcement Learning With High-Level Model Approximation

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2024)

Abstract
Hierarchical reinforcement learning (HRL) exhibits remarkable potential in addressing large-scale and long-horizon complex tasks. However, a fundamental challenge, which arises from the inherently entangled nature of hierarchical policies, has not been well understood, consequently compromising the training stability and exploration efficiency of HRL. In this article, we propose a novel HRL algorithm, high-level model approximation (HLMA), presenting both theoretical foundations and practical implementations. In HLMA, a Planner constructs an innovative high-level dynamic model to predict the $k$-step transition of the Controller in a subtask. This allows for the estimation of the evolving performance of the Controller. At the low level, we leverage the initial state of each subtask, transforming absolute states into relative deviations through a designed operator that serves as the Controller's input. This approach facilitates the reuse of subtask domain knowledge, enhancing data efficiency. With this designed structure, we establish the local convergence of each component within HLMA and subsequently derive regret bounds to ensure global convergence. Extensive experiments conducted on complex locomotion and navigation tasks demonstrate that HLMA surpasses other state-of-the-art single-level RL and HRL algorithms in terms of sample efficiency and asymptotic performance. In addition, thorough ablation studies validate the effectiveness of each component of HLMA.
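The abstract only names the two mechanisms; the paper's actual architecture is not given here. The following is a minimal Python sketch, under stated assumptions, of what a relative-deviation operator and a $k$-step high-level dynamic model might look like. The names (relative_deviation, HighLevelModel) and the linear parameterization standing in for a neural network are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def relative_deviation(state, subtask_init_state):
    """Hypothetical low-level operator: map an absolute state to its
    deviation from the subtask's initial state, so the Controller sees
    subtask-relative inputs and can reuse subtask domain knowledge."""
    return state - subtask_init_state

class HighLevelModel:
    """Hypothetical Planner-side dynamic model: given a subtask's
    initial state and a subgoal, predict the state the Controller
    reaches after k low-level steps (a linear stand-in for the
    paper's learned model)."""

    def __init__(self, dim, k, lr=1e-2):
        self.k = k                          # prediction horizon (steps)
        self.W = np.zeros((dim, 2 * dim))   # linear model parameters
        self.lr = lr                        # learning rate

    def predict(self, init_state, subgoal):
        # Predict the k-step successor state as a residual on init_state.
        x = np.concatenate([init_state, subgoal])
        return init_state + self.W @ x

    def update(self, init_state, subgoal, reached_state):
        # One gradient step on the squared k-step prediction error,
        # using the state the Controller actually reached.
        x = np.concatenate([init_state, subgoal])
        err = self.predict(init_state, subgoal) - reached_state
        self.W -= self.lr * np.outer(err, x)

# Usage sketch: after each subtask, fit the model on the observed
# k-step transition and feed relative deviations to the Controller.
dim, model = 4, HighLevelModel(dim=4, k=10)
s0, g = np.zeros(dim), np.ones(dim)
s_reached = 0.8 * g                     # pretend the Controller got here
model.update(s0, g, s_reached)
controller_input = relative_deviation(s_reached, s0)
```

Such a model lets the Planner estimate the Controller's evolving performance on a subtask without rolling out every low-level step, which is the role the abstract assigns to it.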
Keywords
Task analysis, Heuristic algorithms, Convergence, Predictive models, Trajectory, Reinforcement learning, Navigation, Hierarchical reinforcement learning (HRL), model-based prediction, neural network approximation, regret bounds, robot locomotion and navigation