Deep Reinforcement Learning with Multi-Critic TD3 for Decentralized Multi-Robot Path Planning

IEEE Transactions on Cognitive and Developmental Systems (2024)

Abstract
Centralized multi-robot path planning is a prevalent approach in which a global planner computes feasible paths for each robot using shared information. Nonetheless, this approach encounters limitations due to communication constraints and computational complexity. To address these challenges, we introduce a novel decentralized multi-robot path planning approach that eliminates the need to share the states and intentions of robots. Our approach harnesses deep reinforcement learning and features an asynchronous multi-critic twin delayed deep deterministic policy gradient (AMC-TD3) algorithm, which enhances the original GRU-Attention-based TD3 algorithm by incorporating a multi-critic network and employing an asynchronous training mechanism.

By training each critic with a unique reward function, our learned policy enables each robot to navigate towards its long-term objective without colliding with other robots in complex environments. Furthermore, our reward function, grounded in social norms, allows the robots to naturally avoid each other in congested situations. Specifically, we train three critics to encourage each robot to achieve its long-term navigation goal, maintain its moving direction, and prevent collisions with other robots.
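The multi-critic idea above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the scalar rewards, and the Q-value pairs are all hypothetical, and the sketch only shows the core mechanic of TD3-style targets computed per critic, where each critic receives its own reward signal (goal progress, heading maintenance, collision avoidance).

```python
# Hypothetical sketch of per-critic TD3-style targets (illustrative
# names and values, not the paper's actual AMC-TD3 implementation).
# Each critic k has its own reward r_k; following TD3, each target
# uses the minimum of a pair of twin Q-value estimates at the next
# state to reduce overestimation bias.

def td_targets(rewards, next_q_pairs, gamma=0.99, done=False):
    """One TD target per critic: r_k + gamma * min(Q1', Q2')."""
    targets = []
    for r_k, (q1, q2) in zip(rewards, next_q_pairs):
        bootstrap = 0.0 if done else gamma * min(q1, q2)
        targets.append(r_k + bootstrap)
    return targets

# Example: rewards for goal progress, heading stability, and
# collision avoidance, with twin Q-value pairs for each critic.
rewards = [1.0, 0.2, -0.5]
next_q_pairs = [(2.0, 1.8), (0.5, 0.6), (-1.0, -0.9)]
print(td_targets(rewards, next_q_pairs))  # → [2.782, 0.695, -1.49]
```

In a full agent, the actor would then be updated against some combination of the critics' value estimates, so the learned policy balances the three objectives rather than optimizing a single scalar reward.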

Our model can learn an end-to-end navigation policy without relying on an accurate map or any localization information, rendering it highly adaptable to various environments. Simulation results reveal that our proposed approach surpasses baselines in several environments with different levels of complexity and robot populations.

Keywords
Multi-robot path planning, collision avoidance, deep reinforcement learning, multi-robot systems