PiCor: Multi-Task Deep Reinforcement Learning with Policy Correction.

Fengshuo Bai, Hongming Zhang, Tianyang Tao, Zhiheng Wu, Yanna Wang, Bo Xu

AAAI (2023)

Abstract
Multi-task deep reinforcement learning (DRL) ambitiously aims to train a general agent that masters multiple tasks simultaneously. However, the differing learning speeds of the tasks, compounded by negative gradient interference, make policy learning inefficient. In this work, we propose PiCor, an efficient multi-task DRL framework that splits learning into a policy optimization phase and a policy correction phase. The policy optimization phase improves the policy on a sampled single task using any DRL algorithm, without considering the other tasks. The policy correction phase first constructs a performance constraint set with adaptive weight adjusting, then constrains the intermediate policy learned in the first phase to that set, which controls negative interference and balances learning speeds across tasks. Empirically, we demonstrate that PiCor outperforms previous methods and significantly improves sample efficiency on simulated robotic manipulation and continuous control tasks. We additionally show that adaptive weight adjusting can further improve data efficiency and performance.
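The abstract describes the two-phase loop only at a high level. The following is a minimal sketch of that structure, not the authors' method. It assumes toy quadratic per-task losses in place of RL objectives. It also substitutes a simple first-order gradient-projection rule, similar in spirit to PCGrad, for the paper's performance constraint set: the update is projected so that no task's loss increases to first order. Adaptive weight adjusting is not modeled. All function names (`quadratic_task`, `optimize_phase`, `correct_phase`, `train`) are hypothetical.

```python
import random
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def quadratic_task(center: Sequence[float]):
    """Hypothetical toy task: loss 0.5 * ||theta - center||^2 (stand-in for an RL objective)."""
    def loss(theta: Sequence[float]) -> float:
        return 0.5 * sum((t - c) ** 2 for t, c in zip(theta, center))

    def grad(theta: Sequence[float]) -> List[float]:
        return [t - c for t, c in zip(theta, center)]

    return loss, grad


def optimize_phase(theta: List[float], grad_fn, lr: float) -> List[float]:
    """Phase 1: a plain gradient step on one sampled task, ignoring all other tasks."""
    g = grad_fn(theta)
    return [t - lr * gi for t, gi in zip(theta, g)]


def correct_phase(theta: List[float], theta_mid: List[float], grad_fns) -> List[float]:
    """Phase 2 (assumed simplification): for each task j whose first-order change
    g_j . d would be positive, project the update d onto the half-space g_j . d <= 0.
    Sequential projection is exact for two tasks but may re-violate earlier
    constraints when there are more."""
    d = [m - t for m, t in zip(theta_mid, theta)]
    for grad_fn in grad_fns:
        g = grad_fn(theta)
        violation = dot(g, d)
        norm_sq = dot(g, g)
        if violation > 0 and norm_sq > 0:
            scale = violation / norm_sq
            d = [di - scale * gi for di, gi in zip(d, g)]
    return [t + di for t, di in zip(theta, d)]


def train(tasks, theta: List[float], steps: int, lr: float, seed: int = 0) -> List[float]:
    """Alternate the two phases: sample a task, optimize on it, then correct."""
    rng = random.Random(seed)
    grad_fns = [grad for _, grad in tasks]
    for _ in range(steps):
        _, grad_i = rng.choice(tasks)
        theta_mid = optimize_phase(theta, grad_i, lr)
        theta = correct_phase(theta, theta_mid, grad_fns)
    return theta
```

For example, with task centers `[2.0, 0.0]` and `[-2.0, 1.0]` and starting from `[0.0, 0.0]`, the two task gradients conflict. A phase-1 step on the first task with `lr=0.1` proposes the update `[0.2, 0.0]`. The correction phase turns this into roughly `[0.04, 0.08]`, which still lowers the first task's loss to first order while leaving the second task's first-order change at zero.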
Keywords
deep reinforcement learning, policy, multi-task