Safe Advantage Actor-Critic Based on a Pool of Dangerous Samples

2024 4th International Conference on Consumer Electronics and Computer Engineering (ICCECE)(2024)

Abstract
Reinforcement learning requires agents to interact extensively with the environment, exploring thoroughly during the learning process. In practical applications, however, random exploration often carries significant risk: taking dangerous actions can lead to catastrophic consequences. Ensuring safety is therefore essential for the practical deployment of reinforcement learning. This paper proposes enhancing the Advantage Actor-Critic algorithm by introducing a deep Q-network, referred to as the punisher. The punisher network learns from a pool of dangerous samples and provides a penalty factor to the actor network, allowing the actor to learn from both reward and penalty signals. Actions that might lead to dangerous states are penalized, thereby augmenting the traditional actor-critic framework with safety measures. Finally, on the Gym platform's Mountain Car game, the method is shown to improve safety during exploration at the cost of some learning efficiency.
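The abstract's core idea can be sketched as follows. This is a minimal tabular illustration, not the paper's implementation: the punisher (a DQN in the paper, a lookup table here) is fitted to a pool of dangerous (state, action) samples, and its danger estimate is subtracted, weighted by an assumed factor `beta`, from the critic's advantage before the policy-gradient step. All names and the update rules are illustrative assumptions.

```python
import numpy as np

n_states, n_actions = 5, 3

theta = np.zeros((n_states, n_actions))      # actor logits (tabular stand-in for the actor network)
V = np.zeros(n_states)                       # critic's state-value estimates
Q_punish = np.zeros((n_states, n_actions))   # punisher's danger estimates (DQN in the paper)

# Pool of dangerous samples: (state, action) pairs that previously led to danger
danger_pool = [(1, 2), (3, 0)]

# Punisher learns from the pool (a trivial counting update, for illustration only)
for s, a in danger_pool:
    Q_punish[s, a] += 1.0

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def actor_update(s, a, ret, beta=0.5, lr=0.1):
    """One penalized policy-gradient step: advantage minus a danger penalty.

    `beta` (the penalty weight) is an assumed hyperparameter, not from the paper.
    """
    advantage = ret - V[s]                       # standard A2C advantage
    signal = advantage - beta * Q_punish[s, a]   # reward factor minus penalty factor
    pi = softmax(theta[s])
    grad_logp = -pi                              # gradient of log pi(a|s) w.r.t. logits
    grad_logp[a] += 1.0
    theta[s] += lr * signal * grad_logp

# A dangerous action's probability drops even with zero advantage,
# because the punisher's penalty dominates the update signal.
before = softmax(theta[1])[2]
actor_update(s=1, a=2, ret=0.0)
after = softmax(theta[1])[2]
```

The design point is that the danger signal is kept separate from the reward: the punisher is trained only on the dangerous-sample pool, so safety shaping does not require modifying the environment's reward function.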
Keywords
Deep Q-Network, Advantage Actor-Critic, Reinforcement Learning