
Safe Policy Learning with Constrained Return Variance.

ADVANCES IN ARTIFICIAL INTELLIGENCE (2019)

Abstract
In safety-critical applications, it is desirable for the agent to behave in a reliable and repeatable manner, which the conventional setting in reinforcement learning (RL) often fails to provide. In this work, we derive a novel algorithm to learn a safe hierarchical policy by constraining a direct estimate of the variance of the return in the Option-Critic framework [1]. We first present a novel theorem of safe control for policy gradient methods and then extend the derivation to the Option-Critic framework.
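The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea of penalizing return variance in a policy gradient method, not the paper's Option-Critic derivation. It assumes a surrogate objective J(theta) = E[G] - lam * Var(G), with Var(G) = E[G^2] - (E[G])^2, and illustrates it with a hypothetical two-armed bandit where one arm has a higher mean but much higher variance.

```python
import numpy as np

# Sketch only: variance-penalized REINFORCE on a toy two-armed bandit.
# Objective: J(theta) = E[G] - lam * Var(G), estimated from sampled returns.

rng = np.random.default_rng(0)

# Arm 0: high mean, high variance. Arm 1: slightly lower mean, low variance.
ARM_MEANS = np.array([1.0, 0.8])
ARM_STDS = np.array([2.0, 0.1])

def sample_return(arm):
    """Return G for pulling one arm (a one-step episode)."""
    return rng.normal(ARM_MEANS[arm], ARM_STDS[arm])

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def train(lam, iters=2000, batch=64, lr=0.05):
    theta = np.zeros(2)  # logits of the softmax policy
    for _ in range(iters):
        probs = softmax(theta)
        arms = rng.choice(2, size=batch, p=probs)
        returns = np.array([sample_return(a) for a in arms])

        # Score-function gradient of log pi(a): one_hot(a) - probs
        grads_logp = np.eye(2)[arms] - probs                    # (batch, 2)

        g_mean = returns.mean()
        # grad E[G]     ~ mean_i G_i   * grad log pi(a_i)
        # grad E[G^2]   ~ mean_i G_i^2 * grad log pi(a_i)
        # grad (E[G])^2 = 2 * E[G] * grad E[G]
        grad_mean = (grads_logp * returns[:, None]).mean(axis=0)
        grad_sq = (grads_logp * (returns ** 2)[:, None]).mean(axis=0)
        grad_var = grad_sq - 2.0 * g_mean * grad_mean

        # Ascend the variance-penalized objective.
        theta += lr * (grad_mean - lam * grad_var)
    return softmax(theta)

print("lam=0.0 :", train(lam=0.0))   # favors the high-mean, high-variance arm
print("lam=0.5 :", train(lam=0.5))   # penalty shifts mass to the low-variance arm
```

With lam = 0 the policy converges toward the riskier arm 0 (mean 1.0, std 2.0); with a sizeable penalty it prefers arm 1 (mean 0.8, std 0.1), illustrating the mean-variance trade-off that a safe-policy constraint on return variance is meant to control. Extending this to the hierarchical Option-Critic setting is the contribution claimed by the paper and is not attempted here.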
Keywords
Safety, Policy gradient, Option-Critic