
ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages

ICML 2024 (2024)

Abstract
This paper proposes a step toward approximate Bayesian inference in on-policy actor-critic deep reinforcement learning. It is implemented through three changes to the Asynchronous Advantage Actor-Critic (A3C) algorithm: (1) applying a ReLU function to advantage estimates, (2) spectral normalization of actor-critic weights, and (3) incorporating dropout as a Bayesian approximation. We prove under standard assumptions that restricting policy updates to positive advantages optimizes for value by maximizing a lower bound on the value function plus an additive term. We show that the additive term is bounded proportional to the Lipschitz constant of the value function, which offers theoretical grounding for spectral normalization of critic weights. Finally, our application of dropout corresponds to approximate Bayesian inference over both the actor and critic parameters, which enables adaptive state-aware exploration around the modes of the actor via Thompson sampling. We demonstrate significant improvements for median and interquartile mean metrics over A3C, PPO, SAC, and TD3 on the MuJoCo continuous control benchmark and improvement over PPO in the challenging ProcGen generalization benchmark.
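The sketch below illustrates the three modifications described in the abstract in PyTorch. It is not the authors' implementation; the network sizes, dropout rate, and loss weighting are illustrative assumptions, and only the ReLU on advantages, spectral normalization, and dropout correspond to the paper's stated changes.

```python
# Minimal sketch (assumed PyTorch implementation) of the three changes the
# abstract lists for A3C: ReLU on advantages, spectral-normalized weights,
# and dropout as an approximate Bayesian posterior over parameters.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class ActorCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64, p_drop=0.1):
        super().__init__()
        # (2) spectral normalization of actor-critic weights
        # (3) dropout layers as a Bayesian approximation
        self.actor = nn.Sequential(
            spectral_norm(nn.Linear(obs_dim, hidden)), nn.Tanh(), nn.Dropout(p_drop),
            spectral_norm(nn.Linear(hidden, act_dim)),
        )
        self.critic = nn.Sequential(
            spectral_norm(nn.Linear(obs_dim, hidden)), nn.Tanh(), nn.Dropout(p_drop),
            spectral_norm(nn.Linear(hidden, 1)),
        )

    def forward(self, obs):
        return self.actor(obs), self.critic(obs).squeeze(-1)

def actor_loss(log_probs, advantages):
    # (1) ReLU on advantage estimates: only transitions with positive
    # advantage contribute to the policy-gradient term.
    positive_adv = torch.relu(advantages.detach())
    return -(log_probs * positive_adv).mean()

# Usage sketch: keeping the module in train() mode at action-selection time
# leaves dropout active, so each sampled set of weights yields
# Thompson-sampling-style exploration around the actor's modes.
model = ActorCritic(obs_dim=11, act_dim=3)
action_params, values = model(torch.randn(32, 11))
```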
Key words
on-policy, actor-critic