On Using Hamiltonian Monte Carlo Sampling for RL

CDC (2022)

Abstract
Q-Learning and other value-function-based reinforcement learning (RL) algorithms learn optimal policies from datasets of actions, rewards, and state transitions. However, generating independent and identically distributed (IID) data samples poses a significant challenge when the state transition dynamics are stochastic and high-dimensional, because the associated normalizing integral is intractable. We address this challenge with Hamiltonian Monte Carlo (HMC) sampling, which offers a computationally tractable way to generate data for training RL algorithms in stochastic, high-dimensional settings. We introduce Hamiltonian Q-Learning and use it to demonstrate, both theoretically and empirically, that Q values can be learned from a dataset of HMC samples of actions, rewards, and state transitions. Hamiltonian Q-Learning also exploits the underlying low-rank structure of the Q function, using a matrix completion algorithm to reconstruct the full Q function from Q value updates over a much smaller subset of state-action pairs. By providing an efficient way to apply Q-Learning in stochastic, high-dimensional settings, the proposed approach thus broadens the scope of RL algorithms for real-world applications.
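
To make the procedure described above concrete, the following is a minimal sketch, not the authors' implementation, of the two ingredients the abstract names: an HMC sampler that draws next states from an unnormalized transition density, and a Q value sweep that updates only a small subset of state-action pairs before reconstructing the full Q matrix with a low-rank completion step. A simple truncated-SVD projection is used here as a stand-in for the paper's matrix completion algorithm, and all function names, the discretized state representation, and the hyperparameters (step size, number of leapfrog steps, rank) are illustrative assumptions.

import numpy as np


def hmc_sample(log_prob, grad_log_prob, x0, n_samples=100, step_size=0.1,
               n_leapfrog=10, rng=None):
    """Draw samples from an unnormalized density exp(log_prob) with HMC."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(x.shape)                 # resample momentum
        x_new, p_new = x.copy(), p.copy()
        # Leapfrog integration of the Hamiltonian dynamics
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        for _ in range(n_leapfrog - 1):
            x_new += step_size * p_new
            p_new += step_size * grad_log_prob(x_new)
        x_new += step_size * p_new
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        # Metropolis correction using the Hamiltonian (negative log joint)
        h_old = -log_prob(x) + 0.5 * (p @ p)
        h_new = -log_prob(x_new) + 0.5 * (p_new @ p_new)
        if rng.random() < np.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x.copy())
    return np.array(samples)


def nearest_state(states, x):
    """Map a continuous HMC sample back to the nearest discretized state index."""
    return int(np.argmin([np.linalg.norm(np.atleast_1d(s) - x) for s in states]))


def low_rank_complete(Q_obs, mask, rank=3, n_iters=50):
    """Matrix-completion stand-in: alternate between filling unobserved entries
    from the current low-rank estimate and re-projecting via truncated SVD."""
    Q_hat = np.where(mask, Q_obs, Q_obs[mask].mean())
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(Q_hat, full_matrices=False)
        Q_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        Q_hat = np.where(mask, Q_obs, Q_low)             # keep observed entries fixed
    return Q_hat


def hamiltonian_q_sweep(Q, states, actions, observed, reward_fn, log_trans,
                        grad_log_trans, gamma=0.95, n_hmc=50, rng=None):
    """One sweep of Q updates on a small subset `observed` of (state, action)
    index pairs, followed by low-rank reconstruction of the full Q matrix."""
    rng = np.random.default_rng() if rng is None else rng
    Q_obs = np.zeros_like(Q)
    mask = np.zeros(Q.shape, dtype=bool)
    for i, j in observed:
        s, a = states[i], actions[j]
        # HMC samples of the next state s' ~ p(. | s, a), given only the
        # unnormalized log transition density and its gradient.
        next_states = hmc_sample(lambda x, s=s, a=a: log_trans(x, s, a),
                                 lambda x, s=s, a=a: grad_log_trans(x, s, a),
                                 x0=np.atleast_1d(s), n_samples=n_hmc, rng=rng)
        # Monte Carlo estimate of the Bellman target E[r + gamma * max_a' Q(s', a')]
        targets = [reward_fn(s, a) + gamma * Q[nearest_state(states, sp), :].max()
                   for sp in next_states]
        Q_obs[i, j] = np.mean(targets)
        mask[i, j] = True
    return low_rank_complete(Q_obs, mask)

In this sketch, log_trans and grad_log_trans are assumed to be supplied by the user (for example, an analytic Gaussian transition model), since HMC only requires the unnormalized log density and its gradient rather than the intractable normalizing integral.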
Keywords
associated normalizing integral, Hamiltonian Monte Carlo sampling, Hamiltonian Q-Learning, high-dimensional contexts, independent and identically distributed data samples, matrix completion algorithm, optimal policies, Q function, Q value updates, state transition dynamics, state-action pairs, stochastic contexts, training RL algorithms, value-function-based reinforcement learning algorithms