Exploiting Causal Graph Priors with Posterior Sampling for Reinforcement Learning
arXiv (2023)
Abstract
Posterior sampling allows exploitation of prior knowledge on the
environment's transition dynamics to improve the sample efficiency of
reinforcement learning. The prior is typically specified as a class of
parametric distributions, the design of which can be cumbersome in practice,
often resulting in the choice of uninformative priors. In this work, we propose
a novel posterior sampling approach in which the prior is given as a (partial)
causal graph over the environment's variables. Such a graph is often more natural
to design: for example, one can list known causal dependencies between biometric
features in a medical treatment study. Specifically, we propose a hierarchical
Bayesian procedure, called C-PSRL, that simultaneously learns the full causal
graph at the higher level and the parameters of the resulting factored dynamics
at the lower level. We provide an analysis of the Bayesian regret of C-PSRL that explicitly
connects the regret rate with the degree of prior knowledge. Our numerical
evaluation conducted in illustrative domains confirms that C-PSRL strongly
improves the efficiency of posterior sampling with an uninformative prior while
performing close to posterior sampling with the full causal graph.
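
To make the two-level idea concrete, the following is a minimal sketch of hierarchical posterior sampling for a single binary variable. It is not the authors' implementation. The function names, the uniform hyper-prior over parent sets, the Beta(1, 1) parameter priors, the in-degree bound `max_degree`, and the toy data are all assumptions chosen for illustration. The partial causal graph enters as a set of known parents that every candidate parent set must contain.

```python
import itertools
import math
import random


def candidate_parent_sets(n_vars, known_parents, max_degree):
    """Parent sets that contain the known parents and have at most max_degree members."""
    known = tuple(sorted(known_parents))
    others = [v for v in range(n_vars) if v not in known]
    sets = []
    for k in range(max_degree - len(known) + 1):
        for extra in itertools.combinations(others, k):
            sets.append(tuple(sorted(known + extra)))
    return sets


def transition_counts(parents, data):
    """Count next-value outcomes (zeros, ones) for each configuration of the parents."""
    counts = {}
    for state, next_value in data:
        key = tuple(state[p] for p in parents)
        n0, n1 = counts.get(key, (0, 0))
        counts[key] = (n0 + (next_value == 0), n1 + (next_value == 1))
    return counts


def log_marginal_likelihood(parents, data):
    """Log evidence of the data under a Beta(1, 1) prior per parent configuration."""
    return sum(
        math.lgamma(1 + n0) + math.lgamma(1 + n1) - math.lgamma(2 + n0 + n1)
        for n0, n1 in transition_counts(parents, data).values()
    )


def sample_model(n_vars, known_parents, max_degree, data, rng):
    """Two-level posterior sample: a parent set first, then its parameters."""
    candidates = candidate_parent_sets(n_vars, known_parents, max_degree)
    # Higher level: posterior over parent sets under an assumed uniform hyper-prior.
    log_w = [log_marginal_likelihood(p, data) for p in candidates]
    top = max(log_w)
    weights = [math.exp(w - top) for w in log_w]
    parents = rng.choices(candidates, weights=weights, k=1)[0]
    # Lower level: Beta posterior sample of P(next = 1 | parent configuration).
    counts = transition_counts(parents, data)
    params = {}
    for config in itertools.product((0, 1), repeat=len(parents)):
        n0, n1 = counts.get(config, (0, 0))
        params[config] = rng.betavariate(1 + n1, 1 + n0)
    return parents, params


# Toy data (hypothetical): the next value of the target copies state variable 1.
states = list(itertools.product((0, 1), repeat=3)) * 5
data = [(s, s[1]) for s in states]
parents, params = sample_model(3, (0,), 2, data, random.Random(0))
print(parents, params)
```

In a complete posterior sampling agent, a draw like this would be made for every state variable at the start of each episode, a policy would be planned in the sampled factored model, and the collected transitions would be appended to `data` before the next draw. Knowing more parents in advance shrinks the candidate set, which is the intuition behind tying the regret rate to the degree of prior knowledge.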