Estimator Variance in Reinforcement Learning: Theoretical Problems and Practical Solutions

International Journal of Intelligent Information and Database Systems (2007)

Abstract
In reinforcement learning, as in many on-line search techniques, a large number of estimation parameters (e.g. Q-value estimates for 1-step Q-learning) are maintained and dynamically updated as information comes to hand during the learning process. Excessive variance of these estimators can be problematic, resulting in uneven or unstable learning, or even making effective learning impossible. Estimator variance is usually managed only indirectly, by selecting global learning algorithm parameters (e.g. λ for TD(λ)-based methods) that are a compromise between an acceptable level of estimator perturbation and other desirable system attributes, such as reduced estimator bias. In this paper, we argue that this approach may not always be adequate, particularly for noisy and non-Markovian domains, and present a direct approach to managing estimator variance, the new ccBeta algorithm. Empirical results in an autonomous robotics domain are also presented, showing improved performance using the ccBeta method.
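For context, the "estimation parameters" referred to in the abstract are, in the 1-step Q-learning case, the Q-value estimates updated after each observed transition. The sketch below shows the standard 1-step Q-learning update with a fixed step size; it is illustrative only, and the names (Q, beta, gamma) are generic assumptions, not taken from the paper. The fixed step size beta is an example of the kind of global parameter through which estimator variance is usually managed only indirectly; the paper's ccBeta method concerns how such a quantity is set, and its details are not reproduced here.

```python
# Minimal sketch of a standard 1-step Q-learning update (not the paper's
# ccBeta method). Names and defaults here are illustrative assumptions.

from collections import defaultdict

def q_update(Q, state, action, reward, next_state, actions, beta=0.1, gamma=0.9):
    """Apply one 1-step Q-learning update to the table Q in place.

    beta  -- step size; a fixed global parameter of the kind the abstract
             says is typically used to trade estimator variance against
             other attributes such as bias
    gamma -- discount factor
    """
    best_next = max(Q[(next_state, a)] for a in actions) if actions else 0.0
    td_error = reward + gamma * best_next - Q[(state, action)]
    Q[(state, action)] += beta * td_error
    return td_error

# Example usage with a defaultdict Q-table:
Q = defaultdict(float)
q_update(Q, state=0, action=1, reward=1.0, next_state=2, actions=[0, 1])
```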