Reinforcement Learning Algorithms with Selector, Tuner, or Estimator

Arabian Journal for Science and Engineering (2023)

Abstract
This paper presents a set of novel reinforcement learning algorithms derived from the actor-critic approach. The modified algorithms exploit information that is already available during learning to enhance performance. Our proposed framework adds several components to the traditional actor-critic model: an underlying model learner, a selector, a tuner, and an estimator. The estimator employs an approximate value function together with the learned underlying model to estimate the values of all actions at the next state. The selector approximates the optimal action at the next state, which the actor then uses to optimize its policy. In contrast to the conventional actor-critic algorithm, where the actor performs only policy optimization and the critic performs value-function approximation and policy evaluation, our selector-actor-critic algorithm employs a selector to approximate the optimal action at the current state, thereby shaping the actor's policy updates. Our tuner-actor-critic algorithm incorporates a critic and a model learner to approximate the action-value function and the dynamics of the underlying environment, respectively; the tuner then uses this information to adjust the value of the current state-action pair. In the estimator-selector-actor-critic algorithm, we develop intelligent agents based on the concepts of lookahead and intuition: lookahead is used to estimate the values of the available actions at the next state, while intuition guides the agent toward maximizing the probability of selecting the approximately optimal action. Simulation experiments evaluate the performance of these algorithms, and the results demonstrate the superiority of the estimator-selector-actor-critic approach over the existing alternatives.
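To make the estimator-selector-actor-critic loop concrete, the following is a minimal tabular sketch, not the paper's implementation. The class name `ESACAgent`, the Laplace-smoothed count-based transition model, the use of greedy state values inside the estimator, and the omission of a learned reward model are all illustrative assumptions made for this example.

```python
import numpy as np


class ESACAgent:
    """Tabular sketch of an estimator-selector-actor-critic agent."""

    def __init__(self, n_states, n_actions, alpha=0.1, beta=0.01, gamma=0.99):
        self.Q = np.zeros((n_states, n_actions))   # critic: action-value table
        self.H = np.zeros((n_states, n_actions))   # actor: softmax preferences
        # Model learner: Laplace-smoothed transition counts (an assumption;
        # the smoothing just avoids division by zero before any data arrives).
        self.counts = np.ones((n_states, n_actions, n_states))
        self.alpha, self.beta, self.gamma = alpha, beta, gamma

    def policy(self, s):
        """Softmax policy over the actor's preferences at state s."""
        z = np.exp(self.H[s] - self.H[s].max())
        return z / z.sum()

    def act(self, s, rng):
        return rng.choice(len(self.H[s]), p=self.policy(s))

    def estimate(self, s):
        """Estimator: score every action at state s using the learned model
        and the critic's values (immediate rewards omitted for brevity)."""
        P = self.counts[s] / self.counts[s].sum(axis=1, keepdims=True)
        V = self.Q.max(axis=1)        # greedy state values (an assumption)
        return P @ V                  # expected next-state value per action

    def update(self, s, a, r, s_next):
        self.counts[s, a, s_next] += 1                    # model learner
        # Selector ("lookahead"): approximate optimal action at next state.
        a_star = int(np.argmax(self.estimate(s_next)))
        td = r + self.gamma * self.Q[s_next, a_star] - self.Q[s, a]
        self.Q[s, a] += self.alpha * td                   # critic update
        # Actor ("intuition"): softmax policy-gradient step that raises the
        # probability of the chosen action when the TD error is positive.
        grad = -self.policy(s)
        grad[a] += 1.0
        self.H[s] += self.beta * td * grad
```

A training loop would alternate `agent.act(s, rng)` with environment steps and call `agent.update(s, a, r, s_next)` on each observed transition; replacing the model-based `estimate` call with an argmax over the critic's own Q-values would recover a plain selector-style variant without lookahead.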
Keywords
Artificial intelligence, On-policy learning, Off-policy learning, Actor–critic, Underlying model