Application of Massive Parallel Computation Based Q-Learning in System Control

Demei Huang, Haoyuan Zhu, Xianqi Lin, Li Wang

2022 5th International Conference on Pattern Recognition and Artificial Intelligence (PRAI) (2022)

Abstract

Q-learning is a reinforcement learning algorithm for sequential decision-making problems that satisfy the Markov decision process (MDP) property, in which the agent operates in a dynamic, uncertain environment. Because of this framework, Q-learning can be applied to a wide range of complex real-world problems, such as intelligent robots, industrial control, and games. However, Q-learning is also very slow: it needs a long training time and substantial computing resources to reach a practically converged policy. For example, learning a converged policy for a path between any two points on a 21×10 map takes about 1200 seconds. Importantly, this time increases exponentially with the size of the map, which limits the algorithm's use in practical problems. Supercomputers or cloud-based computation could solve this problem, but both are too expensive for small and medium-sized entities. Multithreading and GPUs are alternatives. In contrast to traditional single-threaded Q-learning, which uses only part of the hardware's capacity, they can fully utilize the hardware. They are also far more affordable for most entities, and mature supporting materials exist for this hardware. This paper addresses the problem by proposing a model that combines shared Q-tables, multithreading, and GPU-based massively parallel computing. The solution makes full use of the hardware's performance and speeds up convergence by a factor of several dozen without increasing hardware costs, effectively lowering the threshold for using reinforcement learning.
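As a rough illustration of the shared-Q-table idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical 21×10 grid world with a fixed goal cell, hand-picked rewards, and arbitrary values for the learning rate, discount factor, and exploration rate; none of these come from the paper. Several threads run episodes independently and write to one shared table. Note that standard CPython threads are limited by the global interpreter lock, so this shows the structure of the approach rather than the speedup, and the GPU part is not shown.

```python
import random
import threading

WIDTH, HEIGHT = 21, 10                        # map size quoted in the abstract
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # right, left, up, down
GOAL = (WIDTH - 1, HEIGHT - 1)                # hypothetical goal cell
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1         # assumed hyperparameters


def step(state, action):
    """Move one cell, clamped to the map; hypothetical reward scheme."""
    x = min(max(state[0] + action[0], 0), WIDTH - 1)
    y = min(max(state[1] + action[1], 0), HEIGHT - 1)
    nxt = (x, y)
    done = nxt == GOAL
    return nxt, (1.0 if done else -0.01), done


def make_q_table():
    """One row of action values per grid cell, initialised to zero."""
    return {(x, y): [0.0] * len(ACTIONS)
            for x in range(WIDTH) for y in range(HEIGHT)}


def worker(q, lock, episodes, seed, max_steps=500):
    """Run episodes from random start cells, updating the shared table."""
    rng = random.Random(seed)
    for _ in range(episodes):
        state = (rng.randrange(WIDTH), rng.randrange(HEIGHT))
        done = state == GOAL
        steps = 0
        while not done and steps < max_steps:
            # Epsilon-greedy action selection.
            if rng.random() < EPSILON:
                a = rng.randrange(len(ACTIONS))
            else:
                row = q[state]
                a = row.index(max(row))
            nxt, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + GAMMA * max(q[nxt])
            # The lock serialises writes so concurrent updates are not lost.
            with lock:
                q[state][a] += ALPHA * (target - q[state][a])
            state = nxt
            steps += 1


def train(num_threads=4, episodes_per_thread=500):
    """Train one shared Q-table with several worker threads."""
    q = make_q_table()
    lock = threading.Lock()
    threads = [threading.Thread(target=worker,
                                args=(q, lock, episodes_per_thread, seed))
               for seed in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return q


def greedy_path(q, start, max_steps=200):
    """Follow the highest-valued action from start until the goal or the step cap."""
    path = [start]
    state = start
    while state != GOAL and len(path) <= max_steps:
        row = q[state]
        state, _, _ = step(state, ACTIONS[row.index(max(row))])
        path.append(state)
    return path
```

Calling `train()` and then `greedy_path(q, (0, 0))` gives the route the learned table currently prefers from the bottom-left cell. Whether it reaches the goal depends on how many episodes were run.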

Keywords

Q-learning, multi-threading, massive parallelism, GPU, CUDA, reinforced learning
