A Framework for Mapping DRL Algorithms With Prioritized Replay Buffer Onto Heterogeneous Platforms

IEEE Transactions on Parallel and Distributed Systems (2023)

Abstract
Despite the recent success of Deep Reinforcement Learning (DRL) in self-driving cars, robotics, and surveillance, training DRL agents takes a tremendous amount of time and computational resources. In this article, we aim to accelerate DRL with Prioritized Replay Buffer, owing to its state-of-the-art performance on various benchmarks. The computation primitives of DRL with Prioritized Replay Buffer include environment emulation, neural network inference, sampling from the Prioritized Replay Buffer, updating the Prioritized Replay Buffer, and neural network training. The relative speed of these primitives differs across DRL algorithms such as Deep Q Network and Deep Deterministic Policy Gradient, which makes any fixed mapping inefficient. In this work, we propose a framework for mapping DRL algorithms onto heterogeneous platforms consisting of a multi-core CPU, a GPU, and an FPGA. First, we develop dedicated accelerators for each primitive on the CPU, FPGA, and GPU. Second, we relax the data dependency between the priority update and sampling performed in the Prioritized Replay Buffer. By doing so, the latency caused by data transfer between the GPU, FPGA, and CPU can be completely hidden without sacrificing the rewards achieved by agents trained with the target DRL algorithms. Finally, given a DRL algorithm specification, our design space exploration automatically chooses the optimal mapping of the primitives based on an analytical performance model. On widely used benchmark environments, our experimental results demonstrate up to 997.3x improvement in training throughput compared with baseline mappings on the same heterogeneous platform. Compared with RLlib, a state-of-the-art distributed Reinforcement Learning framework, we achieve 1.06x to 1005x improvement in training throughput.
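To make the accelerated primitives concrete, below is a minimal sketch of the two Prioritized Replay Buffer operations the framework targets, proportional sampling and priority update, implemented with a binary sum tree so both run in O(log N). The class and method names are illustrative and are not taken from the paper's implementation.

```python
# Illustrative sketch of a Prioritized Replay Buffer backed by a sum tree.
# Not the paper's code; names and structure are assumptions for exposition.
import numpy as np


class SumTreeReplayBuffer:
    """Stores transitions with priorities in a binary sum tree so that
    sampling proportional to priority and updating priorities both take
    O(log N) time."""

    def __init__(self, capacity):
        self.capacity = capacity                 # number of leaves
        self.tree = np.zeros(2 * capacity)       # index 1 is the root; leaves start at `capacity`
        self.data = [None] * capacity            # stored transitions
        self.next_idx = 0
        self.size = 0

    def add(self, transition, priority):
        idx = self.next_idx
        self.data[idx] = transition
        self._set_priority(idx, priority)
        self.next_idx = (idx + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def _set_priority(self, idx, priority):
        tree_idx = idx + self.capacity
        delta = priority - self.tree[tree_idx]
        # Propagate the priority change from the leaf up to the root.
        while tree_idx >= 1:
            self.tree[tree_idx] += delta
            tree_idx //= 2

    def sample(self, batch_size):
        """Draw indices with probability proportional to stored priority."""
        total = self.tree[1]
        targets = np.random.uniform(0.0, total, size=batch_size)
        indices = []
        for t in targets:
            node = 1
            while node < self.capacity:          # descend until reaching a leaf
                left = 2 * node
                if t <= self.tree[left]:
                    node = left
                else:
                    t -= self.tree[left]
                    node = left + 1
            indices.append(node - self.capacity)
        batch = [self.data[i] for i in indices]
        return indices, batch

    def update_priorities(self, indices, priorities):
        """Write back new TD-error-based priorities after a training step."""
        for idx, p in zip(indices, priorities):
            self._set_priority(idx, p)
```

In a serialized training loop, each call to update_priorities must finish before the next sample, so the priority write-back and the associated CPU-FPGA-GPU transfers sit on the critical path; the dependency relaxation described in the abstract is what lets these two primitives overlap so the transfer latency can be hidden.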
Keywords
Deep reinforcement learning, design space exploration, FPGA, GPU, heterogeneous platform, prioritized replay buffer