Phasic Diversity Optimization for Population-Based Reinforcement Learning

ICRA 2024

Abstract
In previous work on diversity in reinforcement learning, diversity is often achieved through an augmented loss function that must trade off reward against diversity. Typically, diversity optimization algorithms use a multi-armed bandit (MAB) to select the trade-off coefficient from a predefined space. However, the dynamic distribution of the reward signal seen by the MAB, or the conflict between quality and diversity, limits the performance of these methods. We introduce the Phasic Diversity Optimization (PDO) algorithm, a Population-Based Training framework that separates reward training and diversity training into distinct phases instead of optimizing a multi-objective function. In the auxiliary phase, agents with poor performance that are diversified via determinants do not replace the better agents in the archive. Decoupling reward from diversity allows us to apply aggressive diversity optimization in the auxiliary phase without degrading performance. Furthermore, we build an aerial melee scenario for agents.
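The phasic structure described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the toy reward landscape, the RBF similarity kernel whose determinant serves as the diversity score, and the random-search update (standing in for a policy-gradient step) are all assumptions for illustration. What it does reproduce is the key acceptance rule: the auxiliary diversity phase only keeps a diversified agent if its reward does not degrade.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(params):
    # Toy reward landscape (assumption): maximum at params = (1, 1).
    return -np.sum((params - 1.0) ** 2)

def diversity(candidate, others):
    # Determinant of an RBF similarity matrix over agent parameters,
    # a DPP-style stand-in for the determinant-based diversity measure.
    pts = np.vstack([candidate] + others)
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    return np.linalg.det(np.exp(-d ** 2))

def pdo_step(archive, sigma=0.1):
    new_archive = []
    for i, agent in enumerate(archive):
        # Phase 1: reward optimization (random-search proxy for an RL update).
        cand = agent + sigma * rng.standard_normal(agent.shape)
        best = cand if reward(cand) > reward(agent) else agent
        # Phase 2 (auxiliary): diversify, but only accept the diversified
        # agent if reward does not degrade -- poor performers never replace
        # better agents in the archive.
        others = [a for j, a in enumerate(archive) if j != i]
        div = best + sigma * rng.standard_normal(best.shape)
        if reward(div) >= reward(best) and diversity(div, others) > diversity(best, others):
            best = div
        new_archive.append(best)
    return new_archive

archive = [rng.standard_normal(2) for _ in range(4)]
for _ in range(200):
    archive = pdo_step(archive)
print([round(reward(a), 3) for a in archive])
```

Because the auxiliary phase requires `reward(div) >= reward(best)`, each agent's reward is monotonically non-decreasing across steps, which is the decoupling property the abstract emphasizes.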
Key words
Reinforcement Learning, Aerial Systems: Perception and Autonomy, Aerial Systems: Applications