Order from Chaos: Leveraging the Order in Time as Intrinsic Reward.

CSCS (2023)

Abstract
Reinforcement learning (RL) agents often struggle with exploration problems where the environment is complex and the reward signal is sparse or delayed. In this paper, we propose a novel intrinsic reward mechanism that leverages the order of time to guide agents toward less chaotic policies. Our approach involves training a model to predict the correct order of a shuffled sequence of observations, which enables us to introduce the “orderability” score. This score captures the extent to which observations from a trajectory are uniquely ordered in time, and we hypothesize that it is a helpful metric for assessing the learning progress of reinforcement learning agents. By incorporating the orderability score as an intrinsic reward, we aim to encourage agents to explore their environment more effectively and achieve faster and more consistent reward maximization. In our experiments, we demonstrate that agents trained with the orderability intrinsic reward outperform baseline methods on challenging exploration tasks, highlighting the potential of our approach. By shedding light on the importance of time’s order in RL, we provide a fresh perspective on the challenge of exploration and pave the way for future research in this area.
Key words
reinforcement learning, exploration, intrinsic reward, order of time