In the ZONE: Measuring difficulty and progression in curriculum generation

ICLR 2023

Abstract
A common strategy in curriculum generation for reinforcement learning is to train a teacher network to generate tasks that fall within a student network's "zone of proximal development" (ZPD). These are tasks that are neither too easy nor too hard for the student. Albeit intuitive, ZPD is not well understood computationally. We propose ZONE, a novel computational framework that operationalizes ZPD. It formalizes ZPD through the language of Bayesian probability theory, revealing that tasks should be selected by difficulty (the student's success probability on the task) and learning progression (the degree of change in the student's model parameters). ZONE operationalizes ZPD with two techniques that we apply on top of existing algorithms. One is REJECT, which rejects tasks outside a difficulty scope, and the other is GRAD, which prioritizes tasks that maximize the student's gradient norm. Compared to the original algorithms, the ZONE techniques improve the student's generalization performance on discrete Minigrid environments and continuous-control MuJoCo domains with up to $9\times$ higher success. ZONE also accelerates the student's learning by training on up to $10\times$ less data.
Keywords
curriculum learning, multiagent, Bayesian
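
The two techniques named in the abstract, REJECT and GRAD, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how success probabilities or gradient norms are estimated, nor which difficulty bounds are used. The function names, the `low`/`high` thresholds, the top-`k` selection rule, and the toy numbers below are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def reject(tasks: Sequence[T],
           success_prob: Callable[[T], float],
           low: float = 0.2,
           high: float = 0.8) -> List[T]:
    """REJECT-style filter: keep tasks whose estimated success
    probability lies inside the difficulty scope [low, high].
    The bounds are hypothetical, not taken from the paper."""
    return [t for t in tasks if low <= success_prob(t) <= high]


def grad(tasks: Sequence[T],
         grad_norm: Callable[[T], float],
         k: int = 1) -> List[T]:
    """GRAD-style prioritization: return the k tasks with the largest
    student gradient norm, largest first. Top-k is an assumed rule."""
    return sorted(tasks, key=grad_norm, reverse=True)[:k]


def select_tasks(tasks: Sequence[T],
                 success_prob: Callable[[T], float],
                 grad_norm: Callable[[T], float],
                 low: float = 0.2,
                 high: float = 0.8,
                 k: int = 1) -> List[T]:
    """Apply the difficulty filter first, then rank the survivors."""
    return grad(reject(tasks, success_prob, low, high), grad_norm, k)


# Toy, made-up estimates for four candidate tasks.
toy_success = {"A": 0.95, "B": 0.50, "C": 0.30, "D": 0.02}
toy_grad_norm = {"A": 0.1, "B": 0.7, "C": 1.4, "D": 3.0}

in_scope = reject(list(toy_success), toy_success.__getitem__)
chosen = select_tasks(list(toy_success),
                      toy_success.__getitem__,
                      toy_grad_norm.__getitem__,
                      k=1)
print(in_scope)  # tasks that pass the difficulty filter
print(chosen)    # highest-gradient-norm task among them
```

In this toy setup, task D has the largest gradient norm but is filtered out as too hard, which shows why the two criteria are combined: gradient norm alone can favor tasks the student almost never solves.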