Regulation of reinforcement learning parameters captures long-term changes in rat behaviour

European Journal of Neuroscience (2024)

Abstract
In uncertain environments in which resources fluctuate continuously, animals must continually decide whether to stabilise learning and exploit what they currently believe to be their best option, or instead explore potential alternatives and learn fast from new observations. While this trade-off has been extensively studied in pretrained animals facing non-stationary decision-making tasks, it is not yet known how animals progressively tune it while learning the task structure during pretraining. Here, we compared the ability of different computational models to account for long-term changes in the behaviour of 24 rats while they learned to choose a rewarded lever in a three-armed bandit task across 24 days of pretraining. We found that the day-by-day evolution of rat performance and win-shift tendency revealed a progressive stabilisation of the way they regulated reinforcement learning parameters. We successfully captured these behavioural adaptations using a meta-learning model in which either the learning rate or the inverse temperature was controlled by the average reward rate.

In a three-armed bandit task conducted over several sessions, rats show improved performance and decreased exploration, which cannot be captured by a Q-learning model with static parameters. Meta-learning models in which the average reward rate regulates either the exploration-exploitation trade-off or the rate of learning capture these long-term changes.
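The abstract does not give the model equations, so the following is only a minimal sketch of the general idea, not the authors' implementation. It shows a Q-learning agent on a three-armed bandit whose softmax inverse temperature is tied to a running average of reward. The linear mapping from average reward to inverse temperature, the function names (`softmax`, `simulate`), and all parameter values (`alpha`, `beta_min`, `beta_max`, `tau`, `reward_probs`) are assumptions chosen for illustration.

```python
import math
import random


def softmax(q_values, beta):
    """Choice probabilities for the given Q-values and inverse temperature."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def simulate(n_trials=500, reward_probs=(0.8, 0.1, 0.1), alpha=0.1,
             beta_min=1.0, beta_max=10.0, tau=0.05, seed=0):
    """Q-learning with an inverse temperature driven by average reward.

    All parameter values are illustrative assumptions, not fitted values.
    """
    rng = random.Random(seed)
    q = [0.0] * len(reward_probs)
    avg_reward = 0.0  # running estimate of the reward rate, stays in [0, 1]
    betas, choices = [], []
    for _ in range(n_trials):
        # Assumed linear mapping: more reward -> higher beta -> less exploration.
        beta = beta_min + (beta_max - beta_min) * avg_reward
        probs = softmax(q, beta)
        choice = rng.choices(range(len(q)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[choice] else 0.0
        # Standard Q-learning update with a fixed learning rate.
        q[choice] += alpha * (reward - q[choice])
        # Exponential moving average of obtained reward.
        avg_reward += tau * (reward - avg_reward)
        betas.append(beta)
        choices.append(choice)
    return q, betas, choices


if __name__ == "__main__":
    q, betas, choices = simulate()
    print("final Q-values:", [round(v, 2) for v in q])
    print("first and last inverse temperature:", betas[0], round(betas[-1], 2))
```

Regulating the learning rate instead would follow the same pattern, with `alpha` rather than `beta` computed from `avg_reward` on each trial. The direction and form of that mapping would again be an assumption here.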
Keywords
decision-making, dopamine, exploration-exploitation trade-off, meta-learning