Training Language Models to Self-Correct Via Reinforcement Learning
Aviral Kumar, Vincent Zhuang, Rishabh Agarwal, Yi Su, John Co-Reyes, Avi Singh, Kate Baumli, Shariq Iqbal, Colton Bishop, Rebecca Roelofs, Lei Zhang, Kay McKinney, Disha Shrivastava, Cosmin Paduraru, George Tucker, Doina Precup, Feryal Behbahani, Aleksandra Faust
ICLR 2025 (2025)
