RRM: Robust Reward Model Training Mitigates Reward Hacking
Tianqi Liu, Wei Xiong, Jie Ren, Lichang Chen, Junru Wu, Rishabh Joshi, Yang Gao, Jiaming Shen, Zhen Qin, Tianhe Yu, Daniel Sohn, Anastasia Makarova, Jeremiah Zhe Liu, Yuan Liu, Bilal Piot, Abe Ittycheriah, Aviral Kumar, Mohammad Saleh
ICLR 2025
