Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment
Shengyang Sun, Yian Zhang, Alexander Bukharin, David Mosallanezhad, Jiaqi Zeng, Soumye Singhal, Gerald Shen, Adithya Renduchintala, Tugrul Konuk, Yi Dong, Zhilin Wang, Dmitry Chichkov, Olivier Delalleau, Oleksii Kuchaiev. CoRR (2025)
