AI safety by debate via regret minimization
CoRR (2023)
Abstract
We study AI safety by debate as a repeated game and ask when regret can be
minimized efficiently in this setting, where the players are either AIs or
humans equipped with access to computationally superior AIs. We characterize
when internal and external regret can be minimized efficiently, and conclude
with conditions under which a sequence of strategies converges to a
correlated equilibrium.
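
To make the notion of external regret concrete, here is a minimal sketch of the standard Hedge (multiplicative weights) rule on a generic sequence of loss vectors. It is an illustration only and is not taken from the paper: the function names `hedge` and `demo`, the random losses, and the learning rate are all assumptions chosen for the example, and nothing here models the debate setting or the paper's efficiency results.

```python
import math
import random


def hedge(loss_rounds, eta):
    """Run Hedge on loss vectors with entries in [0, 1].

    Returns the external regret: the learner's expected cumulative loss
    minus the cumulative loss of the best fixed action in hindsight.
    (Hypothetical helper for illustration, not the paper's algorithm.)
    """
    n = len(loss_rounds[0])
    weights = [1.0] * n
    cumulative = [0.0] * n  # cumulative loss of each fixed action
    expected_loss = 0.0     # learner's expected cumulative loss
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        expected_loss += sum(p * l for p, l in zip(probs, losses))
        for i, l in enumerate(losses):
            cumulative[i] += l
            weights[i] *= math.exp(-eta * l)  # downweight costly actions
    return expected_loss - min(cumulative)


def demo(T=2000, n=3, seed=0):
    """Compare Hedge's regret on random losses with the textbook bound."""
    rng = random.Random(seed)
    rounds = [[rng.random() for _ in range(n)] for _ in range(T)]
    eta = math.sqrt(8 * math.log(n) / T)    # standard tuning for known T
    bound = math.sqrt(T * math.log(n) / 2)  # worst-case regret bound
    return hedge(rounds, eta), bound


if __name__ == "__main__":
    regret, bound = demo()
    print(f"external regret = {regret:.3f}, bound = {bound:.3f}")
```

With this tuning the regret grows only on the order of the square root of the number of rounds, which is the sense in which external regret is "minimized". Internal regret, the quantity tied to correlated equilibria, compares against swapping one action for another and needs a different construction that this sketch does not cover.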