Persuading a Learning Agent
CoRR (2024)
Abstract
We study a repeated Bayesian persuasion problem (and more generally, any
generalized principal-agent problem with complete information) where the
principal does not have commitment power and the agent uses algorithms to learn
to respond to the principal's signals. We reduce this problem to a one-shot
generalized principal-agent problem with an approximately-best-responding
agent. This reduction allows us to show the following: if the agent uses a
contextual no-regret learning algorithm, the principal can guarantee utility
arbitrarily close to the principal's optimal utility in the classic
non-learning model with commitment; if the agent uses a contextual
no-swap-regret learning algorithm, the principal cannot obtain significantly
more than that optimal utility. In both cases, the gap between the principal's
obtainable utility in the learning model and in the non-learning model is
bounded by the agent's regret (respectively, swap-regret). If the agent
instead uses a mean-based learning algorithm (which can be no-regret but not
no-swap-regret), the principal can do significantly better than in the
non-learning model. These conclusions hold not only for Bayesian persuasion
but also for any generalized principal-agent problem with complete
information, including Stackelberg games and contract design.
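To make the first guarantee concrete, here is a minimal simulation sketch of a
repeated persuasion game against a contextual no-regret learner. It assumes a
toy two-state accept/reject instance with an independent Hedge (multiplicative
weights) learner per signal; the instance, the names PRIOR_GOOD and q, the
0.95 shading factor, and the ex-post full-feedback assumption are all
illustrative choices for this sketch, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative instance: two states (good/bad), two agent actions (accept/reject).
PRIOR_GOOD = 0.3                    # Pr[state = good] (assumed < 1/2, so persuasion helps)
T = 50_000                          # number of repeated rounds
ETA = np.sqrt(8 * np.log(2) / T)    # standard Hedge step size for 2 actions

# Agent utility: +1 for accepting a good state, -1 for accepting a bad one, 0 for reject.
def agent_utility(action, state):   # action: 1 = accept, 0 = reject
    return float(action) * (1.0 if state == 1 else -1.0)

# Commitment-optimal-style scheme: recommend "accept" on every good state and,
# on bad states, with probability q. Here q is shaded slightly below the
# indifference point p/(1-p) so accepting is strictly better for the agent
# after an "accept" signal -- this slack is how the principal secures
# (OPT - epsilon) against a no-regret learner.
q = 0.95 * PRIOR_GOOD / (1.0 - PRIOR_GOOD)

def principal_signal(state):
    return 1 if (state == 1 or rng.random() < q) else 0

# Contextual no-regret learner: an independent Hedge copy per signal,
# kept as log-weights for numerical stability.
log_w = np.zeros((2, 2))            # [signal][action]

principal_total = 0.0
for _ in range(T):
    state = int(rng.random() < PRIOR_GOOD)
    sig = principal_signal(state)

    w = np.exp(log_w[sig] - log_w[sig].max())
    probs = w / w.sum()
    action = int(rng.choice(2, p=probs))

    principal_total += float(action)   # principal earns 1 whenever the agent accepts

    # Full-information Hedge update: we assume the state is revealed ex post,
    # so the agent can evaluate the counterfactual payoff of both actions.
    for a in (0, 1):
        log_w[sig, a] += ETA * agent_utility(a, state)

# Benchmark: if the agent followed recommendations, the principal would average
# PRIOR_GOOD + (1 - PRIOR_GOOD) * q per round.
print(f"avg principal utility: {principal_total / T:.3f}")
print(f"commitment benchmark : {PRIOR_GOOD + (1 - PRIOR_GOOD) * q:.3f}")
```

As T grows and the agent's per-signal regret vanishes, the average principal
utility approaches the benchmark, in line with the abstract's first guarantee;
the shading factor controls the epsilon of utility traded for making the
agent's best response strict.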
Key words
behavioral agent, learning