
Learning in Stackelberg Games with Non-myopic Agents

Economics and Computation (2022)

Abstract
Stackelberg games are a canonical model for strategic principal-agent interactions. Consider, for instance, a defense system that distributes its security resources across high-risk targets prior to attacks being executed; a tax policymaker who sets rules on when audits are triggered prior to seeing filed tax reports; or a seller who chooses a price prior to knowing a customer's proclivity to buy. In each of these scenarios, a principal first selects an action x ∈ X and then an agent reacts with an action y ∈ Y, where X and Y are the principal's and agent's action spaces, respectively. In the examples above, agent actions correspond to which target to attack, how much tax to pay to evade an audit, and how much to purchase, respectively. Typically, the principal wants an x that maximizes their payoff when the agent plays a best response y = br(x); such a pair (x, y) is a Stackelberg equilibrium. By committing to a strategy, the principal can guarantee a higher payoff than in the fixed-point equilibrium of the corresponding simultaneous-play game. However, finding such a strategy requires knowledge of the agent's payoff function.

When faced with unknown agent payoffs, the principal can attempt to learn a best response via repeated interactions with the agent. If a (naïve) agent is unaware that such learning occurs and always plays a best response, the principal can use classical online learning approaches to optimize their own payoff in the stage game. Learning from myopic agents has been extensively studied in multiple Stackelberg games, including security games [2, 6, 7], demand learning [1, 5], and strategic classification [3, 4].

However, long-lived agents will generally not volunteer information that can be used against them in the future. This is especially the case in online environments, where a learner seeks to exploit recently learned patterns of behavior as soon as possible, and the agent sees a tangible advantage in deviating from its instantaneous best response to lead the learner astray. This trade-off between the (statistical) efficiency of learning algorithms and the perverse incentives they may create over the long term brings us to the main questions of this work: What are principled approaches to learning against non-myopic agents in general Stackelberg games? How can insights from learning against myopic agents be applied in the non-myopic case?
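To make the commitment advantage and the myopic best-response model concrete, here is a minimal Python sketch (an illustration under assumed payoffs, not code from the paper): it enumerates a pure-strategy Stackelberg equilibrium of a small bimatrix game by trying each principal action x and letting a myopic agent play br(x). The payoff matrices and the best_response helper are hypothetical.

```python
import numpy as np

# Hypothetical payoff matrices (illustrative only, not from the paper):
# rows index the principal's actions x, columns the agent's actions y.
principal_payoff = np.array([[2.0, 4.0],
                             [1.0, 3.0]])
agent_payoff = np.array([[1.0, 0.0],
                         [0.0, 1.0]])

def best_response(x: int) -> int:
    """Myopic agent's best response br(x) to the committed action x."""
    return int(np.argmax(agent_payoff[x]))

# Pure-strategy Stackelberg equilibrium by enumeration: commit to the x
# whose induced response y = br(x) maximizes the principal's payoff.
pairs = [(x, best_response(x)) for x in range(principal_payoff.shape[0])]
x_star, y_star = max(pairs, key=lambda p: principal_payoff[p])
print(x_star, y_star, principal_payoff[x_star, y_star])

# Here commitment to x=1 induces y=1 and payoff 3, whereas in the
# simultaneous-play game x=0 dominates for the principal, yielding the
# equilibrium (x=0, y=0) with payoff only 2 -- the commitment advantage
# described in the abstract.
```

Against a myopic agent with unknown payoffs, the principal could recover br by simply querying each x and observing the response, which is the caricature behind the classical online-learning approaches mentioned above; a non-myopic agent, by contrast, may deliberately deviate from br(x) during such probing to steer the principal toward a commitment it prefers.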
Key words
Stackelberg games, agents, learning, non-myopic