Conformal Off-Policy Prediction for Multi-Agent Systems
arXiv (2024)
Abstract
Off-Policy Prediction (OPP), i.e., predicting the outcomes of a target policy
using only data collected under a nominal (behavioural) policy, is a paramount
problem in data-driven analysis of safety-critical systems where the deployment
of a new policy may be unsafe. To achieve dependable off-policy predictions,
recent work on Conformal Off-Policy Prediction (COPP) leverages the conformal
prediction framework to derive prediction regions with probabilistic guarantees
under the target process. Existing COPP methods can account for the
distribution shifts induced by policy switching, but are limited to
single-agent systems and scalar outcomes (e.g., rewards). In this work, we
introduce MA-COPP, the first conformal prediction method to solve OPP problems
involving multi-agent systems, deriving joint prediction regions (JPRs) for all
agents' trajectories when one or more "ego" agents change their policies.
Unlike the single-agent scenario, this setting introduces higher complexity as
the distribution shifts affect predictions for all agents, not just the ego
agents, and the prediction task involves full multi-dimensional trajectories,
not just reward values. A key contribution of MA-COPP is to avoid enumeration
or exhaustive search of the output space of agent trajectories, which is
instead required by existing COPP methods to construct the prediction region.
We achieve this by showing that an over-approximation of the true JPR can be
constructed, without enumeration, from the maximum density ratio of the JPR
trajectories. We evaluate the effectiveness of MA-COPP in multi-agent systems
from the PettingZoo library and the F1TENTH autonomous racing environment,
achieving nominal coverage in higher dimensions and various shift settings.
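To illustrate the conformal machinery the abstract builds on, the following is a minimal sketch of density-ratio-weighted split conformal prediction, the single-agent, scalar-score setting that COPP methods extend. This is an illustrative assumption of the underlying idea, not the paper's MA-COPP algorithm; the function name and the unit test-point weight are choices made here for clarity.

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, alpha=0.1):
    """Illustrative weighted split-conformal threshold (not MA-COPP itself).

    Given calibration nonconformity scores collected under a behavioural
    policy and density ratios w_i = p_target(x_i) / p_behav(x_i) accounting
    for the policy-induced distribution shift, return a threshold q so that
    the region {y : score(y) <= q} has roughly 1 - alpha coverage under
    the target policy's distribution.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    # Normalise the weights, reserving unit mass for the test point
    # (its weight is assumed to be 1 in this sketch).
    p = np.cumsum(w) / (w.sum() + 1.0)
    # Smallest calibration score whose cumulative weight reaches 1 - alpha.
    idx = np.searchsorted(p, 1.0 - alpha)
    return s[min(idx, len(s) - 1)]

# Toy usage: with uniform weights (no distribution shift) this reduces to
# ordinary split conformal prediction on the calibration scores.
rng = np.random.default_rng(0)
cal_scores = rng.normal(size=1000) ** 2  # hypothetical nonconformity scores
q = weighted_conformal_quantile(cal_scores, np.ones(1000), alpha=0.1)
```

The key point mirrored from the abstract is that the shift enters only through the density ratios; MA-COPP's contribution is avoiding enumeration of the trajectory output space by bounding such ratios over the JPR.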