ACTER: Diverse and Actionable Counterfactual Sequences for Explaining and Diagnosing RL Policies
CoRR (2024)
Abstract
Understanding how failure occurs and how it can be prevented in reinforcement
learning (RL) is necessary to enable debugging, maintain user trust, and
develop personalized policies. Counterfactual reasoning has often been used to
assign blame and understand failure by searching for the closest possible world
in which the failure is avoided. However, current counterfactual state
explanations in RL explain an outcome using only the current state
features and offer no actionable recourse on how a negative outcome could have
been prevented. In this work, we propose ACTER (Actionable Counterfactual
Sequences for Explaining Reinforcement Learning Outcomes), an algorithm for
generating counterfactual sequences that provides actionable advice on how
failure can be avoided. ACTER investigates actions leading to a failure and
uses the evolutionary algorithm NSGA-II to generate counterfactual sequences of
actions that prevent it with minimal changes and high certainty even in
stochastic environments. Additionally, ACTER generates a set of multiple
diverse counterfactual sequences that enable users to correct failure in the
way that best fits their preferences. We also introduce three diversity metrics
that can be used for evaluating the diversity of counterfactual sequences. We
evaluate ACTER in two RL environments, with both discrete and continuous
actions, and show that it can generate actionable and diverse counterfactual
sequences. We conduct a user study to explore how explanations generated by
ACTER help users identify and correct failure.
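
The trade-off described above, preventing failure with minimal changes and high certainty, can be illustrated with a small multi-objective sketch. This is not the authors' implementation and it is not NSGA-II itself: it enumerates all candidates exhaustively instead of evolving them, and keeps only the first non-dominated (Pareto) front, which is the ranking idea NSGA-II builds on. The two objectives (number of changed actions and failure probability), the function toy_failure_prob, and the three-step binary action space are all assumptions made for illustration; the abstract does not specify them.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Actions = Tuple[int, ...]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length action sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is no worse than q on every objective and strictly better on one.

    All objectives are minimized.
    """
    return all(x <= y for x, y in zip(p, q)) and any(x < y for x, y in zip(p, q))


def pareto_front(
    candidates: List[Actions],
    objectives: Callable[[Actions], Tuple[float, ...]],
) -> List[Actions]:
    """Return the candidates whose objective vectors no other candidate dominates."""
    scored = [(c, objectives(c)) for c in candidates]
    return [c for c, s in scored if not any(dominates(t, s) for _, t in scored)]


def toy_failure_prob(seq: Actions) -> float:
    """Hypothetical stand-in for a failure estimate from stochastic rollouts.

    Assumption for illustration: failure is likely unless the second action
    is changed to 1, and changing the third action as well helps slightly.
    """
    if seq[1] == 0:
        return 0.9
    return 0.05 if seq[2] == 1 else 0.1


# The action sequence that led to failure in this toy setting.
original: Actions = (0, 0, 0)

# Every alternative sequence of three binary actions, excluding the original.
candidates = [c for c in product((0, 1), repeat=3) if c != original]

# Objective 1: how many actions were changed. Objective 2: failure probability.
front = pareto_front(
    candidates,
    lambda c: (float(hamming(c, original)), toy_failure_prob(c)),
)

print(front)  # [(0, 1, 0), (0, 1, 1)]
```

The two surviving sequences represent different recourse options: one changes a single action and leaves some residual risk, the other changes two actions for a lower failure probability. A user could choose between them according to preference, which is the role the abstract assigns to a diverse set of counterfactuals.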