A Policy-Graph Approach to Explain Reinforcement Learning Agents: A Novel Policy-Graph Approach with Natural Language and Counterfactual Abstractions for Explaining Reinforcement Learning Agents

Research Square (2023)

Abstract
As reinforcement learning (RL) continues to improve and be applied in situations alongside humans, the need to explain the learned behaviors of RL agents to end-users becomes more important. Strategies for explaining the reasoning behind an agent’s policy, called policy-level explanations, can lead to important insights about both the task and the agent’s behaviors. Following this line of research, in this work we propose a novel approach, named CAPS, that summarizes an agent’s policy in the form of a directed graph with natural language descriptions. A decision-tree-based clustering method is used to abstract the state space of the task into fewer, condensed states, which makes the policy graphs more digestible to end-users. We then use user-defined predicates to enrich the abstract states with semantic meaning. To introduce counterfactual state explanations to the policy graph, we first identify the critical states in the graph, then develop a novel counterfactual explanation method based on action perturbation in those critical states. We generate explanation graphs using CAPS on 5 RL tasks for deterministic and stochastic policies. We evaluate the effectiveness of CAPS on human participants who are not RL experts in two user studies. When provided with our explanation graph, end-users are able to accurately interpret policies of trained RL agents 80% of the time, and 68.2% of users demonstrated an increase in their confidence in understanding an agent’s behavior.
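
As a rough illustration of the policy-graph idea summarized in the abstract, the sketch below aggregates observed transitions between abstract states into a directed graph with empirical edge probabilities. It is not the CAPS implementation: the function name `build_policy_graph`, the abstract-state labels, and the action names are all hypothetical, and the abstract states are assumed to have been produced already by some clustering step.

```python
from collections import Counter, defaultdict


def build_policy_graph(transitions):
    """Aggregate (state, action, next_state) triples into a directed graph.

    Returns a dict of the form {state: {(action, next_state): probability}},
    where each probability is the empirical frequency of that edge among
    all transitions leaving `state`.
    """
    counts = defaultdict(Counter)
    for state, action, next_state in transitions:
        counts[state][(action, next_state)] += 1

    graph = {}
    for state, edge_counts in counts.items():
        total = sum(edge_counts.values())
        graph[state] = {edge: n / total for edge, n in edge_counts.items()}
    return graph


# Hypothetical abstract-state labels and actions, for illustration only.
transitions = [
    ("far_from_goal", "move_right", "near_goal"),
    ("far_from_goal", "move_right", "far_from_goal"),
    ("near_goal", "move_right", "at_goal"),
    ("far_from_goal", "move_right", "near_goal"),
]
graph = build_policy_graph(transitions)

for state, edges in graph.items():
    for (action, next_state), prob in edges.items():
        print(f"{state} --{action} ({prob:.2f})--> {next_state}")
```

In a sketch like this, the string labels stand in for the natural language descriptions that the abstract attributes to user-defined predicates; how those descriptions and the critical states are actually derived is not specified here.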
Keywords
explain reinforcement learning agents, counterfactual abstractions, natural language, policy-graph