Testing for Fault Diversity in Reinforcement Learning
arXiv (2024)
Abstract
Reinforcement Learning is the premier technique to approach sequential
decision problems, including complex tasks such as driving cars and landing
spacecraft. Among the software validation and verification practices, testing
for functional fault detection is a convenient way to build trustworthiness in
the learned decision model. While recent works seek to maximise the number of
detected faults, none consider fault characterisation during the search to
increase diversity. We argue that policy testing should not find as many failures
as possible (e.g., inputs that trigger similar car crashes) but rather aim at
revealing as informative and diverse faults as possible in the model. In this
paper, we explore the use of quality diversity optimisation to solve the
problem of fault diversity in policy testing. Quality diversity (QD)
optimisation is a type of evolutionary algorithm to solve hard combinatorial
optimisation problems where high-quality diverse solutions are sought. We
define and address the underlying challenges of adapting QD optimisation to
policy testing. Furthermore, we compare classical QD optimisers to
state-of-the-art frameworks dedicated to policy testing, both in terms of
search efficiency and fault diversity. We show that QD optimisation, while
being conceptually simple and generally applicable, effectively finds more
diverse faults in the decision model, and conclude that QD-based policy testing
is a promising approach.
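
To make the idea of quality diversity optimisation concrete, the following is a minimal sketch of a MAP-Elites-style loop, one classical QD optimiser. It is not the paper's implementation. The toy problem is an assumption made purely for illustration: a test input is a pair (x, y) in the unit square, standing in for an environment's initial state; the quality score x + y is a hypothetical stand-in for how close the policy comes to failing; and the behaviour descriptor is the cell of a 5x5 grid that the input falls into. The names map_elites, sample, mutate and evaluate are hypothetical.

```python
import random


def map_elites(evaluate, sample, mutate, iterations=2000, seed=0):
    """Minimal MAP-Elites-style loop: keep the best solution per behaviour cell."""
    rng = random.Random(seed)
    archive = {}  # maps cell -> (quality, solution)
    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Most of the time, mutate a randomly chosen elite from the archive.
            _, parent = rng.choice(list(archive.values()))
            candidate = mutate(parent, rng)
        else:
            # Otherwise (and always on the first iteration), sample a fresh input.
            candidate = sample(rng)
        quality, cell = evaluate(candidate)
        # Keep the candidate if its cell is empty or it beats the current elite.
        if cell not in archive or quality > archive[cell][0]:
            archive[cell] = (quality, candidate)
    return archive


def sample(rng):
    # Hypothetical test input: a point in the unit square.
    return (rng.random(), rng.random())


def mutate(parent, rng):
    # Gaussian perturbation, clipped back into the unit square.
    return tuple(min(1.0, max(0.0, v + rng.gauss(0.0, 0.1))) for v in parent)


def evaluate(candidate):
    x, y = candidate
    quality = x + y  # toy stand-in for "closeness to failure" (assumption)
    cell = (min(int(x * 5), 4), min(int(y * 5), 4))  # 5x5 behaviour grid
    return quality, cell


archive = map_elites(evaluate, sample, mutate)
print(f"{len(archive)} of 25 cells filled")
```

In a policy-testing setting, evaluate would instead run the policy under test from the candidate input and derive both the quality score and the behaviour descriptor from the resulting execution. The archive then holds at most one high-quality test input per behaviour cell, which is what steers the search towards diverse rather than merely numerous failures.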