Poster Abstract: Learning from Demonstrations with Temporal Logics

Cyber-physical Systems (2022)

Abstract
Learning-from-demonstrations (LfD) is a popular paradigm for obtaining effective robot control policies for complex tasks via reinforcement learning, without the need to explicitly design reward functions. However, it is susceptible to imperfections in the demonstrations and also raises concerns about the safety and interpretability of the learned control policies. To address these issues, we propose to use Signal Temporal Logic (STL) to express high-level robotic tasks and to use its quantitative semantics to evaluate and rank the quality of demonstrations. Temporal logic-based specifications allow us to create non-Markovian rewards and are also capable of defining interesting causal dependencies between tasks, such as sequential task specifications. We present our completed work on the LfD-STL framework, which learns from even suboptimal/imperfect demonstrations and STL specifications to infer rewards for reinforcement learning tasks. We have validated our approach through various experimental setups to show how our method outperforms prior LfD methods. We then discuss future directions for tackling the problem of explainability and interpretability in such learning-based systems.
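The abstract refers to STL's quantitative semantics, which assign each trajectory a real-valued robustness score: positive when the specification is satisfied (with larger values indicating a wider margin) and negative when it is violated. The sketch below is a minimal, hypothetical illustration of this idea — not the authors' implementation — using the simple specification G(x > c) ("the signal always stays above a threshold c"), whose robustness is the minimum margin over the trace, to rank a few hand-made demonstrations:

```python
# Hedged sketch: ranking demonstrations by STL robustness.
# For the spec G(x > c), the quantitative semantics give
# rho = min_t (x_t - c): higher rho means the trace satisfies
# the spec with a larger safety margin; rho < 0 means violation.

def robustness_always_above(trace, threshold):
    """Robustness of G(x > threshold) over a 1-D signal trace."""
    return min(x - threshold for x in trace)

def rank_demonstrations(demos, threshold):
    """Return (score, demo) pairs sorted best-first by robustness."""
    scored = [(robustness_always_above(d, threshold), d) for d in demos]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

# Hypothetical 1-D demonstration signals with safety threshold 0.5.
demos = [
    [1.0, 0.9, 0.8, 0.7],   # stays well above 0.5
    [1.0, 0.6, 0.4, 0.7],   # dips below 0.5 -> negative robustness
    [0.9, 0.8, 0.6, 0.55],  # satisfies, but with a smaller margin
]
ranked = rank_demonstrations(demos, threshold=0.5)
```

In an LfD-STL-style pipeline, such scores could serve as the per-demonstration quality weights used when inferring rewards, so that a barely-satisfying or violating demonstration contributes less than one with a comfortable margin.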
Keywords
reward inference, temporal logic, demonstrations, imitation, reinforcement learning