FRL-FI: Transient Fault Analysis for Federated Reinforcement Learning-Based Navigation Systems

Zishen Wan, Aqeel Anwar, Abdulrahman Mahmoud, Tianyu Jia, Yu-Shun Hsiao, Vijay Janapa Reddi, Arijit Raychowdhury

2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2022

Abstract

Swarm intelligence is being increasingly deployed in autonomous systems, such as drones and unmanned vehicles. Federated reinforcement learning (FRL), a key swarm intelligence paradigm where agents interact with their own environments and cooperatively learn a consensus policy while preserving privacy, has recently shown potential advantages and gained popularity. However, transient faults are increasing in the hardware system with continuous technology node scaling and can pose threats to FRL systems. Meanwhile, conventional redundancy-based protection methods are challenging to deploy on resource-constrained edge applications. In this paper, we experimentally evaluate the fault tolerance of FRL navigation systems at various scales with respect to fault models, fault locations, learning algorithms, layer types, communication intervals, and data types at both training and inference stages. We further propose two cost-effective fault detection and recovery techniques that can achieve up to $3.3\times$ improvement in resilience with $<2.7\%$ overhead in FRL systems.
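
As a rough illustration of why a single transient fault can matter in an FRL system, the sketch below flips one bit of a float32 weight and then averages the weights of several agents. It is not the paper's fault-injection framework: the helper names `flip_bit_float32` and `federated_average`, the single-bit-flip fault model, and plain element-wise averaging as the consensus step are all assumptions made for this example.

```python
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Return `value`, stored as IEEE 754 float32, with one bit flipped.

    Bit 0 is the least significant mantissa bit; bit 31 is the sign bit.
    Hypothetical helper, assuming a single-bit-flip transient fault model.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    # Reinterpret the float32 as a 32-bit unsigned integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR with a one-hot mask flips exactly one bit.
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


def federated_average(agent_weights):
    """Element-wise mean of per-agent weight lists.

    Assumed consensus step for illustration; the actual aggregation
    rule in a given FRL system may differ.
    """
    n = len(agent_weights)
    return [sum(ws) / n for ws in zip(*agent_weights)]


if __name__ == "__main__":
    clean = [[0.5, -0.25], [0.5, -0.25], [0.5, -0.25]]
    # Inject a fault into the first weight of agent 0 (high exponent bit).
    faulty = [list(ws) for ws in clean]
    faulty[0][0] = flip_bit_float32(faulty[0][0], 30)

    print("clean consensus :", federated_average(clean))
    print("faulty consensus:", federated_average(faulty))
```

A flip in a low mantissa bit changes the value only slightly, while a flip in a high exponent bit can change it by many orders of magnitude, and averaging then spreads the corrupted value into the shared policy of every agent.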

Key words

fault tolerance, resource-constrained edge applications, continuous technology node scaling, hardware system, transient faults, consensus policy, key swarm intelligence paradigm, unmanned vehicles, autonomous systems, federated reinforcement learning-based navigation systems, transient fault analysis, FRL-FI, recovery techniques, cost-effective fault detection, fault locations, fault models, FRL navigation systems
