Tradeoff Between Performance and Reliability in FPGA Accelerated DNNs for Space Applications

2023 European Data Handling & Data Processing Conference (EDHPC)(2023)

Abstract
The space industry is interested in using Commercial-off-the-shelf (COTS) SRAM Field Programmable Gate Arrays (FPGAs) for Deep Neural Network (DNN) inference acceleration due to their low non-recurring engineering costs, flexibility and performance-per-watt ratio. Specifically, FPGA DNN accelerators can be adopted in several AI-empowered onboard applications, such as autonomous failure detection, cloud detection, and other real-time data processing. However, SRAM FPGAs are sensitive to radiation effects, e.g., to Single Event Upsets (SEUs), causing significant reliability challenges. In this work, we examine the tradeoff between the performance and reliability of an FPGA-based dataflow Quantised DNN (QNN) accelerator by leveraging its hardware folding parameter. Hardware folding determines the level of computing resource sharing (or the level of parallelisation) between the neurons of each network layer. A low folding factor results in a highly parallelised (i.e., low-latency) QNN, while a high folding factor leads to a less parallelised QNN. Based on AMD-Xilinx FINN, an open-source dataflow compiler for developing QNNs on FPGAs, we generated three versions of a QNN accelerator that performs image classification. The three QNN design versions (Max, Med, Min) were produced with different folding factors, i.e., maximum (Max), medium (Med) and minimum (Min) folding, respectively, and were implemented on a Zynq-7020 system-on-chip (SoC) FPGA. We performed statistical fault injection experiments in the FPGA's configuration memory to estimate the SEU vulnerability of the three QNNs. In turn, we calculated the Mean Time Between Failure (MTBF) and the Mean Executions Between Failure (MEBF) metrics for a Low Earth Orbit (LEO) mission to examine how hardware folding affects the balance between performance and reliability.
Our results showed that the Med QNN is the most reliable design (i.e., has the highest MTBF) and the Min is the least reliable one (i.e., has the lowest MTBF), but the latter achieves the best compromise between performance and reliability (i.e., achieves the highest MEBF).
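The interplay between the two metrics can be made concrete. A minimal sketch, not taken from the paper: MTBF is the inverse of the design's SEU-induced failure rate (the orbit upset rate scaled by the accelerator's configuration-memory footprint and the vulnerability factor measured via fault injection), and MEBF divides the MTBF by the per-inference latency. All function names and numeric values below are hypothetical placeholders for illustration only.

```python
def mtbf_hours(seu_rate_per_bit_per_hour: float,
               config_bits: float,
               vulnerability_factor: float) -> float:
    """MTBF = 1 / failure rate. The failure rate is the per-bit orbit
    SEU rate scaled by the number of configuration bits the design
    occupies and its vulnerability factor (fraction of injected upsets
    that caused a failure). Hypothetical model, not the paper's exact one."""
    failure_rate = seu_rate_per_bit_per_hour * config_bits * vulnerability_factor
    return 1.0 / failure_rate


def mebf(mtbf_h: float, inference_latency_s: float) -> float:
    """Mean Executions Between Failure: average number of inferences
    completed before an SEU-induced failure."""
    return (mtbf_h * 3600.0) / inference_latency_s


# Placeholder numbers for a LEO-like scenario:
mtbf = mtbf_hours(1e-10, 1e7, 0.1)   # -> 10000.0 hours
execs = mebf(mtbf, 0.001)            # 1 ms per inference
```

The sketch makes the paper's headline result intuitive: a low-folding (highly parallel) design may use more configuration bits and so have a lower MTBF, but its much lower inference latency can still yield the highest MEBF.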
Key words
Deep Neural Network, Neural Network, Image Classification, Parallelization, Low Earth Orbit, Injection Experiments, Reliable Design, Image Classification Performance, Real-time Data Processing, Mean Time To Failure, Maximum Fold, Convolutional Neural Network, Artificial Neural Network, Reliability Analysis, Resource Utilization, Space Exploration, Fault-tolerant, Lookup Table, Dependability, Random Bits, System Crashes, Irradiation Experiments, Flip Flop, Bitwise Operations, Mission Requirements, Mitigation Techniques, Image Categories, Quantization Levels