Thales: Formulating and Estimating Architectural Vulnerability Factors for DNN Accelerators
CoRR (2022)
Abstract
As Deep Neural Networks (DNNs) are increasingly deployed in safety-critical
and privacy-sensitive applications such as autonomous driving and biometric
authentication, it is critical to understand the fault tolerance of
DNNs. Prior work primarily focuses on metrics such as the Failures In Time (FIT)
rate and the Silent Data Corruption (SDC) rate, which quantify how often a
device fails. Instead, this paper quantifies DNN accuracy conditioned on a
transient error having occurred, which tells us how well a network behaves
when such an error occurs. We call this metric Resiliency Accuracy (RA). We
show that the existing RA formulation is fundamentally inaccurate because it
incorrectly assumes that all software variables (model weights/activations) are
equally likely to be corrupted by hardware transient faults. We present an
algorithm that captures the fault probabilities of DNN variables under
transient faults and thus provides correct RA estimates, validated by
hardware. To accelerate RA estimation, we reformulate the RA calculation as a Monte
Carlo integration problem and solve it using importance sampling driven by
DNN-specific heuristics. Using our lightweight RA estimation method, we show that
transient faults lead to far greater accuracy degradation than today's DNN
resiliency tools estimate. Finally, we show how our RA estimation tool can help design
more resilient DNNs by integrating it with a Network Architecture Search
framework.
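The abstract's two technical ideas, weighting each fault site by its hardware fault probability instead of uniformly, and estimating RA by Monte Carlo integration with importance sampling, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the fault sites, the probabilities `p`, the per-site accuracies, and the proposal `q` built from a hypothetical heuristic are toy assumptions, and the function names are illustrative only. RA is modeled here as the expected post-fault accuracy, the sum over sites v of p(v)·acc(v), and samples drawn from q are reweighted by p/q so the estimate stays unbiased.

```python
import numpy as np


def exact_ra(p: np.ndarray, acc: np.ndarray) -> float:
    """RA as expected accuracy, weighting each fault site by its fault probability p."""
    return float(np.dot(p, acc))


def uniform_ra(acc: np.ndarray) -> float:
    """RA under the assumption that every fault site is equally likely to fault."""
    return float(np.mean(acc))


def importance_sampled_ra(p, acc, q, clean_acc, n_samples, rng) -> float:
    """Monte Carlo estimate of RA, drawing fault sites from a proposal q.

    RA = clean_acc - E_p[drop], where drop = clean_acc - acc. Reweighting each
    sample by p/q keeps the estimate unbiased as long as q > 0 wherever
    p * drop > 0.
    """
    drop = clean_acc - acc
    idx = rng.choice(len(p), size=n_samples, p=q)
    w = p[idx] / q[idx]
    return float(clean_acc - np.mean(w * drop[idx]))


# --- Hypothetical toy setup (all numbers are illustrative, not from the paper) ---
rng = np.random.default_rng(0)
n_sites = 1000
clean_acc = 0.90

# Non-uniform per-site fault probabilities, e.g. proportional to how long
# each value is exposed in a hardware buffer (hypothetical).
exposure = rng.uniform(0.1, 10.0, size=n_sites)
p = exposure / exposure.sum()

# Most faults are benign; assume the 20 most-exposed sites are critical.
acc = np.full(n_sites, clean_acc)
critical = np.argsort(exposure)[-20:]
acc[critical] = 0.10

# Hypothetical DNN-specific heuristic: up-weight a set of "suspect" sites
# (the critical ones plus some false positives) in the proposal q.
boost = np.ones(n_sites)
boost[critical] = 10.0
boost[rng.choice(n_sites, size=30, replace=False)] = 10.0
q = p * boost / np.sum(p * boost)

ra_true = exact_ra(p, acc)
ra_uniform = uniform_ra(acc)
ra_is = importance_sampled_ra(p, acc, q, clean_acc, n_samples=5000, rng=rng)
print(f"exact={ra_true:.4f} uniform={ra_uniform:.4f} importance-sampled={ra_is:.4f}")
```

Because a fault at most sites leaves accuracy unchanged, sampling directly from p spends most samples on benign sites; concentrating q on suspected-critical sites makes the nonzero accuracy drops appear more often, which is the usual motivation for importance sampling in rare-event estimation. The heuristic in this toy is idealized (it flags every critical site); a realistic one would miss some, and q must still assign nonzero probability to every site to keep the estimator unbiased.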