Hardware-Software Co-Design for Real-Time Latency-Accuracy Navigation in Tiny Machine Learning Applications

Payman Behnam,Jianming Tong,Alind Khare, Yangyu Chen,Yue Pan, Pranav Gadikar,Abhimanyu Bambhaniya,Tushar Krishna,Alexey Tumanov

IEEE MICRO（2023）

引用 0|浏览3

暂无评分

摘要

Tiny machine learning (TinyML) applications increasingly operate in dynamically changing deployment scenarios, requiring optimization for both accuracy and latency. Existing methods mainly target a single point in the accuracy/latency tradeoff space, which is insufficient as no single static point can be optimal under variable conditions. We draw on a recently proposed weight-shared SuperNet mechanism to enable serving a stream of queries that activates different SubNets within a SuperNet. This creates an opportunity to exploit the inherent temporal locality of different queries that use the same SuperNet. We propose a hardware-software co-design called SUSHI that introduces a novel SubGraph Stationary optimization. SUSHI consists of a novel field-programmable gate array implementation and a software scheduler that controls which SubNets to serve and which SubGraph to cache in real time. SUSHI yields up to a 32% improvement in latency, 0.98% increase in served accuracy, and achieves up to 78.7% off-chip energy saved across several neural network architectures.

查看译文

关键词

Kernel,Training,Real-time systems,Optimization,Neural networks,System-on-chip,Software

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要