AutoWS: Automate Weights Streaming in Layer-Wise Pipelined DNN Accelerators.

Design, Automation, and Test in Europe (2024)

With the great success of Deep Neural Networks (DNNs), the design of efficient hardware accelerators has attracted wide interest in the research community. Existing research explores two architectural strategies: sequential layer execution and layer-wise pipelining. While the former supports a wider range of models, the latter is favoured for its enhanced customization and efficiency. A challenge for the layer-wise pipelining architecture is its substantial demand for on-chip memory for weight storage, which impedes the deployment of large-scale networks on resource-constrained devices. This paper introduces AutoWS, a pioneering memory management methodology that exploits both on-chip and off-chip memory to optimize weight storage within a layer-wise pipelining architecture, taking advantage of its static schedule. Through a comprehensive investigation of both the hardware design and the Design Space Exploration, our methodology is fully automated and enables the deployment of large-scale DNN models on resource-constrained devices, which was not possible in existing works that target layer-wise pipelining architectures. AutoWS is open-source:
Deep Neural Network, Deep Neural Network Model, Network Deployment, Sequential Execution, Pipelining, On-chip Memory, Resource-constrained Devices, Design Space Exploration, Off-chip Memory, Step Size, Parallelization, Object Detection, Convolution Operation, Vanilla, Regional Dynamics, Design Points, Resource Model, Memory Structure, Bandwidth Allocation, Bit-width, Static Regions, Static Storage