An Application Specific Processor Architecture with 3D Integration for Recurrent Neural Networks

20th International Symposium on Quality Electronic Design (ISQED), 2019

Abstract

Deep learning using recurrent neural networks has broadened the horizon of artificial intelligence. Recurrent networks can process massive amounts of multimodal natural data (video, audio) and learn useful joint representations in various applications. However, implementing recurrent neural networks in hardware and learning these representations requires a hardware platform with high throughput and high memory bandwidth. This work presents a 3D hardware architecture for training and inference of recurrent neural networks that combines an application-specific instruction set processor (ASIP), 3D-stacked memory, and appropriately sized on-chip memory. After analyzing various complex, time-consuming special operations and grouping them into high-level function blocks, the work implements a set of short instructions for them. The accelerator also performs state-of-the-art mixed-precision training using custom instructions. A high-level programming environment was developed to generate Very Long Instruction Word (VLIW) instructions for the accelerator, and it was used to process a popular and successful variant of the recurrent neural network. At 28 nm, this work achieves an 8.5× processing speedup, 47.5× higher energy efficiency per sequence, and a 2.71× reduction in silicon area compared with a GPU.
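The abstract states that the accelerator performs mixed-precision training with custom instructions but does not describe the scheme. As a point of reference, the following is a minimal sketch of the widely used generic recipe: FP16 arithmetic, an FP32 master copy of the weights, and a loss scale that backs off when FP16 overflows. It runs on a toy linear least-squares model in NumPy, not in the accelerator's instruction set. The function name `mixed_precision_step`, the toy model, and all hyperparameters are hypothetical choices for illustration.

```python
import numpy as np


def mixed_precision_step(master_w, x, y, loss_scale, lr=0.1):
    """One SGD step for the toy model y ~ x @ w, loss 0.5 * mean((x @ w - y) ** 2).

    master_w:   FP32 master copy of the weights (hypothetical toy model).
    loss_scale: current loss-scaling factor.
    Returns (new_master_w, new_loss_scale, skipped).
    """
    with np.errstate(over="ignore", invalid="ignore"):
        # Forward and backward passes run on FP16 copies of weights and data.
        w16 = master_w.astype(np.float16)
        x16 = x.astype(np.float16)
        y16 = y.astype(np.float16)
        err16 = x16 @ w16 - y16

        # Gradient of the *scaled* loss, loss_scale * x^T err / n, computed in FP16.
        # Scaling keeps small gradient values from underflowing in FP16.
        factor16 = np.float16(loss_scale / x.shape[0])
        scaled_grad16 = x16.T @ (err16 * factor16)

        # Unscale in FP32 before touching the master weights.
        grad32 = scaled_grad16.astype(np.float32) / np.float32(loss_scale)

    if not np.all(np.isfinite(grad32)):
        # FP16 overflow (inf/NaN): skip this update and halve the loss scale.
        return master_w, loss_scale / 2.0, True

    # Apply the update to the FP32 master weights.
    return master_w - np.float32(lr) * grad32, loss_scale, False


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((64, 2)).astype(np.float32)
    y = x @ np.array([2.0, -3.0], dtype=np.float32)
    w, scale = np.zeros(2, dtype=np.float32), 2.0 ** 16
    for _ in range(300):
        w, scale, _skipped = mixed_precision_step(w, x, y, scale)
    print(w, scale)
```

Production schemes also raise the loss scale again after a run of overflow-free steps; this sketch only lowers it, to stay short.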
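The abstract does not name the RNN variant it processed, although the keyword list includes LSTM. For orientation, here is a minimal NumPy sketch of one step of a standard LSTM cell: one pass of stacked matrix-vector products followed by element-wise sigmoid and tanh nonlinearities. The gate ordering, function names, and shapes are assumptions of this sketch, and it says nothing about how the paper maps these operations onto its instructions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h, c, W, U, b):
    """One step of a standard LSTM cell.

    x: (input_dim,); h, c: (hidden,)
    W: (4*hidden, input_dim); U: (4*hidden, hidden); b: (4*hidden,)
    Gate order in the stacked weights (an assumption of this sketch):
    input, forget, cell candidate, output.
    """
    n = h.shape[0]
    z = W @ x + U @ h + b          # all four gate pre-activations at once
    i = sigmoid(z[0:n])            # input gate
    f = sigmoid(z[n:2 * n])        # forget gate
    g = np.tanh(z[2 * n:3 * n])    # cell candidate
    o = sigmoid(z[3 * n:4 * n])    # output gate
    c_new = f * c + i * g          # element-wise cell-state update
    h_new = o * np.tanh(c_new)     # new hidden state
    return h_new, c_new


def lstm_sequence(xs, W, U, b, hidden):
    """Run the cell over a sequence, starting from zero state."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
    return h, c
```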
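The programming environment is described only as generating VLIW instructions. To illustrate what that involves in general, here is a hypothetical greedy list scheduler that packs operations into fixed-width bundles, placing an operation only after every operation it depends on has been issued in an earlier bundle. The operation names, the dependency graph, and the bundle width are invented for illustration and are not taken from the paper.

```python
from typing import Dict, List


def pack_vliw(ops: List[str], deps: Dict[str, List[str]], width: int) -> List[List[str]]:
    """Greedily pack ops (given in a topological order) into bundles of <= width ops.

    An op may join a bundle only if all of its dependencies were issued in
    earlier bundles; ops in the same bundle execute in parallel.
    """
    if width < 1:
        raise ValueError("bundle width must be at least 1")
    done = set()
    remaining = list(ops)
    bundles = []
    while remaining:
        bundle = []
        for op in remaining:
            if len(bundle) == width:
                break
            if all(d in done for d in deps.get(op, [])):
                bundle.append(op)
        if not bundle:
            raise ValueError("cyclic or unsatisfiable dependencies")
        done.update(bundle)  # results become visible only to later bundles
        remaining = [op for op in remaining if op not in bundle]
        bundles.append(bundle)
    return bundles


# Hypothetical operations for one LSTM-like step.
EXAMPLE_OPS = ["mv_W", "mv_U", "add", "sig_i", "sig_f", "tanh_g", "sig_o",
               "c_update", "h_update"]
EXAMPLE_DEPS = {
    "add": ["mv_W", "mv_U"],
    "sig_i": ["add"], "sig_f": ["add"], "tanh_g": ["add"], "sig_o": ["add"],
    "c_update": ["sig_i", "sig_f", "tanh_g"],
    "h_update": ["c_update", "sig_o"],
}

if __name__ == "__main__":
    # prints [['mv_W', 'mv_U'], ['add'], ['sig_i', 'sig_f'], ['tanh_g', 'sig_o'],
    #         ['c_update'], ['h_update']]
    print(pack_vliw(EXAMPLE_OPS, EXAMPLE_DEPS, width=2))
```

A wider bundle only helps where the dependency graph exposes independent operations: here the four gate nonlinearities can share one bundle at width 4, while the cell and hidden-state updates remain serial.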

Keywords

deep learning, accelerator, LSTM, ASIP