
FET-OPU: A Flexible and Efficient FPGA-based Overlay Processor for Transformer Networks

2023 IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2023)

Abstract
There are already some works on accelerating transformer networks with field-programmable gate arrays (FPGAs). However, many accelerators focus only on attention computation or suffer from fixed data streams without flexibility. Moreover, their hardware performance is limited without schedule optimization and full use of hardware resources. In this article, we propose a flexible and efficient FPGA-based overlay processor, named FET-OPU. Specifically, we design an overlay architecture for general acceleration of transformer networks. We propose a unique matrix multiplication unit (MMU), which consists of a processing element (PE) array based on modified DSP-packing technology and a FIFO array for data caching and rearrangement. An efficient non-linear function unit (NFU) is also introduced, which can calculate arbitrary single-input non-linear functions. We also customize an instruction set for our overlay architecture, so that data flows are dynamically controlled by instructions generated on the software side. In addition, we introduce a two-level compiler and optimize the parallelism and memory allocation schedule. Experimental results show that our FET-OPU achieves 7.33-21.27x speedup and 231x less energy consumption compared with a CPU, and 1.56-4.08x latency reduction with 5.85-66.36x less energy consumption compared with a GPU. Furthermore, we observe 1.56-8.21x better latency and 5.28-6.24x less energy consumption compared with previous customized FPGA/ASIC accelerators, and FET-OPU can be 2.05x faster than NPE with 5.55x less energy consumption.
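The abstract does not specify how the NFU evaluates "arbitrary single-input non-linear functions." A common hardware-friendly technique for this class of unit is a piecewise-linear lookup table whose segment parameters are precomputed on the software side; the minimal sketch below illustrates that general idea in Python. The function names (build_pwl_table, pwl_eval) and the choice of GELU as the example activation are illustrative assumptions, not details taken from the paper.

```python
# Sketch (assumption): piecewise-linear LUT approximation of a single-input
# non-linear function, a common way such an NFU can be made function-agnostic.
import numpy as np

def build_pwl_table(fn, x_min, x_max, segments):
    """Precompute per-segment slope/intercept pairs (software side)."""
    xs = np.linspace(x_min, x_max, segments + 1)
    slopes = (fn(xs[1:]) - fn(xs[:-1])) / (xs[1:] - xs[:-1])
    intercepts = fn(xs[:-1]) - slopes * xs[:-1]
    return xs[:-1], slopes, intercepts   # segment left edges + line parameters

def pwl_eval(x, breakpoints, slopes, intercepts):
    """Evaluate: select a segment, then one multiply-add per input element."""
    idx = np.clip(np.searchsorted(breakpoints, x) - 1, 0, len(slopes) - 1)
    return slopes[idx] * x + intercepts[idx]

# Example: approximate GELU over [-4, 4] with 64 segments.
gelu = lambda x: 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))
bp, k, b = build_pwl_table(gelu, -4.0, 4.0, 64)
x = np.linspace(-4.0, 4.0, 5)
print(pwl_eval(x, bp, k, b))   # close to gelu(x) within the table's resolution
```

In hardware, only the segment-select and multiply-add steps would run on the NFU; the table itself can be swapped by the software-side instruction stream, which matches the flexibility the abstract claims for the unit.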