Agile Autotuning of a Transprecision Tensor Accelerator Overlay for TVM Compiler Stack

2020 30th International Conference on Field-Programmable Logic and Applications (FPL)

Citations: 6 | Views: 15
Abstract
Specialized accelerators for tensor operations, such as blocked matrix operations and multi-dimensional convolutions, have emerged as powerful architecture choices for high-performance deep-learning computing. The rapid evolution of frameworks, models, and precision options challenges the adaptability of such tensor accelerators, since adapting to new requirements incurs significant engineering costs. Programmable tensor accelerators offer a promising alternative by allowing reconfiguration of a virtual architecture overlaid on the physical FPGA configurable fabric. We propose an overlay (τ-VTA) and an optimization method guided by agile-inspired auto-tuning techniques. We achieve up to 2.5x higher performance and up to 8.1x faster convergence.
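Since τ-VTA targets the TVM compiler stack, the tuning flow the abstract refers to follows TVM's AutoTVM pattern: a templated schedule exposes tunable knobs, and a search strategy measures candidate configurations to converge on a good one. The sketch below shows that generic AutoTVM loop for a small matrix multiply on a CPU target; the template, knobs, trial count, and tuner choice are illustrative assumptions, not the authors' actual τ-VTA configuration.

```python
# Minimal AutoTVM-style tuning loop (illustrative; not the paper's τ-VTA setup).
import tvm
from tvm import autotvm, te

@autotvm.template("example/matmul")
def matmul(N, L, M, dtype):
    # Declare the computation: C = A @ B.
    A = te.placeholder((N, L), name="A", dtype=dtype)
    B = te.placeholder((L, M), name="B", dtype=dtype)
    k = te.reduce_axis((0, L), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

    # Expose tiling factors as tunable knobs in the schedule.
    s = te.create_schedule(C.op)
    y, x = s[C].op.axis
    cfg = autotvm.get_config()
    cfg.define_split("tile_y", y, num_outputs=2)
    cfg.define_split("tile_x", x, num_outputs=2)
    yo, yi = cfg["tile_y"].apply(s, C, y)
    xo, xi = cfg["tile_x"].apply(s, C, x)
    s[C].reorder(yo, xo, yi, xi)
    return s, [A, B, C]

# Create the tuning task and search the knob space, logging measured results.
task = autotvm.task.create("example/matmul", args=(512, 512, 512, "float32"), target="llvm")
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=5))
tuner = autotvm.tuner.XGBTuner(task)
tuner.tune(n_trial=20,
           measure_option=measure_option,
           callbacks=[autotvm.callback.log_to_file("matmul.log")])
```

The convergence-speed claim in the abstract concerns this search loop: an agile-inspired strategy reaches a good configuration in fewer measured trials than exhaustive or naive search.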
Keywords
Neural Networks, Machine Learning, Autotuning, FPGA, Transprecision Computing, Tensor Accelerator