
Mobile Transformer Accelerator Exploiting Various Line Sparsity and Tile-Based Dynamic Quantization

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS (2024)

Abstract
Transformer models are difficult to deploy on mobile devices due to their memory- and computation-intensive properties. Accordingly, there is ongoing research on methods for compressing transformer models, such as pruning and quantization. However, general computing platforms, such as central processing units (CPUs) and graphics processing units (GPUs), are not energy-efficient when accelerating pruned models because the unstructured sparsity these models exhibit degrades parallelism. In this article, we propose a low-power accelerator for transformers that can handle various levels of structured sparsity induced by line pruning with different granularity. Our approach accelerates pruned transformers in a head-wise and line-wise manner. We present a head reorganization and shuffling method that supports head-wise skip operations and resolves the load imbalance among processing engines (PEs) caused by the varying number of operations in each head. Furthermore, we implemented a sparse quantized general matrix-matrix multiplication (SQ-GEMM) module that supports line-wise skipping and on-the-fly tile-based dynamic quantization of activations. As a result, compared with a mobile GPU and CPU, the proposed accelerator improved energy efficiency by 2.9x and 12.3x for the detection transformer (DETR), and 3.0x and 12.4x for the vision transformer (ViT) models, respectively. In addition, the proposed mobile accelerator achieved the highest energy efficiency among current state-of-the-art FPGA-based transformer accelerators.
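To make the two key mechanisms in the abstract concrete, below is a minimal Python/NumPy sketch of tile-based dynamic activation quantization combined with a GEMM that skips line-pruned weight rows. The tile size, symmetric INT8 range, and all function names are illustrative assumptions, not the paper's actual hardware design.

```python
# Hypothetical sketch: tile-based dynamic activation quantization and a
# line-wise sparse GEMM. TILE, QMAX, and all names are assumptions made
# for illustration; the paper's accelerator implements this in hardware.
import numpy as np

TILE = 16   # assumed square tile edge length (rows/cols must divide evenly)
QMAX = 127  # symmetric INT8 range

def quantize_tiles(act: np.ndarray):
    """Quantize a (rows x cols) activation matrix tile by tile.

    Each TILE x TILE tile gets its own scale computed on the fly from the
    tile's maximum absolute value (dynamic quantization), then values are
    rounded to INT8.
    """
    rows, cols = act.shape
    q = np.zeros_like(act, dtype=np.int8)
    scales = np.zeros((rows // TILE, cols // TILE), dtype=np.float32)
    for ti in range(0, rows, TILE):
        for tj in range(0, cols, TILE):
            tile = act[ti:ti + TILE, tj:tj + TILE]
            scale = max(np.abs(tile).max(), 1e-8) / QMAX
            scales[ti // TILE, tj // TILE] = scale
            q[ti:ti + TILE, tj:tj + TILE] = np.clip(
                np.round(tile / scale), -QMAX, QMAX).astype(np.int8)
    return q, scales

def sparse_gemm(q_act, act_scales, w_q, w_scale, pruned_rows):
    """INT8 GEMM that skips weight rows removed by line (row) pruning.

    pruned_rows is a boolean mask over the rows of w_q; a True entry means
    the whole line was pruned, so its multiply-accumulate work is skipped.
    """
    out = np.zeros((q_act.shape[0], w_q.shape[1]), dtype=np.float32)
    for k in range(w_q.shape[0]):
        if pruned_rows[k]:
            continue  # line-wise skip: no work issued for this row
        # Dequantize on accumulation using the per-tile activation scales
        # that apply to column k of the quantized activations.
        col_scales = np.repeat(act_scales[:, k // TILE], TILE)
        out += np.outer(q_act[:, k].astype(np.float32) * col_scales,
                        w_q[k].astype(np.float32) * w_scale)
    return out
```

In software this rank-1-update loop is slow, but it mirrors the hardware idea: per-tile scales are produced on the fly as activations stream in, and pruned lines consume no PE cycles because they are skipped before any multiply-accumulate is issued.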
Key words
Transformers, Head, Computational modeling, Quantization (signal), Graphics processing units, Energy efficiency, Computer architecture, Transformer accelerator, transformer optimization, vision transformer (ViT)