An Energy-Efficient Heterogeneous Fourier Transform-Based Transformer Accelerator with Frequency-Wise Dynamic Bit-Precision.

2023 IEEE Asian Solid-State Circuits Conference (A-SSCC), 2023

Abstract
Recently, transformer models [1] have achieved human-level accuracy and, with the self-attention mechanism, have become the mainstream algorithm for diverse Natural Language Processing (NLP) tasks. A transformer model comprises a series of repeated self-attention layers, which find the relationships between tokens and mix their information (TokenMix), followed by a feed-forward network (FFN) that captures complex patterns. The self-attention layer calculates relevance weights for each input token and applies these weights in a weighted-sum operation to compute the output token. The model can therefore capture contextual relationships by mixing information from multiple input tokens into each output token. The Fourier transform-based TokenMix (FT-TokenMix) algorithm [2] replaces the computationally heavy self-attention with a lightweight Fourier transform while achieving nearly the same performance. Because the Fourier transform also finds correlations between different data points and mixes the information of the input tokens, it can capture contextual relationships. Moreover, since the computational complexity of the Fourier transform, implemented with the Fast Fourier Transform (FFT) algorithm, is much lower than that of the self-attention layer, 99.6% of the token-mixing computation can be eliminated.
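As a rough illustration of how FT-TokenMix substitutes an FFT for attention-based token mixing, the NumPy sketch below follows an FNet-style formulation (FFT over the token and hidden dimensions, keeping the real part) feeding into a position-wise FFN. All function names, dimensions, and initialization here are assumptions for illustration, not the accelerator's actual design.

```python
import numpy as np

def ft_token_mix(x):
    """FT-TokenMix sketch: replace self-attention with a Fourier transform.

    x: (seq_len, hidden_dim) array of input token embeddings.
    An FFT is applied along the token (sequence) and hidden axes and only
    the real part is kept, so every output token mixes information from
    all input tokens, at O(N log N) cost instead of O(N^2) attention.
    """
    return np.real(np.fft.fft2(x))

def feed_forward(x, w1, b1, w2, b2):
    """Position-wise FFN that follows the token-mixing step."""
    h = np.maximum(x @ w1 + b1, 0.0)  # ReLU
    return h @ w2 + b2

# Toy dimensions, for illustration only.
seq_len, d_model, d_ff = 128, 64, 256
rng = np.random.default_rng(0)
x = rng.standard_normal((seq_len, d_model))
w1, b1 = 0.02 * rng.standard_normal((d_model, d_ff)), np.zeros(d_ff)
w2, b2 = 0.02 * rng.standard_normal((d_ff, d_model)), np.zeros(d_model)

mixed = ft_token_mix(x)                    # lightweight token mixing
out = feed_forward(mixed, w1, b1, w2, b2)  # captures complex patterns
print(out.shape)                           # (128, 64)
```

In this sketch the token-mixing step has no learned parameters at all; only the FFN carries weights, which is what makes the FFT-based layer so much cheaper than computing and applying attention weights.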