A Low-Cost Floating-Point Dot-Product-Dual-Accumulate Architecture for HPC-Enabled AI

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS (2024)

Abstract
The dot-product $\sum_{i=1}^{N} A_i \times B_i$ is one of the most frequently used operations in a wide variety of high-performance computing (HPC) and artificial intelligence (AI) applications. However, for large-scale algorithms such as general matrix multiplication (GEMM) and the fast Fourier transform (FFT), independent additions are needed to accumulate the results of length-limited dot-products into the final result, which increases latency and overhead. Hence, we propose a dot-product-dual-accumulate (DPDAC) architecture capable of performing $\sum_{i=1}^{N} A_i \times B_i + \sum_{j=1}^{M} C_j$, with $N = 1, 2, 4$ and $M = 1, 2$, on a wide range of formats. The proposed architecture supports both single-path and dual-path execution. The single path performs double-precision (DP) fused multiply-add (FMA) or DPDAC on lower-precision formats, while the dual path supports parallel execution of a single-precision (SP) addition together with a 2-term SP or TF32 dot-product, or a 4-term half-precision (HP) or BF16 dot-product. The proposed architecture also supports numerical precision conversion, allowing numbers to be converted to higher or lower formats. Compared with discrete designs that use multiple single-mode floating-point (FP) units to achieve the same functionality, the proposed DPDAC significantly reduces overhead. Compared with state-of-the-art multiple-precision designs, it supports a wider range of formats and a greater variety of operations at lower cost.
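To make the operation concrete, the following is a minimal behavioral sketch in Python of the DPDAC expression with N in {1, 2, 4} and M in {1, 2}. It is not the paper's hardware design: the names `dpdac` and `long_dot` are hypothetical, and the choice to round only once at the end (modeled here with exact rational arithmetic) is an assumption about fused behavior that the abstract does not state. The second function illustrates the accumulation problem the abstract describes, feeding the running sum of a long dot-product back in as an accumulate term instead of using a separate addition.

```python
from fractions import Fraction

ALLOWED_N = (1, 2, 4)  # dot-product lengths named in the abstract
ALLOWED_M = (1, 2)     # numbers of accumulate terms named in the abstract


def dpdac(a, b, c):
    """Return sum(a[i] * b[i]) + sum(c[j]) as a float.

    Assumption: the sum is formed exactly and rounded once at the end.
    """
    if len(a) != len(b) or len(a) not in ALLOWED_N:
        raise ValueError("a and b must have equal length 1, 2, or 4")
    if len(c) not in ALLOWED_M:
        raise ValueError("c must have length 1 or 2")
    # Exact products and sums using rational arithmetic.
    exact = sum((Fraction(x) * Fraction(y) for x, y in zip(a, b)), Fraction(0))
    exact += sum((Fraction(z) for z in c), Fraction(0))
    # Single rounding to the nearest double.
    return float(exact)


def long_dot(a, b, chunk=4):
    """Accumulate a long dot-product in chunks, reusing the running sum as C_1.

    Assumption: every chunk, including the last, has length 1, 2, or 4;
    otherwise dpdac raises ValueError.
    """
    acc = 0.0
    for start in range(0, len(a), chunk):
        acc = dpdac(a[start:start + chunk], b[start:start + chunk], [acc])
    return acc


if __name__ == "__main__":
    # 2-term dot-product plus two accumulate terms: 1*3 + 2*4 + 5 + 6
    print(dpdac([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]))
    # 8-term dot-product built from two 4-term DPDAC steps
    print(long_dot([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], [1.0] * 8))
```

With M = 2, two independent partial sums could be merged in the same step; the sketch uses only M = 1 in `long_dot` to keep the loop simple.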
Keywords
Dot-product-dual-accumulate (DPDAC), fused multiply-add, high-performance computing (HPC)-enabled artificial intelligence (AI), mixed-precision, numerical precision conversion, transprecision computing