A Multi-level Parallel Integer/Floating-Point Arithmetic Architecture for Deep Learning Instructions.

Euro-Par (2023)

Abstract
The extensive instruction-set support for deep learning (DL) significantly enhances the performance of general-purpose architectures by exploiting data-level parallelism. However, it is challenging to design arithmetic units that can operate in parallel on a wide range of formats so that DL instructions (DLIs) execute efficiently. This paper presents a multi-level parallel arithmetic architecture that supports intra- and inter-operation parallelism for integer and a wide range of FP formats. For intra-operation parallelism, the proposed architecture supports multi-term dot-product for integer, half-precision, and BrainFloat16 formats using mixed-precision methods. For inter-operation parallelism, dual-path execution is enabled to perform an integer dot-product and a single-precision (SP) addition in parallel. Moreover, the architecture supports the fused multiply-add (FMA) operations commonly used in general-purpose architectures. The proposed architecture strictly adheres to the computing requirements of DLIs and implements them efficiently. On benchmarked DNN inference applications that require both integer and FP formats, the proposed architecture improves performance by up to 15.7% compared with a single-path implementation. Furthermore, compared with state-of-the-art designs, it achieves higher energy efficiency and implements DLIs more efficiently.
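To make the instruction semantics concrete, below is a minimal behavioral sketch in C, not the paper's actual design or RTL, of the two kinds of operations the abstract describes: a multi-term BrainFloat16 dot-product that accumulates into single precision, and an integer (int8) dot-product that accumulates into int32. The 4-term width, the function names (dot4_bf16, dot4_i8), and the sequential per-term rounding are illustrative assumptions; real hardware typically accumulates the exact products with a single rounding.

#include <stdint.h>
#include <stdio.h>

/* Behavioral sketch only (not the paper's microarchitecture).
 * bf16_t holds a BrainFloat16 value: the upper 16 bits of an FP32. */
typedef uint16_t bf16_t;

static float bf16_to_f32(bf16_t x) {
    union { uint32_t u; float f; } v = { (uint32_t)x << 16 };
    return v.f;
}

/* 4-term BF16 dot-product with FP32 accumulation: acc + sum(a[i]*b[i]).
 * Here each product is rounded when added; hardware may defer rounding. */
float dot4_bf16(float acc, const bf16_t a[4], const bf16_t b[4]) {
    for (int i = 0; i < 4; i++)
        acc += bf16_to_f32(a[i]) * bf16_to_f32(b[i]);
    return acc;
}

/* 4-term int8 dot-product with int32 accumulation, as used in
 * quantized inference. */
int32_t dot4_i8(int32_t acc, const int8_t a[4], const int8_t b[4]) {
    for (int i = 0; i < 4; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}

int main(void) {
    bf16_t a[4] = {0x3F80, 0x4000, 0x4040, 0x4080};  /* 1.0, 2.0, 3.0, 4.0 */
    bf16_t b[4] = {0x3F80, 0x3F80, 0x3F80, 0x3F80};  /* 1.0 each */
    printf("bf16 dot4 = %f\n", dot4_bf16(0.0f, a, b));  /* 10.0 */

    int8_t x[4] = {1, -2, 3, -4}, y[4] = {5, 6, 7, 8};
    printf("int8 dot4 = %d\n", dot4_i8(0, x, y));        /* -18 */
    return 0;
}

In the dual-path execution the abstract mentions, an integer dot-product of this kind and an independent SP addition would issue to separate paths in the same cycle rather than serializing on one unit.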
Keywords
integer/floating-point, parallel, multi-level