Experimental Analysis of Matrix Multiplication Functional Units

2019 IEEE 26th Symposium on Computer Arithmetic (ARITH)

Abstract
The rapid growth of AI has led to the introduction of several new hardware designs that accelerate the matrix multiplication operation at the heart of AI applications. Examples include NVIDIA's Tensor Core*, Google's TPU*, and Intel's Neural Compute Stick*. However, the IEEE 754 standard allows significant implementation-specific flexibility in the definition of the matrix multiplication operation, and the precision and compatibility of these new accelerators are not well documented. This paper describes a method that exploits the rounding modes and other features of the IEEE 754 standard to gain deeper insight into the design and functionality of matrix multiplication units. We apply this method to the NVIDIA V100 GPU Tensor Core* units and report our findings on their design properties and micro-architecture.
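The core idea behind such probing methods is that the internal accumulation precision of a matrix multiply unit can be revealed by feeding it operands whose exact sum is not representable at one candidate precision but is at another. A minimal sketch of this principle (not the paper's actual test vectors, and using NumPy scalar arithmetic rather than Tensor Core hardware) might look like:

```python
import numpy as np

# Probe pair: in binary16 (half precision) the spacing between
# representable values at 2048 is 2, so 2048 + 1 rounds back to 2048
# under round-to-nearest-even. In binary32 (single precision) 2049 is
# exactly representable. The accumulated dot product therefore
# distinguishes a half-precision accumulator from a single-precision one.
a = np.array([2048.0, 1.0], dtype=np.float16)
b = np.array([1.0, 1.0], dtype=np.float16)

# Simulate accumulation entirely in half precision.
acc16 = np.float16(0)
for x, y in zip(a, b):
    acc16 = np.float16(acc16 + np.float16(x) * np.float16(y))

# Simulate half-precision inputs with a single-precision accumulator.
acc32 = np.float32(0)
for x, y in zip(a, b):
    acc32 = acc32 + np.float32(x) * np.float32(y)

print(float(acc16))  # 2048.0 — the +1 was rounded away
print(float(acc32))  # 2049.0 — the +1 survived
```

Running the same probe inputs through a hardware matrix multiply unit and comparing its output against these two software references indicates which internal accumulation width the unit uses; analogous operand choices can expose the rounding mode and normalization behavior.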
Keywords
Machine learning, deep learning, tensor, half precision, single precision, Volta, matrix multiplication