Compiling for the IBM Matrix Engine for Enterprise Workloads

IEEE Micro (2022)

Abstract
The matrix-multiply assist (MMA) facility is the latest addition to IBM's Power instruction set architecture and first shipped in the recently introduced POWER10 processor. MMA is designed to accelerate matrix–matrix operations, such as matrix multiplication and convolution, using instructions that compute the outer product of vector-register operands. Outer product computations have been used for decades in linear algebra libraries to deliver high-performance implementations of matrix operations. Such libraries use conventional single-instruction–multiple-data (SIMD) instructions to emulate outer product operations. MMA in POWER10 is the first hardware released in the market with direct support for outer product operations. MMA operates on a wider diversity of data types than any accelerator design currently announced. Unleashing the high performance enabled by MMA requires careful code generation. Two key considerations for optimal MMA code performance are 1) the choice of accumulation layout when maximizing the use of the accumulators and 2) the selection of matrix access order. This article shows that over 92% of peak performance in POWER10 with MMA can be achieved when the code generation makes the right choices.
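
As an illustration of the outer product formulation mentioned above, the following minimal sketch computes a matrix product as a sum of rank-1 updates into an accumulator. It is plain scalar Python, not MMA code: the function name `matmul_outer` and the list-of-lists layout are assumptions made for this example, and it says nothing about the actual MMA instructions, register shapes, or the code generation strategy described in the article.

```python
def matmul_outer(A, B):
    """Compute C = A * B as a sum of outer products (rank-1 updates).

    A is m x k and B is k x n, both given as lists of rows.
    Hypothetical helper for illustration only.
    """
    m, k = len(A), len(A[0])
    n = len(B[0])
    assert len(B) == k, "inner dimensions must match"

    # Accumulator for the result, initialised to zero.
    C = [[0.0] * n for _ in range(m)]

    for p in range(k):
        col = [A[i][p] for i in range(m)]  # p-th column of A
        row = B[p]                         # p-th row of B
        # Rank-1 update: C += outer(col, row)
        for i in range(m):
            for j in range(n):
                C[i][j] += col[i] * row[j]
    return C


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = matmul_outer(A, B)
```

Each iteration of the outer loop consumes one column of A and one row of B and updates every element of the accumulator, which is the access pattern that an outer product instruction would perform in a single step on vector-register operands.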
Keywords
IBM matrix engine