Algorithm 1039: Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM

ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE (2024)

Abstract
We explore the use of the Apache TVM open-source framework to automatically generate a family of algorithms that follow the approach taken by popular linear algebra libraries, such as GotoBLAS2, BLIS, and OpenBLAS, to obtain high-performance blocked formulations of the general matrix multiplication (GEMM). In addition, we fully automate the generation process by also leveraging Apache TVM to derive a complete variety of processor-specific micro-kernels for GEMM. This contrasts with the convention in high-performance libraries, which hand-encode a single micro-kernel per architecture in assembly code. Overall, the combination of our TVM-generated blocked algorithms and micro-kernels for GEMM (1) improves portability and maintainability and streamlines the software life cycle; (2) provides the flexibility to tailor and optimize the solution to different data types, processor architectures, and matrix operand shapes, yielding performance on a par with (or, for specific matrix shapes, even superior to) that of hand-tuned libraries; and (3) features a small memory footprint.
Keywords
Portability and maintainability, software life cycle, matrix multiplication, BLIS framework, Apache TVM, blocking, SIMD vectorization, high performance
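
For readers unfamiliar with the blocked GEMM formulation mentioned in the abstract, the following is a minimal pure-Python sketch of the loop structure commonly associated with GotoBLAS2/BLIS-style algorithms: five blocking loops around a small micro-kernel. It is not the paper's TVM generator and does not reproduce its code. The function names (`blocked_gemm`, `micro_kernel`), the block-size parameters (`mc`, `nc`, `kc`, `mr`, `nr`), and their default values are illustrative assumptions, and the packing of operands into contiguous buffers that real libraries perform is omitted.

```python
def micro_kernel(A, B, C, i0, j0, p0, mr, nr, kc):
    """Hypothetical micro-kernel: C[i0:i0+mr, j0:j0+nr] += A-block * B-block.

    In a real library this is the architecture-specific, vectorized part;
    here it is a plain triple loop over a small tile.
    """
    for i in range(i0, i0 + mr):
        for j in range(j0, j0 + nr):
            acc = C[i][j]
            for p in range(p0, p0 + kc):
                acc += A[i][p] * B[p][j]
            C[i][j] = acc


def blocked_gemm(A, B, C, mc=4, nc=4, kc=4, mr=2, nr=2):
    """Compute C += A * B on lists of lists using five blocking loops.

    Block sizes are assumed values for illustration only; real libraries
    choose them to match the cache hierarchy and register file.
    """
    m, k, n = len(A), len(B), len(B[0])
    for jc in range(0, n, nc):                         # loop 1: column panels of B and C
        nb = min(nc, n - jc)
        for pc in range(0, k, kc):                     # loop 2: slices of the k dimension
            kb = min(kc, k - pc)
            for ic in range(0, m, mc):                 # loop 3: row blocks of A and C
                mb = min(mc, m - ic)
                for jr in range(jc, jc + nb, nr):      # loop 4: micro-panels of B
                    nr_eff = min(nr, jc + nb - jr)     # handle a ragged right edge
                    for ir in range(ic, ic + mb, mr):  # loop 5: micro-panels of A
                        mr_eff = min(mr, ic + mb - ir) # handle a ragged bottom edge
                        micro_kernel(A, B, C, ir, jr, pc, mr_eff, nr_eff, kb)
    return C
```

Separating the blocking loops from the micro-kernel mirrors the division the abstract describes: the outer loops are largely architecture-independent, while the micro-kernel is the piece that libraries traditionally hand-write per architecture.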