Improving Convolution via Cache Hierarchy Tiling and Reduced Packing.

PACT (2022)

Abstract
Convolution is one of the most computationally intensive operations in machine learning models and is usually computed with the well-known Im2Col + BLAS method. This work proposes a novel convolution algorithm that improves upon Im2Col + BLAS by introducing (a) CSA: a convolution-specific 3D cache-blocking analysis that focuses on tile reuse across the cache hierarchy, (b) CSO: a macro-kernel that follows CSA to compute the convolution by tiling it, (c) a specialized micro-kernel that seeks to achieve peak hardware performance, and (d) packing routines for the input tensor and filters that bridge the gap between tiling and the micro-kernel. Our approach speeds up end-to-end machine learning model inference by up to 26% and 21% on x86 and POWER10 architectures, respectively.
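To make the contrast in the abstract concrete, the following is a minimal pure-Python sketch of the Im2Col + GEMM baseline next to a direct convolution whose loops are blocked over output channels and output columns. It is not the paper's CSA, CSO, micro-kernel, or packing routines. The function names (`im2col`, `matmul`, `conv_im2col`, `conv_tiled`), the tile sizes `tk` and `tw`, the `[C][H][W]` and `[K][C][kh][kw]` layouts, and the restriction to stride 1 with no padding are all assumptions made for illustration, and the naive `matmul` only stands in for a BLAS call.

```python
# Illustrative only: single image, stride 1, no padding, layouts chosen for clarity.

def im2col(x, kh, kw):
    """Unfold x[C][H][W] into a matrix with C*kh*kw rows and OH*OW columns."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    oh, ow = h - kh + 1, w - kw + 1
    cols = []
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # One row per (channel, kernel-row, kernel-col) triple.
                cols.append([x[ci][r + i][s + j]
                             for r in range(oh) for s in range(ow)])
    return cols, oh, ow


def matmul(a, b):
    """Naive matrix multiply standing in for a BLAS GEMM call."""
    inner, n = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(n)]
            for row in a]


def conv_im2col(x, f):
    """Baseline: f[K][C][kh][kw] filters; returns out[K][OH][OW]."""
    kh, kw = len(f[0][0]), len(f[0][0][0])
    cols, oh, ow = im2col(x, kh, kw)
    # Flatten each filter in the same (channel, row, col) order used by im2col.
    fmat = [[v for ch in fk for row in ch for v in row] for fk in f]
    flat = matmul(fmat, cols)
    return [[r[i * ow:(i + 1) * ow] for i in range(oh)] for r in flat]


def conv_tiled(x, f, tk=2, tw=2):
    """Direct convolution, blocked over output channels (tk) and output columns (tw).

    The tile sizes are hypothetical; a real implementation would derive them
    from the cache hierarchy rather than hard-code them.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    k, kh, kw = len(f), len(f[0][0]), len(f[0][0][0])
    oh, ow = h - kh + 1, w - kw + 1
    out = [[[0] * ow for _ in range(oh)] for _ in range(k)]
    for k0 in range(0, k, tk):              # tile of output channels
        for s0 in range(0, ow, tw):         # tile of output columns
            for ko in range(k0, min(k0 + tk, k)):
                for r in range(oh):
                    for s in range(s0, min(s0 + tw, ow)):
                        acc = 0
                        for ci in range(c):
                            for i in range(kh):
                                for j in range(kw):
                                    acc += f[ko][ci][i][j] * x[ci][r + i][s + j]
                        out[ko][r][s] = acc
    return out
```

Both functions return the same result. The difference is that `conv_im2col` first materializes an unfolded copy of the input that is roughly `kh * kw` times larger, whereas `conv_tiled` reads the input in place and keeps a small block of filters and outputs in use at a time, which is the kind of reuse that cache-aware tiling aims for.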
Keywords
Convolution, Data Transfer, Packing, Cache Blocking