Improving Convolution via Cache Hierarchy Tiling and Reduced Packing.

Victor Ferrari, Rafael C. F. Sousa, Marcio Pereira, João P. L. de Carvalho, José Nelson Amaral, Guido Araujo

PACT (2022)

Abstract

Convolution is one of the most computationally intensive operations in machine learning models, and it is usually computed with the well-known Im2Col + BLAS method. This work proposes a novel convolution algorithm that improves upon Im2Col + BLAS by introducing (a) CSA: a convolution-specific 3D cache-blocking analysis that focuses on tile reuse across the cache hierarchy, (b) CSO: a macro-kernel that follows CSA to compute the convolution by tiling it, (c) a specialized micro-kernel that seeks to achieve peak hardware performance, and (d) packing routines for the input tensor and filters that bridge the gap between tiling and the micro-kernel. Our approach speeds up end-to-end machine learning model inference by up to 26% and 21% for x86 and POWER10 architectures, respectively.
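
For readers unfamiliar with the baseline, the following is a minimal pure-Python sketch of the Im2Col + matrix-multiply formulation next to a direct convolution whose output is processed in small tiles. It is an illustration only and not the paper's CSA, CSO, micro-kernel, or packing routines: the function names (`im2col`, `matmul`, `conv_im2col`, `conv_tiled`), the channel-major list layout, the unit stride with no padding, and the `tile` parameter are all assumptions made for this example, and a plain `matmul` stands in for a BLAS GEMM call.

```python
def im2col(x, kh, kw):
    """Unfold input x[c][i][j] into a patch matrix (stride 1, no padding).

    Rows are ordered by (channel, kernel row, kernel column);
    columns are ordered by output position (i, j).
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    oh, ow = h - kh + 1, w - kw + 1
    return [
        [x[c][i + di][j + dj] for i in range(oh) for j in range(ow)]
        for c in range(c_in) for di in range(kh) for dj in range(kw)
    ]


def matmul(a, b):
    """Naive matrix product; stands in for a BLAS GEMM in this sketch."""
    inner, n = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(n)]
            for row in a]


def conv_im2col(x, filters):
    """Convolution as (flattened filters) x (patch matrix)."""
    kh, kw = len(filters[0][0]), len(filters[0][0][0])
    flat_filters = [
        [f[c][di][dj] for c in range(len(f))
         for di in range(kh) for dj in range(kw)]
        for f in filters
    ]
    return matmul(flat_filters, im2col(x, kh, kw))


def conv_tiled(x, filters, tile=2):
    """Direct convolution that visits the output in tile x tile blocks.

    No patch matrix is materialized; each output tile reuses the same
    small input window for every filter before moving on.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(filters[0][0]), len(filters[0][0][0])
    oh, ow = h - kh + 1, w - kw + 1
    out = [[0] * (oh * ow) for _ in filters]
    for i0 in range(0, oh, tile):              # tile over output rows
        for j0 in range(0, ow, tile):          # tile over output columns
            for m, f in enumerate(filters):    # all filters reuse this tile
                for i in range(i0, min(i0 + tile, oh)):
                    for j in range(j0, min(j0 + tile, ow)):
                        out[m][i * ow + j] = sum(
                            f[c][di][dj] * x[c][i + di][j + dj]
                            for c in range(c_in)
                            for di in range(kh) for dj in range(kw)
                        )
    return out
```

Both functions return one row per filter with the output positions flattened in row-major order, so their results can be compared directly. The Im2Col variant duplicates each input element up to `kh * kw` times in the patch matrix, which is the kind of data-movement overhead that tiling and reduced packing aim to avoid.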

Keywords

Convolution, Data Transfer, Packing, Cache Blocking
