Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs

ACM Transactions on Architecture and Code Optimization (2024)

Abstract
Low-precision computation has emerged as one of the most effective techniques for accelerating convolutional neural networks and has garnered widespread support on modern hardware. Despite this effectiveness, low-precision computation has not been commonly applied to fast convolution algorithms, such as the Winograd algorithm, due to numerical issues. In this article, we propose an effective quantized Winograd convolution, named LoWino, which employs an inside-quantization method in the Winograd domain to reduce the precision loss caused by the transformations. Meanwhile, we present an efficient implementation that integrates well-designed optimization techniques, allowing us to fully exploit the capabilities of low-precision computation on modern CPUs. We evaluate LoWino on two Intel Xeon Scalable Processor platforms with representative convolutional layers and neural network models. The experimental results demonstrate that our approach achieves average operator speedups of 1.84x and 1.91x over state-of-the-art implementations in the vendor library while keeping accuracy loss at a reasonable level.
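To illustrate the idea of quantizing inside the Winograd domain, the following is a minimal NumPy sketch of a 1D Winograd F(2,3) tile in which the transformed input and filter are quantized to int8 just before the element-wise multiplication, rather than before the transforms. The transform matrices are the standard F(2,3) ones, and the symmetric per-tensor quantization and helper names (quantize, lowino_f23) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Standard Winograd F(2,3) transform matrices (output tile 2, filter size 3).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=np.float32)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float32)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float32)

def quantize(x):
    """Symmetric int8 quantization: returns quantized values and the scale."""
    scale = float(np.abs(x).max()) / 127.0
    if scale == 0.0:
        scale = 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def lowino_f23(d, g):
    """1D Winograd F(2,3) for a 4-element input tile d and a 3-tap filter g.
    Quantization is applied to the transformed operands (the Winograd domain),
    so the forward and inverse transforms stay in floating point."""
    U = G @ g        # filter in the Winograd domain, shape (4,)
    V = B_T @ d      # input tile in the Winograd domain, shape (4,)
    Uq, su = quantize(U)
    Vq, sv = quantize(V)
    # Element-wise product in int32, then a single dequantization per tile.
    M = (Uq.astype(np.int32) * Vq.astype(np.int32)).astype(np.float32) * (su * sv)
    return A_T @ M   # inverse transform -> 2 output elements

# Quick check against direct (full-precision) convolution on one tile.
d = np.array([1.0, 2.0, -1.0, 0.5], dtype=np.float32)
g = np.array([0.2, -0.5, 0.3], dtype=np.float32)
direct = np.array([np.dot(d[i:i + 3], g) for i in range(2)])
print(lowino_f23(d, g), direct)
```

In a full 2D convolution the same pattern would apply per tile, with the int32 element-wise products accumulated across input channels before dequantization and the inverse transform, which is where low-precision instructions on modern CPUs can be exploited.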
Keywords
Deep learning, Winograd convolution, low-precision computation