An Efficient Im2row-Based Fast Convolution Algorithm for ARM Cortex-M MCUs

IEEE Access (2021)

Abstract
With the rise of IoT and edge computing, deploying neural networks (NNs) on low-power edge computing devices is drawing increasing attention. In NNs, convolutional layers consume the majority of computing cycles, especially when NNs are implemented on ARM processors, so optimizing the convolution implementation on ARM Cortex-M MCUs is essential. This paper proposes an efficient im2row-based fast convolution algorithm with two innovations. First, a novel im2row method reuses the data of adjacent convolutional windows: a reusable im2row buffer significantly reduces the amount of data copied during im2row and improves efficiency. Second, the implementation employs a q7_t-to-q15_t data type extension technique that avoids data reordering; eliminating the reordering instructions reduces the algorithm's runtime. We evaluate the algorithm on individual convolutional layers and on complete NNs. For convolutional layers, the proposed algorithm achieves an average speedup of 1.42x over the baseline, with a maximum speedup of 2.9x. Experiments on different NNs show that it speeds up the overall network by up to 2.15x.
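The abstract does not spell out the buffer layout or the instruction sequence, so the following is only a minimal sketch of how the two ideas could fit together, under stated assumptions; it is not the authors' code. It assumes a stride-1, unpadded K x K convolution on an HWC int8 tensor. For each output row, every input column (all K kernel rows, all channels) is copied once into a row buffer, so the im2row row for the window at output column `ox` is simply the contiguous slice starting at `ox * K * C`, provided the weights are stored in `[kx][ky][c]` order. For the data type extension, `sxtb16` and `smlad` are portable C models of the Armv7E-M DSP instructions SXTB16 and SMLAD. Four q7 values are widened into two q15 pairs (even bytes, odd bytes) without the pack instructions that would restore the original order; this is safe because both operands are split identically, so the dot product is unchanged. All sizes and function names (`fill_row_buffer`, `conv_reuse`, `dot_q7_no_reorder`, `self_check`, etc.) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes for illustration; K*K*C must be a multiple of 4. */
#define K 3                 /* kernel height and width */
#define C 4                 /* input channels */
#define H 5                 /* input height */
#define W 7                 /* input width */
#define OUT_H (H - K + 1)   /* stride 1, no padding */
#define OUT_W (W - K + 1)
#define COL (K * C)         /* one column block: all ky, all channels */
#define PATCH (K * K * C)   /* one im2row row */

/* Portable model of SXTB16: sign-extend bytes 0 and 2 into two halfwords. */
static uint32_t sxtb16(uint32_t x) {
    uint16_t lo = (uint16_t)(int16_t)(int8_t)(x & 0xFF);
    uint16_t hi = (uint16_t)(int16_t)(int8_t)((x >> 16) & 0xFF);
    return ((uint32_t)hi << 16) | lo;
}

/* Portable model of SMLAD: acc + lo(a)*lo(b) + hi(a)*hi(b) on signed halves. */
static int32_t smlad(uint32_t a, uint32_t b, int32_t acc) {
    return acc + (int16_t)(a & 0xFFFF) * (int16_t)(b & 0xFFFF)
               + (int16_t)(a >> 16) * (int16_t)(b >> 16);
}

/* q7 dot product: each 4-byte word becomes two q15 pairs (bytes 0,2 and
 * bytes 1,3). Both operands are permuted the same way, so no reordering
 * (pack) step is needed to keep the products paired correctly. */
static int32_t dot_q7_no_reorder(const int8_t *a, const int8_t *b, int n) {
    assert(n % 4 == 0);
    int32_t acc = 0;
    for (int i = 0; i < n; i += 4) {
        uint32_t wa, wb;
        memcpy(&wa, a + i, 4);
        memcpy(&wb, b + i, 4);
        acc = smlad(sxtb16(wa), sxtb16(wb), acc);           /* even bytes */
        acc = smlad(sxtb16(wa >> 8), sxtb16(wb >> 8), acc); /* odd bytes */
    }
    return acc;
}

/* Copy input column x of rows oy..oy+K-1 into block x of buf. Each column is
 * copied once per output row instead of once per window that covers it.
 * Returns the number of elements copied. */
static int fill_row_buffer(int8_t in[H][W][C], int oy, int8_t *buf) {
    int copied = 0;
    for (int x = 0; x < W; x++)
        for (int ky = 0; ky < K; ky++) {
            memcpy(&buf[x * COL + ky * C], in[oy + ky][x], C);
            copied += C;
        }
    return copied;
}

/* Window ox is the contiguous slice buf[ox*COL .. ox*COL + PATCH). */
static int conv_reuse(int8_t in[H][W][C], const int8_t *wr, int32_t out[OUT_H][OUT_W]) {
    int8_t buf[W * COL];
    int copied = 0;
    for (int oy = 0; oy < OUT_H; oy++) {
        copied += fill_row_buffer(in, oy, buf);
        for (int ox = 0; ox < OUT_W; ox++)
            out[oy][ox] = dot_q7_no_reorder(&buf[ox * COL], wr, PATCH);
    }
    return copied;
}

/* Direct convolution with weights in the usual [ky][kx][c] order. */
static void conv_ref(int8_t in[H][W][C], int8_t w[K][K][C], int32_t out[OUT_H][OUT_W]) {
    for (int oy = 0; oy < OUT_H; oy++)
        for (int ox = 0; ox < OUT_W; ox++) {
            int32_t acc = 0;
            for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++)
                    for (int c = 0; c < C; c++)
                        acc += in[oy + ky][ox + kx][c] * w[ky][kx][c];
            out[oy][ox] = acc;
        }
}

/* Runs both versions on deterministic data; returns 1 if outputs match. */
static int self_check(int *copied) {
    static int8_t in[H][W][C], w[K][K][C], wr[PATCH];
    static int32_t ref[OUT_H][OUT_W], got[OUT_H][OUT_W];
    int8_t *p = &in[0][0][0], *q = &w[0][0][0];
    for (int i = 0; i < H * W * C; i++) p[i] = (int8_t)((i * 37 + 11) % 256 - 128);
    for (int i = 0; i < PATCH; i++) q[i] = (int8_t)((i * 53 + 7) % 256 - 128);
    /* Reorder weights [ky][kx][c] -> [kx][ky][c] (done offline in practice). */
    for (int ky = 0; ky < K; ky++)
        for (int kx = 0; kx < K; kx++)
            memcpy(&wr[kx * COL + ky * C], w[ky][kx], C);
    conv_ref(in, w, ref);
    *copied = conv_reuse(in, wr, got);
    return memcmp(ref, got, sizeof ref) == 0;
}
```

With these sizes the row-buffer scheme copies `OUT_H * W * K * C` = 252 input elements, whereas a conventional im2row that materializes every window copies `OUT_H * OUT_W * K * K * C` = 540. The saving grows with K because each input column is shared by up to K adjacent windows. On a real Cortex-M4/M7 target the two helper functions would be replaced by the CMSIS `__SXTB16` and `__SMLAD` intrinsics.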
Keywords
im2row (or im2col), fast convolution algorithm, edge AI, embedded software, ARM Cortex-M