Performance Evaluation of Acceleration of Convolutional Layers on OpenEdgeCGRA
CF '24 Companion Proceedings of the 21st ACM International Conference on Computing Frontiers Workshops and Special Sessions (2024)
Abstract
Recently, efficiently deploying deep learning solutions on the edge has received increasing attention. New platforms are emerging to support the growing demand for flexibility and high performance. In this work, we explore the efficient mapping of convolutional layers on an open-hardware, low-power Coarse-Grain Reconfigurable Array (CGRA), namely OpenEdgeCGRA. We study both direct implementations of convolution and solutions that transform it into a matrix multiplication through an Im2col transformation, and experiment with various tensor parallelism axes. We show that for this hardware target, direct convolution, coupled with weight parallelism, reaches the best latency and energy efficiency, outperforming a CPU implementation by 3.4x and 9.9x in terms of energy and latency, respectively.
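The abstract contrasts direct convolution with an Im2col-based formulation. The Python below is a minimal CPU-side sketch of that contrast, not the paper's OpenEdgeCGRA kernels. It assumes a channel-major (C x H x W) layout, stride 1 and no padding, and all function names are hypothetical. Im2col unrolls every receptive field into one column, so the layer becomes a single (K x CRS) by (CRS x OH*OW) matrix product. The price is duplicated input data.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed as [channel][row][col]
Matrix = List[List[float]]


def direct_conv(x: Tensor3, w: List[Tensor3]) -> Tensor3:
    """Direct convolution (cross-correlation, stride 1, no padding).

    x: input of shape C x H x W; w: K filters, each of shape C x R x S.
    Returns an output of shape K x (H-R+1) x (W-S+1).
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    R, S = len(w[0][0]), len(w[0][0][0])
    OH, OW = H - R + 1, W - S + 1
    out = []
    for filt in w:  # outer loop over filters (output channels)
        plane = [[0.0] * OW for _ in range(OH)]
        for oy in range(OH):
            for ox in range(OW):
                acc = 0.0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += x[c][oy + r][ox + s] * filt[c][r][s]
                plane[oy][ox] = acc
        out.append(plane)
    return out


def im2col(x: Tensor3, R: int, S: int) -> Matrix:
    """Unroll each R x S patch (across all C channels) into one column.

    The result has C*R*S rows and OH*OW columns.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    OH, OW = H - R + 1, W - S + 1
    cols = [[0.0] * (OH * OW) for _ in range(C * R * S)]
    for c in range(C):
        for r in range(R):
            for s in range(S):
                row = (c * R + r) * S + s
                for oy in range(OH):
                    for ox in range(OW):
                        cols[row][oy * OW + ox] = x[c][oy + r][ox + s]
    return cols


def im2col_conv(x: Tensor3, w: List[Tensor3]) -> Tensor3:
    """Convolution as one GEMM: (K x CRS) weights times (CRS x OH*OW) patches."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    R, S = len(w[0][0]), len(w[0][0][0])
    OH, OW = H - R + 1, W - S + 1
    cols = im2col(x, R, S)
    n = C * R * S
    # Flatten each filter in the same (c, r, s) order that im2col uses for rows.
    wmat = [[f[c][r][s] for c in range(C) for r in range(R) for s in range(S)]
            for f in w]
    out = []
    for wrow in wmat:  # one GEMM row per output channel
        flat = [sum(wrow[i] * cols[i][j] for i in range(n)) for j in range(OH * OW)]
        # Reshape the flat row back into an OH x OW output plane.
        out.append([flat[oy * OW:(oy + 1) * OW] for oy in range(OH)])
    return out


if __name__ == "__main__":
    # Toy input: 2 channels of 4 x 4 with x[c][i][j] = 16c + 4i + j.
    x = [[[float(c * 16 + i * 4 + j) for j in range(4)] for i in range(4)]
         for c in range(2)]
    # One hypothetical 2 x 2 filter per channel: top-left minus bottom-right.
    w = [[[[1.0, 0.0], [0.0, -1.0]] for _ in range(2)]]
    assert direct_conv(x, w) == im2col_conv(x, w)
    print(direct_conv(x, w)[0])  # prints [[-10.0, -10.0, -10.0], [-10.0, -10.0, -10.0], [-10.0, -10.0, -10.0]]
```

In this toy case the Im2col matrix holds 8 x 9 = 72 values, compared with 32 in the input tensor. That duplication is one cost the direct formulation avoids. The outer per-filter loop in `direct_conv` is one axis along which work could be split across processing elements. Whether it matches the paper's "weight parallelism" exactly is an assumption.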