General Matrix Multiplication (GEMM) Evaluation on Cyclone-V SoC FPGA Using OpenCL

2023 International Conference on Radar, Antenna, Microwave, Electronics, and Telecommunications (ICRAMET), 2023

Abstract
AI techniques such as machine learning and deep learning are now widely used across many fields of study. At the heart of AI workloads such as speech recognition, image recognition, and computer graphics lies general matrix multiplication (GEMM). These workloads are increasingly being implemented on FPGAs, and modern FPGAs integrate a Hard Processor System (HPS) that improves the productivity of system development. Here, we evaluate matrix multiplication on a Cyclone V SoC FPGA integrated with an ARM processor. To reduce programming complexity and accelerate development, the FPGA kernel was written in OpenCL. In the experiments, we ran the matrix multiplication kernels using both global and local memory. We also propose a method for estimating peak performance from the compilation report; the measured performance closely matched this estimate. With local-memory access, matrix multiplication reached a peak of 17.86 GFLOPS while consuming 14.2 W.
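The abstract does not reproduce the kernel or the estimation method, but both follow well-known patterns. For the performance estimate, a common first-order model multiplies the number of pipelined multiply-add units reported in the compilation report by two floating-point operations each and by the kernel clock frequency (peak ≈ 2 × N_MAC × f_max); whether the paper uses exactly this model is an assumption. The sketch below shows a standard local-memory (tiled) GEMM kernel in OpenCL C of the kind the abstract describes. It assumes square N×N matrices with N a multiple of the tile size, and the names gemm_local, TILE, Asub, and Bsub are illustrative, not taken from the paper.

```c
// Minimal tiled GEMM sketch (illustrative, not the authors' code).
// Computes C = A * B for square N x N row-major matrices.
// Launch with TILE x TILE work-groups; assumes N % TILE == 0.
#define TILE 16

__kernel void gemm_local(const int N,
                         __global const float *A,
                         __global const float *B,
                         __global float *C)
{
    __local float Asub[TILE][TILE];   // tile of A staged in on-chip local memory
    __local float Bsub[TILE][TILE];   // tile of B staged in on-chip local memory

    const int col = get_global_id(0); // global column index of C
    const int row = get_global_id(1); // global row index of C
    const int tx  = get_local_id(0);
    const int ty  = get_local_id(1);

    float acc = 0.0f;

    // Walk over tiles of the shared dimension.
    for (int t = 0; t < N / TILE; t++) {
        // Each work-item copies one element of A and one of B into local memory.
        Asub[ty][tx] = A[row * N + (t * TILE + tx)];
        Bsub[ty][tx] = B[(t * TILE + ty) * N + col];
        barrier(CLK_LOCAL_MEM_FENCE);  // wait until both tiles are fully loaded

        // Multiply the two tiles; reads now hit fast local memory.
        for (int k = 0; k < TILE; k++)
            acc += Asub[ty][k] * Bsub[k][tx];
        barrier(CLK_LOCAL_MEM_FENCE);  // wait before the tiles are overwritten
    }

    C[row * N + col] = acc;
}
```

Staging TILE×TILE blocks in local memory converts most global-memory reads into on-chip accesses, which is the usual reason a local-memory GEMM outperforms a purely global-memory version in FPGA OpenCL.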
Keywords
FPGA, SoC, OpenCL, matrix multiplication, GEMM