Pipelining of a Mobile SoC and an External NPU for Accelerating CNN Inference

IEEE Embedded Systems Letters (2023)

Abstract
Convolutional neural network (CNN) algorithms are increasingly deployed on edge devices as hardware and software co-evolve. Deploying CNNs on resource-constrained devices often requires optimization across CPUs and GPUs. While dedicated hardware such as neural processing units (NPUs) has been successfully introduced, cooperative methods among the CPU, GPU, and NPU are still immature. In this paper, we propose two approaches to optimize the integration of a mobile system-on-chip (SoC) with an external neural processing unit (eNPU) to achieve harmonious pipelining and improve inference speed and throughput. The first approach is a BLAS library search scheme that allocates the optimal library to each layer on the host side, while the second optimizes performance by searching for model slice points. We use the CPU-based NNPACK and OpenBLAS and the GPU-based CLBlast as computing libraries, which are allocated automatically. The entire neural network is optimally split into two segments based on the characteristics of the neural network layers and the hardware performance. We evaluated our algorithm on various mobile devices, including the Hikey-970, Hikey-960, and Firefly-rk3399. Experiments show that the proposed pipelined inference method reduces latency by 10% and increases throughput by more than 17% compared to parallel execution on an eNPU and SoC.
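To illustrate the two approaches described in the abstract, the sketch below shows how per-layer library allocation and slice-point search for host/eNPU pipelining could be expressed. This is a minimal Python sketch under assumed interfaces (profile_layer, host_time, and npu_time are hypothetical profiling callbacks), not the authors' implementation, and it does not model details such as data transfer between the SoC and the eNPU.

```python
# Hypothetical sketch of the two search schemes from the abstract:
# (1) per-layer selection of a computing library on the host SoC,
# (2) choosing a slice point that balances the two pipeline stages.
# Function names and interfaces are illustrative assumptions.

BACKENDS = ["NNPACK", "OpenBLAS", "CLBlast"]  # CPU- and GPU-based libraries

def select_backends(layers, profile_layer):
    """Pick the fastest computing library for each layer on the host side."""
    plan = []
    for layer in layers:
        timings = {b: profile_layer(layer, b) for b in BACKENDS}
        plan.append(min(timings, key=timings.get))
    return plan

def best_slice_point(layers, host_time, npu_time):
    """Split the network into two segments so that the slower pipeline
    stage (host segment vs. eNPU segment) is as fast as possible."""
    best_k, best_cost = 1, float("inf")
    for k in range(1, len(layers)):
        host_cost = sum(host_time(l) for l in layers[:k])
        npu_cost = sum(npu_time(l) for l in layers[k:])
        # Steady-state pipeline throughput is limited by the slower stage.
        cost = max(host_cost, npu_cost)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost
```

In a pipelined deployment, the host SoC would run layers before the chosen slice point with its per-layer library plan while the eNPU processes the remaining layers of the previous input, overlapping the two stages across consecutive inferences.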
Keywords
Inference Pipelining, NPU Pipelining, Convolutional Neural Network, Model Slicing