Towards Accurate Latency Prediction of DNN Layers Inference on Diverse Computing Platforms

DASC/PiCom/CBDCom/CyberSciTech(2022)

Abstract
Deep Neural Networks (DNNs) have been intensively deployed in a variety of mobile and edge applications, which are subject to strict latency constraints. Collaborative inference has therefore become a research hotspot, and its basis is to accurately predict the execution latency of each layer in a DNN. Unfortunately, predicting the inference latency of DNN layers is laborious and difficult. Prediction-based approaches seem to address this problem, but current solutions still face challenges: (1) existing prediction strategies do not consider changes in platform load when estimating computation latency; and (2) most existing prediction approaches consider only CNN models and ignore RNN models. To this end, in this paper, we propose a DNN layer latency prediction framework to accurately estimate the inference latency of each layer in a DNN in the end-edge-cloud computing environment. Specifically, first, a latency prediction model is built to estimate the layer-wise execution latency of CNN and RNN models on different heterogeneous computing platforms; it uses neural networks to learn non-linear features related to inference latency. Second, a more comprehensive configuration is proposed to improve predictive performance, including static features (i.e., FLOPs, memory features, and parameter size) and dynamic features (i.e., CPU, memory, and GPU load). Finally, we conduct extensive experiments, and the results show that our latency prediction models improve prediction accuracy by about 27.2% on average over four baseline approaches.
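The abstract describes a neural-network regressor that maps a layer's static features (FLOPs, memory footprint, parameter size) and the platform's dynamic load (CPU, memory, GPU) to an execution latency. The paper's actual architecture and features are not given here, so the following is only a minimal sketch of the idea, assuming a six-dimensional feature vector and synthetic training data; it trains a one-hidden-layer MLP with plain gradient descent:

```python
# Minimal sketch (NOT the authors' implementation) of a layer-wise
# latency predictor: an MLP regressor over static + dynamic features.
# The feature layout and the synthetic latency function are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Columns 0-2: static features (FLOPs, memory footprint, parameter size);
# columns 3-5: dynamic load (CPU, memory, GPU) -- all normalized to [0, 1].
X = rng.random((256, 6))
# Synthetic "latency": non-linear in compute and compute-load interaction,
# standing in for the non-linear relationship the paper's model learns.
y = 2.0 * X[:, 0] + X[:, 0] * X[:, 3] + 0.5 * np.sin(3 * X[:, 1])

# One hidden layer of 16 ReLU units, trained with full-batch gradient descent.
W1 = rng.normal(0, 0.5, (6, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 1)); b2 = np.zeros(1)
lr = 0.05
for _ in range(2000):
    h = np.maximum(0, X @ W1 + b1)                  # hidden activations
    pred = (h @ W2 + b2).ravel()                    # predicted latency
    err = pred - y                                  # MSE residual
    gW2 = h.T @ err[:, None] / len(y)
    gb2 = err.mean(keepdims=True)
    dh = err[:, None] @ W2.T * (h > 0)              # backprop through ReLU
    gW1 = X.T @ dh / len(y)
    gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse = float(np.mean((pred - y) ** 2))
print(f"final training MSE: {mse:.4f}")
```

A real deployment would train one such model per platform (or feed a platform identifier as a feature) and query it per layer when deciding where to split a collaborative inference pipeline.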
Keywords
accurate latency prediction, dnn layers inference