Towards A Flexible Accuracy-Oriented Deep Learning Module Inference Latency Prediction Framework for Adaptive Optimization Algorithms
CoRR (2023)
Abstract
With the rapid development of deep learning, a growing number of cloud and
edge applications use large DNN (Deep Neural Network) models for
improved task execution efficiency as well as decision-making quality. Due to
memory constraints, models are commonly optimized using compression, pruning,
and partitioning algorithms to become deployable onto resource-constrained
devices. As the conditions in the computational platform change dynamically,
the deployed optimization algorithms should accordingly adapt their solutions.
To perform frequent evaluations of these solutions in a timely fashion, RMs
(Regression Models) are commonly trained to predict the relevant solution
quality metrics, such as the resulting DNN module inference latency, which is
the focus of this paper. Existing prediction frameworks specify different RM
training workflows, but none of them allows flexible configuration of the input
parameters (e.g., batch size, device utilization rate) and of the selected RMs
for different modules. In this paper, a deep learning module inference latency
prediction framework is proposed, which i) hosts a set of customizable input
parameters to train multiple different RMs per DNN module (e.g., convolutional
layer) with self-generated datasets, and ii) automatically selects a set of
trained RMs leading to the highest possible overall prediction accuracy, while
keeping the prediction time / space consumption as low as possible.
Furthermore, a new RM, namely MEDN (Multi-task Encoder-Decoder Network), is
proposed as an alternative solution. Comprehensive experiment results show that
MEDN is fast and lightweight, and capable of achieving the highest overall
prediction accuracy and R-squared value. The Time/Space-efficient
Auto-selection algorithm also manages to improve the overall accuracy by 2.5%
and R-squared by 0.39%, compared to the MEDN single-selection scheme.
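
The abstract does not spell out how the auto-selection step works, so the following is only a minimal sketch of one way such a selection could be expressed, not the paper's algorithm. It assumes each DNN module has several trained RMs with a measured accuracy, prediction time, and model size, and that an RM within a small accuracy tolerance of the best one may be preferred when it is cheaper. The `Candidate` class, the `select_per_module` function, the `tolerance` parameter, the labels `rm_a`/`rm_b`/`rm_c`, and all numbers are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One trained regression model for one DNN module (hypothetical record)."""
    name: str        # label of the regression model
    accuracy: float  # validation accuracy in [0, 1]
    time_ms: float   # prediction time per query, in milliseconds
    size_kb: float   # stored model size, in kilobytes


def select_per_module(candidates_by_module, tolerance=0.01):
    """Pick one regression model per module.

    Assumed rule: keep only candidates whose accuracy is within
    `tolerance` of the module's best accuracy, then take the one with
    the lowest prediction time, breaking ties by size.
    """
    selection = {}
    for module, candidates in candidates_by_module.items():
        best = max(c.accuracy for c in candidates)
        near_best = [c for c in candidates if c.accuracy >= best - tolerance]
        selection[module] = min(
            near_best, key=lambda c: (c.time_ms, c.size_kb, -c.accuracy)
        )
    return selection


# Illustrative, made-up measurements for two module types.
candidates_by_module = {
    "conv": [
        Candidate("rm_a", 0.900, 2.0, 500.0),
        Candidate("rm_b", 0.895, 0.5, 40.0),
        Candidate("rm_c", 0.800, 0.1, 10.0),
    ],
    "fc": [
        Candidate("rm_a", 0.95, 2.0, 500.0),
        Candidate("rm_b", 0.85, 0.5, 40.0),
    ],
}

chosen = select_per_module(candidates_by_module)
for module, candidate in chosen.items():
    print(module, candidate.name)
```

In this sketch the convolutional module ends up with the cheaper `rm_b` because its accuracy is within the tolerance of the best candidate, while the fully connected module keeps `rm_a` because no cheaper candidate comes close. Setting `tolerance=0` reduces the rule to picking the most accurate candidate per module, with cost used only to break exact ties.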