
Optimizing Job Offloading Schedule for Collaborative DNN Inference

IEEE Transactions on Mobile Computing (2024)

Abstract
Deep Neural Networks (DNNs) are widely deployed in mobile applications, and inference latency is a critical measure of those applications' service quality. Collaborative inference is a promising approach to latency optimization, in which part of the inference workload is offloaded from mobile devices to cloud servers. Model partitioning for collaborative inference has been well studied, but little attention has been paid to optimizing the offloading pipeline when a device must process multiple DNN inference jobs simultaneously, as is common in practice. We propose to jointly optimize DNN partitioning and pipeline scheduling for multiple inference jobs. We theoretically analyze the optimal scheduling conditions for homogeneous chain-structured DNNs and, based on this analysis, propose near-optimal partitioning and scheduling methods for chain-structured DNNs; we then extend these methods to general-structured DNNs. In addition, we extend the problem scenario to heterogeneous DNN inference jobs and propose a layer-level scheduling algorithm. Theoretical analysis shows that this method is optimal when the computation graphs are tree-structured. We evaluate our joint optimization methods on a real-world testbed; experimental results show that they significantly reduce the overall inference latency of multiple jobs compared with partition-only or schedule-only approaches.
Key words
Pipelines, processor scheduling, mobile handsets, task analysis, solid modeling, computational modeling, cloud computing, collaborative DNN inference, job offloading, makespan minimization, mobile cloud computing, pipeline scheduling
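The offloading pipeline described in the abstract — each job runs a device-side stage and then a cloud-side stage, with the goal of minimizing makespan — matches the classical two-machine flow-shop model, for which Johnson's rule yields an optimal job order. The sketch below illustrates that classical rule only; it is not the paper's algorithm, and the (device_time, cloud_time) pairs are hypothetical per-job stage latencies assumed for illustration.

```python
def johnson_order(jobs):
    """Order (device_time, cloud_time) jobs by Johnson's rule.

    Jobs whose device stage is no longer than their cloud stage go
    first, in increasing device time; the rest follow in decreasing
    cloud time. This minimizes makespan on a two-stage pipeline.
    """
    front = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
    back = sorted((j for j in jobs if j[0] > j[1]),
                  key=lambda j: j[1], reverse=True)
    return front + back

def makespan(order):
    """Makespan of a job order on a two-stage (device -> cloud) pipeline."""
    device_done = cloud_done = 0.0
    for device_time, cloud_time in order:
        device_done += device_time                    # device runs jobs back to back
        cloud_done = max(cloud_done, device_done) + cloud_time  # cloud waits for handoff
    return cloud_done

# Hypothetical per-job (device_time, cloud_time) latencies, in seconds.
jobs = [(3.0, 2.0), (1.0, 4.0), (2.0, 2.0)]
print(makespan(johnson_order(jobs)))  # 9.0
```

For these three jobs, the Johnson order schedules the device-light job (1.0, 4.0) first so the cloud stage starts as early as possible; any other order gives a makespan of at least 9.0.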