AdaInf: Data Drift Adaptive Scheduling for Accurate and SLO-guaranteed Multiple-Model Inference Serving at Edge Servers

Proceedings of the ACM SIGCOMM 2023 Conference (SIGCOMM 2023)

Abstract
Various audio and video applications rely on multiple deep neural network (DNN) models deployed on edge servers to perform inference under millisecond-level latency service-level objectives (SLOs). To avoid accuracy degradation caused by data drift, continual retraining is necessary, which makes it challenging to allocate GPU resources so that tight SLOs are satisfied while accuracy remains high. No prior research has addressed this issue. In this paper, we conduct a trace-based experimental analysis of this scenario, which shows that data drift affects different models to varying degrees, that incremental retraining (a technique we propose that retrains on selected samples before inference) and early-exit model structures can help increase accuracy, and that interdependencies among tasks may lead to significant CPU-GPU memory communication. Leveraging these observations, we propose a data drift Adaptive scheduler for accurate and SLO-guaranteed Inference serving at edge servers (AdaInf). AdaInf uses incremental retraining and allocates GPU resources among applications based on their SLOs. For each application, it splits GPU time between retraining and inference to satisfy the SLO, and then distributes the retraining GPU time among retraining tasks according to how strongly data drift affects each model. In addition, AdaInf employs strategies that exploit job features in this scenario to reduce the impact of CPU-GPU memory communication on latency. Our real-trace-driven experimental evaluation shows that AdaInf increases accuracy by up to 21% and reduces SLO violations by up to 54% compared to existing methods; achieving accuracy similar to AdaInf's with an existing method requires 4x more GPU resources on the edge server.
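The two-level allocation the abstract describes — per application, reserving enough GPU time for inference to meet the SLO and giving the remainder to retraining, then dividing that retraining budget among models by how strongly data drift impacts each — can be illustrated with a minimal sketch. All function and parameter names here are hypothetical; the paper's actual scheduler is more involved:

```python
def split_gpu_time(slo_ms, infer_time_ms, total_time_ms):
    """Reserve the fraction of GPU time inference needs to meet its SLO;
    the remainder of the budget goes to continual retraining."""
    infer_share = min(1.0, infer_time_ms / slo_ms)
    retrain_share = 1.0 - infer_share
    return infer_share * total_time_ms, retrain_share * total_time_ms

def allocate_retraining(retrain_budget_ms, drift_impacts):
    """Divide the retraining budget among models in proportion to each
    model's (hypothetical) drift-impact weight."""
    total = sum(drift_impacts.values())
    if total == 0:
        return {model: 0.0 for model in drift_impacts}
    return {model: retrain_budget_ms * w / total
            for model, w in drift_impacts.items()}

# Example: a 50 ms SLO with 20 ms inference leaves 60% of the
# 100 ms budget for retraining, split 2:1 between two drifting models.
infer_ms, retrain_ms = split_gpu_time(slo_ms=50, infer_time_ms=20,
                                      total_time_ms=100)
plan = allocate_retraining(retrain_ms, {"audio": 2.0, "video": 1.0})
```

Proportional allocation by impact weight matches the abstract's intuition that models hit harder by drift should receive more retraining time, but the exact weighting the paper uses is not specified here.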
Keywords
Edge server, deep learning, data drift, retraining, inference serving