Many Models at the Edge: Scaling Deep Inference via Model-Level Caching

2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS)

Abstract
Deep learning (DL) models are rapidly growing in popularity, driven in large part by rapid innovations in model accuracy as well as companies' enthusiasm for integrating deep learning into existing application logic. This trend will inevitably lead to a deployment scenario, akin to content delivery networks for web objects, where many deep learning models, each with different popularity, run on a shared edge with limited resources. In this paper, we set out to answer the key question of how to effectively manage many deep learning models at the edge. Via an empirical study based on profiling more than twenty deep learning models and extrapolating from an open-source Microsoft Azure workload trace, we pinpoint a promising avenue: leveraging cheaper CPUs, rather than the commonly promoted accelerators, for edge-based deep inference serving. Based on our empirical insights, we formulate the DL model management problem as a classical caching problem, which we refer to as model-level caching. As an initial step towards realizing model-level caching, we propose a simple cache eviction policy, called CremeBrulee, which adapts Belady's MIN to explicitly consider DL model-specific factors when calculating each in-cache object's utility. Using a small-scale testbed, we demonstrate that CremeBrulee achieves a 50% reduction in memory while keeping load latency below 92% of execution latency and incurring less than 36% of the penalty of a random eviction approach. Further, when scaling to more models and requests in simulation, we demonstrate that CremeBrulee keeps model load delay up to 16.6% lower than other eviction policies that consider only workload characteristics. Relevant research artifacts are available at https://github.com/cake-lab/CremeBrulee.
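The abstract describes the eviction policy only at a high level: each cached model gets a utility score that folds in DL-specific factors, and the lowest-utility model is evicted under memory pressure. The Python sketch below is a hypothetical illustration of that general idea, not CremeBrulee's actual algorithm; the `ModelEntry` fields, the `ModelCache` class, and the utility formula (expected reload cost per unit of memory) are all assumed stand-ins.

```python
from dataclasses import dataclass

@dataclass
class ModelEntry:
    name: str
    size_mb: float         # memory footprint of the loaded model
    load_latency_s: float  # time to load the model into memory
    popularity: float      # estimated request rate (requests/sec)

class ModelCache:
    """Illustrative model-level cache: evict the lowest-utility
    model when admitting a new one would exceed memory capacity."""

    def __init__(self, capacity_mb: float):
        self.capacity_mb = capacity_mb
        self.used_mb = 0.0
        self.entries: dict[str, ModelEntry] = {}

    def utility(self, e: ModelEntry) -> float:
        # Hypothetical utility: expected reload cost per MB of memory.
        # A popular model that is slow to load is costly to evict.
        return e.popularity * e.load_latency_s / e.size_mb

    def admit(self, e: ModelEntry) -> None:
        # Evict lowest-utility models until the new model fits.
        while self.used_mb + e.size_mb > self.capacity_mb and self.entries:
            victim = min(self.entries.values(), key=self.utility)
            self.used_mb -= victim.size_mb
            del self.entries[victim.name]
        # Only admit if the model actually fits in the cache.
        if self.used_mb + e.size_mb <= self.capacity_mb:
            self.entries[e.name] = e
            self.used_mb += e.size_mb

# Example usage with made-up model profiles:
cache = ModelCache(capacity_mb=512)
cache.admit(ModelEntry("resnet50", size_mb=98, load_latency_s=1.2, popularity=30.0))
cache.admit(ModelEntry("bert-base", size_mb=420, load_latency_s=3.5, popularity=5.0))
```

The design choice to score utility rather than rely purely on recency (as in LRU) reflects the abstract's point that eviction should account for model-specific costs, such as load latency and size, in addition to workload characteristics.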
Keywords
deep-learning, resource-management, cloud-computing