DeepScaling: microservices autoscaling for stable CPU utilization in large scale cloud systems

International Conference on Management of Data (2022)

Abstract
Cloud service providers conservatively provision excessive resources to ensure service level objectives (SLOs) are met. They often set low CPU utilization targets so that service quality is not degraded, even when the workload varies significantly. This not only wastes resources but can also consume excessive power in large-scale cloud deployments. This paper aims to minimize resource costs while ensuring SLO requirements are met in a dynamically varying, large-scale production microservice environment. We propose DeepScaling, which introduces three innovative components to adaptively refine the target CPU utilization so that it is held at a stable level that meets SLO constraints while using minimum resources. First, DeepScaling forecasts the workload for each service using a Spatio-temporal Graph Neural Network. Second, DeepScaling maps the workload intensity to an estimated CPU utilization with a Deep Neural Network, taking into account multiple factors in the cloud environment (e.g., periodic tasks and traffic). Third, DeepScaling generates an autoscaling policy for each service based on an improved Deep Q Network (DQN). The adaptive autoscaling policy updates the target CPU utilization to the maximum stable value at which SLOs are not violated. We compare DeepScaling with state-of-the-art autoscaling approaches in the large-scale production cloud environment of the Ant Group. The results show that DeepScaling outperforms other approaches by a significant margin, both in maintaining stable service performance and in saving resources. The deployment of DeepScaling in Ant Group's real production environment with 135 microservices saves the provisioning of over 30,000 CPU cores per day, on average.
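To make the role of the target CPU utilization concrete, the following is a minimal sketch, not the paper's implementation. It assumes a forecast workload is already available and replaces the learned workload-to-CPU model with a hypothetical linear function; the names `estimate_cpu_utilization` and `instances_for_target` and the parameters `cpu_per_request` and `baseline` are illustrative assumptions, not taken from DeepScaling.

```python
def estimate_cpu_utilization(workload_rps: float, instances: int,
                             cpu_per_request: float = 0.002,
                             baseline: float = 0.05) -> float:
    """Hypothetical linear stand-in for a learned workload-to-CPU model.

    Returns the estimated per-instance CPU utilization (0.0 to 1.0+) when
    the workload is spread evenly across `instances`.
    """
    per_instance_rps = workload_rps / instances
    return baseline + cpu_per_request * per_instance_rps


def instances_for_target(workload_rps: float, target_utilization: float,
                         max_instances: int = 10_000) -> int:
    """Smallest instance count whose estimated utilization stays at or
    below the target. A plain linear search, chosen for clarity."""
    for n in range(1, max_instances + 1):
        if estimate_cpu_utilization(workload_rps, n) <= target_utilization:
            return n
    raise ValueError("target utilization not reachable within max_instances")


if __name__ == "__main__":
    forecast_rps = 10_000.0  # assumed forecast for the next interval
    for target in (0.35, 0.5):
        n = instances_for_target(forecast_rps, target)
        print(f"target={target:.2f} -> {n} instances")
```

Under these assumptions, raising the target utilization lowers the number of instances needed for the same forecast workload, which is the source of the resource savings the abstract describes. The hard part, which this sketch leaves out, is choosing the highest target that still keeps SLOs intact.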