Know Your Enemy To Save Cloud Energy: Energy-Performance Characterization of Machine Learning Serving.

HPCA (2023)

Abstract
The proportion of machine learning (ML) inference in modern cloud workloads is rapidly increasing, and graphics processing units (GPUs) are the most preferred computational accelerators for it. The massively parallel computing capability of GPUs is well suited to inference workloads but consumes more power than conventional CPUs, so GPU servers contribute significantly to the total power consumption of a data center. Despite this heavy power consumption, however, GPU power management at cloud scale has not yet been actively researched. In this paper, we reveal three findings about the energy efficiency of ML inference clusters in the cloud. ❶ GPUs of different architectures have comparative advantages over one another in energy efficiency, depending on the ML model being served. ❷ The energy efficiency of a set of GPUs can vary significantly with the number of active GPUs and their clock frequencies, even when producing the same throughput. ❸ The service-level-objective (SLO)-blind dynamic voltage and frequency scaling (DVFS) driver of commercial GPUs maintains an excessively high clock frequency. Based on these implications, we propose a hierarchical GPU resource management approach for cloud-scale inference services. The approach consists of energy-aware cluster allocation, intra-cluster node scaling, intra-node GPU scaling, and GPU clock scaling schemes that follow the hierarchy of the inference-serving architecture. We evaluated the approach with a prototype implementation and a cloud-scale simulation. The evaluation with real-world traces showed that the proposed schemes can save up to 28.3% of cloud-scale energy consumption when serving five ML models on 105 servers equipped with three different kinds of GPUs.
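The abstract names a GPU clock scaling scheme but does not give its algorithm. Below is a minimal Python sketch of the idea behind finding ❸: select the lowest-power clock from an offline latency/power profile that still meets the latency SLO, instead of trusting the driver's SLO-blind DVFS. The `ClockProfile` type, the `pick_clock` function, and all profile numbers are hypothetical illustrations, not the paper's implementation.

```python
# A minimal sketch (not the authors' implementation) of SLO-aware GPU
# clock selection, illustrating finding ❸: pick the lowest-power clock
# that still meets the latency SLO instead of the driver's default
# DVFS choice. All profile data below is hypothetical.

from dataclasses import dataclass

@dataclass
class ClockProfile:
    clock_mhz: int          # GPU SM clock frequency
    p99_latency_ms: float   # measured tail latency at this clock
    power_w: float          # measured board power at this clock

def pick_clock(profiles: list[ClockProfile], slo_ms: float) -> ClockProfile:
    """Return the lowest-power clock whose tail latency meets the SLO."""
    feasible = [p for p in profiles if p.p99_latency_ms <= slo_ms]
    if not feasible:
        # No clock meets the SLO on this GPU; fall back to the fastest.
        return max(profiles, key=lambda p: p.clock_mhz)
    return min(feasible, key=lambda p: p.power_w)

# Hypothetical offline profile of one ML model on one GPU type.
profiles = [
    ClockProfile(900,  42.0, 180.0),
    ClockProfile(1200, 31.0, 230.0),
    ClockProfile(1500, 27.0, 290.0),
]
print(pick_clock(profiles, slo_ms=35.0))  # -> the 1200 MHz point
```

Under a 35 ms SLO this sketch settles on the 1200 MHz profile rather than the maximum clock, which is the kind of headroom the paper argues an SLO-blind DVFS driver leaves unexploited.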