Cache Antagonists Identification: A Practice from Alibaba Colocation Datacenter

2022 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW) (2022)

Abstract
Colocating latency-critical (LC) jobs and best-effort (BE) jobs on a host effectively improves resource efficiency in modern datacenters, but it also increases resource contention between jobs, which seriously affects job performance. In Alibaba's real-world LC-BE colocation datacenters, we observed that cache is one of the most contended resources in the CPU. When cache contention occurs, identifying the antagonists that caused it is the first step toward mitigation; we call this task cache antagonists identification (CAI). However, identifying cache antagonists is challenging because cache contention is difficult to observe and quantify. In this paper, we first propose the cache usage graph (CUG) to finely characterize the cache usage of jobs across multiple CPU microarchitectural hierarchies and locations, and we provide a monitoring tool that collects “per-container-per-logic CPU” L1/2/3 cache misses and builds the CUG. We then propose a CUG-based CAI approach, $\mu$Tactic. $\mu$Tactic leverages machine learning models to quantify the cache contention at every cache hierarchy level, then reasons out the cache antagonists with the CUG. Experiments in production datacenters show that $\mu$Tactic achieves high precision (85+%) and low cost (32 ms), both better than state-of-the-art approaches.
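As a rough illustration of how per-container-per-logical-CPU miss counts could be organized by cache location, the sketch below groups samples into graph nodes keyed by cache level and location. It is not the paper's implementation: the names `TOPOLOGY`, `build_cug`, and `sharers`, the sample data, and the assumption that L1/L2 are private to a physical core while L3 is shared per socket are all hypothetical, and the machine learning step that quantifies contention is not reproduced here.

```python
from collections import defaultdict

# Hypothetical topology: logical CPU -> (socket, physical core).
# Assumption: L1/L2 are private to a physical core, L3 is shared per socket.
TOPOLOGY = {0: (0, 0), 1: (0, 0), 2: (0, 1), 3: (0, 1)}


def build_cug(samples, topology=TOPOLOGY):
    """Group per-container-per-logical-CPU miss counts by cache location.

    samples: iterable of (container, logical_cpu, level, misses),
             with level in {"L1", "L2", "L3"}.
    Returns {(level, location): {container: total_misses}}.
    """
    cug = defaultdict(lambda: defaultdict(int))
    for container, cpu, level, misses in samples:
        socket, core = topology[cpu]
        if level == "L3":
            location = ("socket", socket)
        else:
            location = ("core", socket, core)
        cug[(level, location)][container] += misses
    return {node: dict(users) for node, users in cug.items()}


def sharers(cug, container):
    """Containers that appear on at least one cache node with `container`."""
    found = set()
    for users in cug.values():
        if container in users:
            found.update(users)
    found.discard(container)
    return found


# Made-up sample data for illustration only.
samples = [
    ("lc-web", 0, "L2", 120),
    ("be-batch", 1, "L2", 900),
    ("lc-web", 0, "L3", 300),
    ("be-batch", 2, "L3", 4000),
    ("be-other", 3, "L1", 50),
]
cug = build_cug(samples)
candidates = sharers(cug, "lc-web")
```

Here `candidates` only lists containers that share some cache node with the victim; deciding which of them is actually an antagonist would require the contention quantification the abstract attributes to $\mu$Tactic.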
Keywords
microarchitecture, cache contention, interference, resource management